
On AWS Shutting Down Open Source Documentation


I am both surprised and not surprised - close to a month ago, AWS announced that they are retiring the AWS documentation on GitHub. Jeff Barr, the Chief Evangelist for AWS, writes:

After a prolonged period of experimentation we will archive most of the repos starting the week of June 5th, and will devote all of our resources to directly improving the AWS documentation and website.

Five years is a hell of an experimentation period, and I am kind of sad that AWS, with all of its resources, couldn’t crack the code for sustainable open-source documentation maintenance. Mainly because other companies, the one I work for included, have successfully done just that.

For some added flavor (that way you have the context on the vantage point I am speaking from), I was the first product manager for the original docs.microsoft.com that Microsoft brought to life way back in 2016. Time flies - it’s been seven years since then. Back in that era, one of the things we decided to do was move the entire company away from a clunky on-premises proprietary content management system that relied on files being defined in XML (I wish I was kidding) to a standards-driven implementation that used Markdown as a backbone and worked on top of GitHub and Azure DevOps. The latter was and still is used heavily for documentation that requires a bit more control and co-existence with product source code. I am writing this post not to pat myself and the team that built Microsoft Docs on the back, but rather to provide encouragement to those who are considering building documentation in the open - it’s been done successfully at scale, so don’t take one large company pulling back as a sign of things being bad.

Long story longer, the Microsoft Docs team saw tremendous value in making the documentation accessible for contributions to anyone around the globe, and there is no better way to do that than by going where developers already are. Guess what your open source contributors are using when writing their libraries, tools, and services? They’re very likely on GitHub. They write README files in Markdown. They maintain their wikis in the repository - entirely in Markdown. And they use their Git-based clients to make changes, open pull requests, and even manage issues. We’re talking about a rainforest of all sorts of well-integrated and extensible tools that documentation would fit right in with. I mean, hey, I talked about docs being part of the product with all the bells and whistles of open source integrations way back in 2018.

Hold on a second - if things are so great in the world of open source documentation, what went wrong with AWS docs?

I believe the primary factor in the demise of open source documentation at AWS is the human one. That is - the organization simply could not build a good collaboration culture internally within the specific teams responsible for documentation (this requires collaboration between doc writers and engineers/PMs), or a rhythm with the community. The technological aspects of the docs are mostly solved across the board. I don’t even think it’s a problem with GitHub per se, since Amazon has their own Git-based offering.

Back to Jeff:

The primary source for most of the AWS documentation is on internal systems that we had to manually sync with the GitHub repos. Despite the best efforts of our documentation team, keeping the public repos in sync with our internal ones has proven to be very difficult and time consuming, with several manual steps and some parallel editing.

First of all, the moment you read “manually sync” after five years of maintaining the documentation infrastructure, that’s a big, big, giant red flag. This is the kind of functionality that should be prioritized from the early stages. When we designed docs.microsoft.com, one of the requirements coming from all kinds of teams across the company was “We want to have a private place for docs too.” Which, if you think about it, makes a ton of sense - you might be working on some new feature or a new release that you don’t want to disclose to the public just yet. If you do it all in the open, there are folks who will trawl your branches and look for potential news to uncover. That’s why we built tools that helped teams keep public and private documentation in sync. Without naming any names, I can tell you that many large teams do just that, with fully automated synchronization and conflict resolution between Git repositories. There might be some work involved in resolving conflicting changes, but it’s not an end-to-end manual process.
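To make the idea of automated synchronization a bit more concrete, here is a minimal Python sketch of the decision at its core. This is not the tooling Microsoft built; `plan_sync`, the snapshot shape, and the action names are all hypothetical, and a real system would lean on Git's own merge machinery and would also need to hold back unreleased content. The sketch assumes you can produce three path-to-content maps: the last state both repositories agreed on, the current private state, and the current public state.

```python
from typing import Dict, Optional

# Hypothetical shape: file path -> content (or a content hash).
Snapshot = Dict[str, str]


def plan_sync(base: Snapshot, private: Snapshot, public: Snapshot) -> Dict[str, str]:
    """Classify every path by comparing both sides against the last synced state."""
    plan: Dict[str, str] = {}
    for path in sorted(set(base) | set(private) | set(public)):
        b: Optional[str] = base.get(path)
        priv: Optional[str] = private.get(path)
        pub: Optional[str] = public.get(path)
        if priv == pub:
            # Both sides agree (including "deleted on both"): nothing to do.
            action = "in-sync"
        elif pub == b:
            # Only the private side moved since the last sync.
            action = "take-private"
        elif priv == b:
            # Only the public side moved since the last sync.
            action = "take-public"
        else:
            # Both sides changed in different ways: a human has to look.
            action = "conflict"
        plan[path] = action
    return plan


if __name__ == "__main__":
    base = {"a.md": "v1", "b.md": "v1", "c.md": "v1"}
    private = {"a.md": "v2", "b.md": "v1", "c.md": "v2"}
    public = {"a.md": "v1", "b.md": "v3", "c.md": "v4"}
    for path, action in plan_sync(base, private, public).items():
        print(path, action)
```

Only the paths that come back as `conflict` need a person. Everything else can be applied by a scheduled job, which is the difference between "some work on conflicting changes" and an end-to-end manual process.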

With 262 separate repos and thousands of feature launches every year, the overhead was very high and actually consumed precious time that could have been put to use in ways that more directly improved the quality of the documentation.

Well, hm…

The MicrosoftDocs organization on GitHub has 1.4K+ repositories, split across many divisions and teams. Each team owns their docs, down to repositories or even parts of repositories. But also, the big differentiator here is that most teams publish documentation entirely in the open. That is, they do not sync changes between private and public - they just work in the open. To give you a “closer to home” example, my team is currently in the process of migrating our documentation for the Microsoft Authentication Library from GitHub wikis and standalone sites to Microsoft Learn. We’ve done that for .NET, Go, and Java, and are working on Python as we speak. All in the open, ready for contributions.

Whoa, stop right there. So - it's all on GitHub? There is no internal system where you manage the content at all and you don't need to perform any synchronization between custom tools? No XML at all?

Exactly! When we designed docs.microsoft.com, we completely ditched the legacy system. Everything was migrated either to Markdown or, where needed, to automated pipelines that produced reference API documentation for different languages and platforms. But the requirement remained - everything would be handled through Git-based source control and onboarded to the central publishing system that orchestrates it all. The path to success with open source documentation is going all-in. When you keep two stacks, with two different sets of guidelines for each, a lot of problems and unnecessary complexity will start bubbling up.

Which leads me to my next point - the infrastructure should help in driving the right patterns. You publish the content from Markdown, and not from some internal CMS. You edit content through tools like VS Code, paired with an extension that can validate the content structure. When pull requests are created, automated validation kicks in and reports its findings within GitHub, so that authors have one place to preview the content. Contributions need to happen within GitHub, which in turn should kick off content builds and generate new previews. API documentation is generated through CI jobs and also placed in content repositories that use the same build system and tooling, as well as consistent configuration. You use the same tools that your community would use.
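As an illustration of what "automated validation" on a pull request can mean at its simplest, here is a small Python sketch of a check a CI job could run against each changed Markdown file. It is not the validation that Microsoft Learn or any real docs platform performs; `validate_markdown` and its three rules (exactly one top-level heading, no links with an empty target, no unclosed code fence) are assumptions chosen for the example.

```python
import re
from typing import List

FENCE = "`" * 3  # Markdown code fence marker, built this way to keep the sample readable
EMPTY_LINK = re.compile(r"\]\(\s*\)")  # matches "[text]()" style links


def validate_markdown(text: str) -> List[str]:
    """Return a list of human-readable problems found in one Markdown document."""
    problems: List[str] = []
    in_fence = False
    h1_count = 0
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            # Toggle fence state; content inside fences is not checked.
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if line.startswith("# "):
            h1_count += 1
        if EMPTY_LINK.search(line):
            problems.append(f"line {number}: link with empty target")
    if h1_count != 1:
        problems.append(f"expected exactly one H1 heading, found {h1_count}")
    if in_fence:
        problems.append("unclosed code fence")
    return problems


if __name__ == "__main__":
    sample = "Intro\n\nSee [the guide]() for details.\n"
    for problem in validate_markdown(sample):
        print(problem)
```

A CI job would call this for every changed file and fail the build if the list is non-empty, so contributors see the problems on the pull request itself instead of hearing about them from a reviewer days later.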

Again - infrastructure is helping drive the right patterns. There is minimal context switching.

So, this leads me to the following summary of hypotheses around what could cause open source documentation efforts to fail, at AWS or in any other situation.

  1. No proper resourcing allocated to building out a documentation ecosystem. Value of docs is very hard to explain even in the best of days when you need to spend capital on product R&D. Without proper resourcing, someone’s prototype system becomes crutch-supported infrastructure that fails to gain traction and support internally, leading to a product that just dies on the vine.
  2. No extensive training for technical writers on Git-based workflows. Despite what developers might think, using Git is not straightforward. There is a learning curve that requires training, help, and a lot of guidance and support. Without this the effort is going to be one massive uphill battle.
  3. No proper tooling. The manual syncing thing comes to mind - if you don’t build proper tools to help your teams be effective, they will not adopt whatever pattern or amalgamation you’re proposing to replace existing workflows. If they need to jump between many dramatically different tools, you might as well not bother going through the motions.
  4. No clear communication as to why this effort is important. Organizational inertia is a thing that any new product or initiative will have to go against. If you don’t communicate clearly what the return on investment is for making documentation Git-based and open source, good luck getting any sponsorship for the project. Said sponsorship is crucial in getting the project off the ground and keeping it above ground as you slog through the initial complexity and uncertainty.
  5. Not going all-in. You either step forward or you don’t. There is no point in trying to balance yourself across two sides of a problem. Without providing a clear and consistent set of patterns and guidelines that remove friction and ambiguity, the effort of moving to open source documentation is not going to pay the dividends you might hope for.

Gee, that sounds like quite a bit of work to bootstrap and maintain a documentation project.

Indeed, especially if we’re talking about a company-wide project. Moving a large organization towards open source docs is like turning a massive container ship - it’s going to take a massive amount of effort and time.

The benefits of open source docs far outweigh the cost of setting the infrastructure and culture up, and your community will certainly appreciate the openness and the ability to contribute to the overall corpus of knowledge that can then help others in that same community use your products successfully. And every single contribution counts - from a typo to a better explanation paragraph that the product PM missed.

In theory, this sounds nice - but aren't you offloading documentation writing to the community instead of doing it yourself?

Not at all. Accepting community contributions is a vastly different type of effort than outsourcing the documentation writing to the community wholesale. For example, where I work we still have full-time writers whose sole responsibility is ensuring that our products have good docs. Engineers and product managers contribute to documentation, both in code and in docs-specific repositories. As product experts, we are the ones who want to set developers up on the path to being successful with our product and the way to do that is through documentation and samples - you can’t outsource this effort and get quality outcomes. Community contributions help enhance the set of docs that your team is already writing.

So, while I am disappointed that AWS decided to bail on this particular implementation of documentation systems, I would encourage those who are considering the option of bootstrapping and deploying their docs in the open, the same way engineers maintain code, to keep the effort rolling. You’re not alone - feel free to reach out if I can help brainstorm some ideas on how you can make your open source docs great.