5 Things Learned Generating API Documentation


API documentation often remains an afterthought for developers. Writing it can be cumbersome, it requires working with a bunch of different tools (often very old ones), and maintaining it makes developers cringe, because it means coming up with good examples and descriptions. Let’s face it - most developers would rather focus on writing code. If you are writing code just for yourself, API documentation might not be as important; however, once you start working with larger organizations and more serious APIs, the lack of quality API documentation becomes a BIG problem.

I have been working on API documentation for the past year, and this post summarizes some of the top-level learnings. It’s by no means comprehensive (let’s be real, a year working with API docs is not enough to know everything about them), so don’t take this to your boss tomorrow with a “Den said this is why we need to make our process different”. Nonetheless, I think it will be helpful for those who aim to build a documentation workflow.

1. Developers will write documentation differently

If you work with 50 different developers, you likely have 50 different ways in which code documentation is written. When I say “writing documentation” in this context, I am referring to documentation written in the product code (aka code comments) - I have no significant experience working with non-autogenerated documentation, so this will likely not apply to you if that’s what you do. Going back to the heading of this section: different developers have different styles of writing code comments - some use indentation, some reference images from external files, others try to include external files in comments because they use tool X for their process. You can enforce as many guidelines as you like, and you will still end up with massive variation in how things are written, especially in organizations as big as Microsoft.

Prepare yourself to deal with a bunch of non-standard comment styles, and make your tooling resilient enough to handle them.

2. Automation is a key that opens the chest of edge cases

“OMG WHO SHIPS A LIBRARY WITHOUT THE PROPER VERSION OF THE DEPENDENCY” is a standard statement when you start auto-generating API documentation. This is not a problem when you write things by hand, because your keyboard does not care about what dependencies you used in code, but when you start putting automation processes in place, it all bubbles up to the surface pretty quickly. At docs.microsoft.com we use pristine environments when it comes to generating documentation - this means we rely on VSTS hosted agents, which are recycled every time the auto-generation build completes. This allows us to quickly identify when a developer forgot that customers downloading their NuGet (or other) package do not have the same set of dependencies already installed on their machines, or when they used invalid XML closing tags in the generated IntelliSense files. API documentation generation processes in this case are an extra line of defense in the code deployment process.

No, really - API documentation generation can help you find edge cases in your test cases.
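To make the invalid-closing-tag case concrete, here is a minimal sketch of the kind of check a generation build performs implicitly. It is not our actual pipeline: the function name `find_malformed_xml` and the `Contoso` sample documents are made up for illustration, and it only tests whether each document parses as well-formed XML.

```python
import xml.etree.ElementTree as ET


def find_malformed_xml(documents):
    """Return {name: error message} for every document that fails to parse."""
    problems = {}
    for name, text in documents.items():
        try:
            ET.fromstring(text)
        except ET.ParseError as err:
            problems[name] = str(err)
    return problems


# Hypothetical IntelliSense-style files; the second has a misspelled closing tag.
samples = {
    "Good.xml": "<doc><members><member name='T:Contoso.Widget'>"
                "<summary>A widget.</summary></member></members></doc>",
    "Bad.xml": "<doc><members><member name='T:Contoso.Gadget'>"
               "<summary>A gadget.</summmary></member></members></doc>",
}

problems = find_malformed_xml(samples)
for name, message in problems.items():
    print(f"{name}: {message}")
```

Running something like this on every build artifact means a broken file fails the build instead of quietly producing an empty page.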

3. Automation is not always trivial

We all want automation, and we want it yesterday, because everyone has better things to do than writing API documentation by hand. That said, automation is not always trivial, mainly for one reason - API documentation tools do not see the code as “here is how customers should use it”, but rather as “here is what the compiler tells us this is”. API documentation generation tools are typically geared toward one key scenario - create documentation in a way that maps 1:1 to the code. This approach does not always match the idiomatic views that teams hold about their own code.

To give you a good example here, consider fluent APIs - a way to chain API calls in a structured way. Standard API documentation tooling has no idea whatsoever what fluent is - it sees every single link in the chain as an entity out of context of the larger set of actions. While this is a true representation from a code perspective, it’s not how developers or consumers see it.
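As a small illustration, here is a made-up fluent builder. `QueryBuilder` and its methods are hypothetical and not taken from any real SDK. A reference generator would typically list `where`, `limit` and `build` as three unrelated members, while the consumer only ever thinks about the chain on the last lines.

```python
class QueryBuilder:
    """Hypothetical fluent API: each step returns self so calls can be chained."""

    def __init__(self, table):
        self._table = table
        self._filters = []
        self._limit = None

    def where(self, condition):
        # Record the filter and hand the builder back for the next call.
        self._filters.append(condition)
        return self

    def limit(self, count):
        self._limit = count
        return self

    def build(self):
        # Terminal step: turn the accumulated state into a query string.
        query = f"SELECT * FROM {self._table}"
        if self._filters:
            query += " WHERE " + " AND ".join(self._filters)
        if self._limit is not None:
            query += f" LIMIT {self._limit}"
        return query


query = (
    QueryBuilder("orders")
    .where("status = 'open'")
    .where("total > 100")
    .limit(10)
    .build()
)
print(query)
```

Nothing in the per-member reference pages says that `build` is meant to come last, or that `where` can be repeated. That intent lives in the chain, not in any single signature.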

When automating API documentation generation, you always need to account for customer intent in addition to mapping code to HTML pages.

4. Not all doc generation tools are created equal

Our .NET documentation tool, mdoc, supports versioning out of the box. This means we can document several APIs at once and differentiate them by the package or SDK version. No other tools in our stack do that today. We also use JSDoc and TypeDoc, and for both, versioning is not built in, so we have to have DocFX structure the generated content in a way that ensures that our customers can pick different product versions in the UI. This is just one of the many examples of how even the industry-standard tools can fall short of your (and your customers’) expectations, so be ready to put in extra work beyond what is handed to you as an artifact post-build.

5. Formal and standard models are going to save your life

The aforementioned mdoc generates documentation in ECMAXML. If we take a look at Javadoc, we will see it outputs HTML. Sphinx builds Python-related documentation from reStructuredText (wow, I just had to put a link to SourceForge). If you work with teams that work across platforms, you will end up with a smorgasbord of different UIs and templates, because every tool likes to be special and generate documentation in its own way. Every time you send your customers to different documentation segments, they will have to re-learn things - your customers don’t care about what tools you used to generate documentation, but they do care about a consistent experience. In DocFX, we are moving to what we refer to as a Schema-driven Document Processor, which allows us to move the proprietary formats upstream, away from customers’ eyes, and produce a single YAML structure that is representative of DocFX expectations. That in turn lets us make the overall user experience very consistent (and trust me, it makes the frontend devs’ life so much easier).
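The idea of pushing tool-specific formats upstream can be sketched roughly as follows. This is not the DocFX schema or the actual processor: `ApiItem`, the two adapter functions and the input field names are all assumptions made up for illustration, and serialization to YAML is left out.

```python
from dataclasses import dataclass, asdict


@dataclass
class ApiItem:
    """Hypothetical common shape; not the actual DocFX schema."""
    uid: str
    kind: str
    summary: str
    language: str


def from_dotnet(record):
    # Assumed input shape for a .NET tool's output.
    return ApiItem(
        uid=record["DocId"],
        kind=record["Type"].lower(),
        summary=record["Summary"].strip(),
        language="csharp",
    )


def from_python(record):
    # Assumed input shape for a Python tool's output.
    return ApiItem(
        uid=record["fullname"],
        kind=record["objtype"],
        summary=record["doc"].strip(),
        language="python",
    )


items = [
    from_dotnet({"DocId": "Contoso.Widget", "Type": "Class", "Summary": " A widget. "}),
    from_python({"fullname": "contoso.widget.Widget", "objtype": "class", "doc": "A widget.\n"}),
]

# Every item now has the same keys, whatever tool it came from.
normalized = [asdict(item) for item in items]
for entry in normalized:
    print(entry)
```

The templates and the frontend only ever see the common shape, so adding another language means writing another adapter rather than another UI.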

Bonus: Don’t generate from source if you can

When your customers download your SDK or package, where do you think they are downloading that from? You will rarely find someone who goes to your GitHub repo, compiles the SDK, and then integrates it into their code (if you are one of those people, I am willing to get you a coffee and chat more on the phone or in person to understand why). Most people download a package off of npm, NuGet or Maven, and are on their way. So what is the problem with documenting from source, you ask? Simple - the source is in flux, and the package is not. When you generate API docs pinned to a specific version of a package, you get documentation for what customers will actually see. That is often not the case with source code.

Documentation tools that can ingest XML (and other) types of comments directly from packages make your life that much simpler. They also make versioning docs that much simpler. So don’t document source - document snapshots, which are, more often than not, public distributions of your API in the form of packages.
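Here is a rough sketch of what working from the package looks like, assuming a NuGet-style package, which is a zip archive that conventionally carries XML documentation files next to the assemblies under `lib/`. The helper name `list_doc_files` and the `Contoso.Widgets` package contents are made up, and the package is built in memory so the example stays self-contained.

```python
import io
import zipfile


def list_doc_files(package_bytes):
    """Return the XML files under lib/ in a NuGet-style package (a zip archive)."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.startswith("lib/") and name.endswith(".xml")
        )


# Build a tiny in-memory stand-in for a published package (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Contoso.Widgets.nuspec", "<package/>")
    archive.writestr("lib/netstandard2.0/Contoso.Widgets.dll", b"")
    archive.writestr("lib/netstandard2.0/Contoso.Widgets.xml", "<doc/>")

doc_files = list_doc_files(buffer.getvalue())
print(doc_files)
```

Because the input is a specific published version, the comments found this way are the ones that shipped, which is exactly what customers will see in IntelliSense.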

I will likely revisit this post in a year, and put together a follow-up on just how much things have changed since.