As we released the Visual F# documentation as open source, one challenge stood out that needed to be tackled: content validation. There were several things we could do, such as integrating extra validation rules into the build system or building a GitHub bot. I thought that, as a learning experience, I would go the bot route. This post explains how I approached the problem.
First of all, you need to understand that a GitHub bot is nothing more than a web app that reacts to certain things that happen in a GitHub repository. Notifications about those “happenings” are delivered through webhooks, and GitHub provides ample documentation on those.
Before building my own infrastructure, I looked for an existing template. With a multitude of bots out there, surely someone had already put together a framework for what I needed to do. And it so happens that a fellow Microsoft developer, Felix Rieseberg, created peer-review-bot.
peer-review-bot does most of the things that I wanted my bot to do. Once a pull request is created, it posts a message and tags the PR in a way that flags to the team that it shouldn’t be merged yet, unless the PR carries an “exclusion tag” that marks it as exempt from review.
Great, so I have that “template” piece of code that I wanted. What’s next? Next, I created a separate GitHub repository where I put the existing source and configured it to use a new GitHub account – orcabot. Setting everything up in config.js is extremely easy.
If you need more information on how to actually modify the configuration file, I recommend checking Felix’s own guide.
Let’s say you have it all configured. Now what? It’s time to deploy it to Azure. Thankfully, Azure supports deployment directly from GitHub. The benefit of this continuous deployment approach is that with every check-in, Azure will automatically kick off the build and notify you of any failures. It also gives you access to an easy deployment rollback mechanism. But that’s a topic for another post.
In the Azure Portal, I created a new Web App and set the deployment source to the repo where I hosted my forked and modified peer-review-bot code.
With the code deployed, the bot is ready to be used. In its original version, it exposes two versions of a target, one of which is POST. In the case of orcabot, I am mostly interested in the POST endpoint, which is triggered when something happens in a PR.
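To make the idea of a POST endpoint reacting to PR events more concrete, here is a minimal sketch of how a webhook delivery could be inspected. It is not the code peer-review-bot uses; `handleDelivery` and the response shape are names I made up for illustration. It relies only on facts about GitHub deliveries: the event type arrives in the `X-GitHub-Event` header, and a `pull_request` payload carries `action`, `number` and `pull_request` at the top level.

```javascript
// Decide how to respond to a webhook delivery.
// `headers` is expected with lower-cased names (as Node.js provides them),
// `rawBody` is the request body as a string.
// handleDelivery is a hypothetical helper, not part of peer-review-bot.
function handleDelivery(headers, rawBody) {
  const event = headers['x-github-event'];
  if (!event) {
    return { status: 400, message: 'Missing X-GitHub-Event header' };
  }

  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch (err) {
    return { status: 400, message: 'Invalid JSON' };
  }

  // GitHub sends a "ping" event when the webhook is first created.
  if (event === 'ping') {
    return { status: 200, message: 'pong' };
  }

  if (event === 'pull_request' && payload.pull_request) {
    return { status: 202, message: `PR #${payload.number}: ${payload.action}` };
  }

  // Anything else is acknowledged but not acted upon.
  return { status: 200, message: 'Ignored ' + event };
}
```

Keeping the decision logic in a plain function like this makes it easy to exercise without spinning up a server; the actual route only has to pass the headers and body through.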
In GitHub, open your project settings, select the Webhooks & services section, and create a new webhook:
The payload URL will vary depending on where your bot is hosted. If all goes well, you will see a new message appear in your PRs, as well as a custom tag:
It all works, but there are two ways I wanted to extend my bot:
- Currently it marks the pull request as ready to merge if two approvers post a magic key combination – LGTM (or Looks good to me). What it doesn’t yet check is who posted that combination. This is not exactly dangerous if auto-merge is not enabled on two green-lit reviews, but when it is, you want to make sure that only approved developers or reviewers can sign off on the submission.
- Because the repository revolves around content, I wanted to automate content rule checks (e.g. all images must be within the repo, all images must have ALT text).
With the above in mind, I started small. To create a list of approvers, I bootstrapped an approvers file. It’s nothing more than a newline-separated text file that contains the GitHub IDs of those individuals who are cleared to sign off on submissions.
In config.js, I am dynamically loading the file:
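A minimal sketch of what such a loading step might look like. The helper names `parseApprovers` and `loadApprovers`, as well as the file name in the usage comment, are assumptions made for this example rather than the exact code in the bot.

```javascript
const fs = require('fs');

// Turn newline-separated text into a clean array of GitHub logins.
// Handles Windows line endings and skips blank lines.
function parseApprovers(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Read the approvers file synchronously, which is acceptable at startup
// when the configuration is loaded once.
function loadApprovers(filePath) {
  return parseApprovers(fs.readFileSync(filePath, 'utf8'));
}

// Usage (the file name here is hypothetical):
// const approvers = loadApprovers('./approvers.txt');
```

Splitting the parsing from the file read keeps the trimming and blank-line handling testable without touching the disk.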
When the instructional comment is being built, I am making sure that the listed approvers are notified about an inbound PR by taking the values from the array and prepending each of them with an @ sign, which turns them into GitHub mentions.
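As a rough illustration of that step, the sketch below turns an array of logins into a string of GitHub mentions. `buildMentions` is a name I picked for the example, and the surrounding comment text is a placeholder, not the bot's real message.

```javascript
// Build the "@user1 @user2" fragment that notifies approvers in a comment.
// buildMentions is a hypothetical helper for this sketch.
function buildMentions(approvers) {
  return approvers.map((login) => '@' + login).join(' ');
}

// Usage: append the mentions to whatever the instructional comment says.
const comment = 'This PR needs a review. ' + buildMentions(['alice', 'bob']);
```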
With the easy part tackled, we now have the approvers read and notified. But what about accepting sign-offs only from the approved list?
In src/bot.js, I am performing the same read operation to get the list:
In the same bot.js file, there is a function called checkForApprovalComments – it will scan the list of comments for the pre-defined “magic” letter sequence. Within its logic to test whether the comment was already posted, I am adding an extra condition – ensure that the user who posted the comment actually counts for the purpose of content validation:
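The condition can be sketched roughly as follows. This is a simplified stand-in, not the actual body of checkForApprovalComments: `countApprovals` is a hypothetical function, and the only thing it assumes about the input is the GitHub comment shape, where each comment has `user.login` and `body`.

```javascript
// Count distinct approvers who posted the approval phrase.
// `comments` follows the GitHub issue-comment shape: { user: { login }, body }.
// `pattern` should not use the global flag, since test() would then keep state.
function countApprovals(comments, approvers, pattern) {
  // GitHub logins are case-insensitive, so compare in lower case.
  const allowed = new Set(approvers.map((login) => login.toLowerCase()));
  const seen = new Set();

  for (const comment of comments) {
    const login = comment.user.login.toLowerCase();
    if (!pattern.test(comment.body)) continue; // not an approval comment
    if (!allowed.has(login)) continue; // author is not cleared to sign off
    seen.add(login); // a Set ignores repeat approvals by the same user
  }

  return seen.size;
}

// Usage: two distinct approvals would be needed before the PR is green-lit.
const approvals = countApprovals(
  [
    { user: { login: 'alice' }, body: 'LGTM' },
    { user: { login: 'mallory' }, body: 'LGTM' },
  ],
  ['alice', 'bob'],
  /LGTM|Looks good to me/i
);
```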
And that’s it for checking the user!
Now, how about content rules? For that, I created a rules.json file that will contain RegEx lookup strings and messages that will be displayed once matches are found.
In bot.js, this file is loaded from the get-go:
A convenient helper function I built within the same file helps me go through the rules and match them against the existing content:
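Here is a rough sketch of how such a helper could work. The rule shape (`pattern` plus `message`), the two sample rules and the name `validateContent` are all assumptions for illustration; the real rules.json may be structured differently.

```javascript
// Hypothetical rules, shaped the way a rules file might be:
// a regular expression source string plus the message to display on a match.
const rules = [
  { pattern: '!\\[\\s*\\]\\(', message: 'Image is missing ALT text.' },
  { pattern: '!\\[[^\\]]*\\]\\(https?://', message: 'Image is hosted outside the repo.' },
];

// Return the message of every rule whose pattern occurs in the content.
function validateContent(content, ruleList) {
  const warnings = [];
  for (const rule of ruleList) {
    // A fresh RegExp per check avoids any state carrying over between calls.
    if (new RegExp(rule.pattern, 'i').test(content)) {
      warnings.push(rule.message);
    }
  }
  return warnings;
}

// Usage: an empty array means the content passed every rule.
const warnings = validateContent('See ![](media/diagram.png) for details.', rules);
```

Storing the patterns as strings keeps the rules file plain JSON, at the cost of double-escaping backslashes.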
Make sure to also include that function in the module exports:
Once done, all you need to do is modify route/pullrequest.js to validate content on simple PR actions:
We only need to perform checks on actual PR actions (e.g. sync, modify, open) and not secondary actions, such as comments.
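One way to express that filter is sketched below. The function name `shouldValidate` and the exact set of actions are my assumptions; the action strings themselves (`opened`, `reopened`, `synchronize`, `edited`) are the values GitHub sends in the `action` field of a `pull_request` event, while comments arrive as a separate `issue_comment` event.

```javascript
// Actions on a pull_request webhook event that can change the PR's content.
const CONTENT_ACTIONS = new Set(['opened', 'reopened', 'synchronize', 'edited']);

// Decide whether a delivery should trigger content validation.
// `eventName` is the value of the X-GitHub-Event header.
function shouldValidate(eventName, payload) {
  return eventName === 'pull_request' && CONTENT_ACTIONS.has(payload.action);
}
```

Skipping everything else keeps the bot from re-running the rules every time someone comments or adds a label.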
And now the bot works!
It’s not yet perfect - I am working on extending the validation logic and adding labels when there are content warnings, as well as improving its performance, so use the code above at your own risk.
Have any thoughts? Let me know on Twitter!