I recently moved most of my websites over to Netlify, because, well - I work there, and I want to be dogfooding as much of our product as possible. As part of this, I enabled my sites to use Netlify Analytics, which has been a fantastic lens to look at the site usage from a server-side perspective.
As I explored the view a bit more, I realized that I wanted to store the data locally for better long-term analysis. After all, I’ve done that before with Twitter analytics. How could I do that here?
Well, I am writing this on vacation and my work computer is off - I figured I shouldn't bug any of my coworkers and should instead do some exploring myself. One of my colleagues had chimed in on the support forums earlier this year about an undocumented API, so why not try leveraging that?
A word of caution here - this is not something that is officially supported. That is, if something changes or breaks, there is no support for that problem, as the API is not officially exposed through the Netlify API surface. Don't put this into any production workloads.
There are a few endpoints available through the dashboard, each mapping to a “panel” that is visible to the user. These endpoints are listed below.
- Page Views: Volume of page views for the site.
- Unique Visitors: Volume of unique visitors to the site.
- Countries: Countries from which visitors come to the site.
- Sources: Site traffic sources.
- Top Pages: Most frequently visited pages on the site.
- Bandwidth: Amount of bandwidth consumed.
- Not Found Pages: Top pages that were requested but not found on the site.
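Pulling any of these panels boils down to an authenticated GET with a query string. A minimal sketch, assuming a `(from, to)` pair of query parameters for the time range - the actual endpoint URLs and parameter names are whatever the network inspector shows for your site, not something I can vouch for here:

```python
# Sketch of pulling one analytics panel. The endpoint URL and the query
# parameter names ("from", "to") are placeholders read off the network
# inspector -- substitute what your browser actually sends.
import json
import urllib.parse
import urllib.request


def build_panel_url(base_url: str, params: dict) -> str:
    """Append query-string parameters (e.g. a custom time range) to a panel URL."""
    return f"{base_url}?{urllib.parse.urlencode(params)}"


def fetch_panel(base_url: str, token: str, params: dict) -> dict:
    """GET one analytics panel, authenticating with a bearer token."""
    req = urllib.request.Request(
        build_panel_url(base_url, params),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Splitting URL construction from the request itself makes it easy to loop over all seven panels with the same helper.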
Getting The Data
Great - now I know the endpoints and can query the data for local storage and processing. The neat part is that I can also set custom time ranges and limits, so I can pull however much data I want rather than being constrained by the most common options.
Each request requires a bearer token that is passed in the
Authorization header. Since I am automating the process, I don’t want to be constantly digging through the network inspector to get the token. So, I thought I’d automate that piece of the process too. The login flow (without SAML) follows the pattern below.
First, an authentication
POST request is issued to this endpoint:
The body of the request is like this:
If the request is successful and you have two-factor authentication (2FA) enabled, you’ll get a redirect URL:
This is the link to the page where the 2FA token can be entered manually, but because I am automating the process, I need a way to skip that step. Instead, I noticed that in the browser there is another API request that is executed, against this endpoint:
It is a POST request with the body containing the OTP code wrapped in a JSON envelope:
The request to this endpoint requires a bearer token too, but the nice thing is that the access token from the
redirect_uri property is, in fact, the bearer token that we need here. Include it in the Authorization header to the
verify_login call, with the correct OTP code in the body, and if all goes well - you will end up with another response:
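The exchange above can be sketched in a couple of helpers. The endpoint path, the `code` body field, and the shape of the responses are assumptions read off the network inspector, not a documented API - adjust them to whatever your browser actually sends (including whether the interim token arrives in the query string or the URL fragment):

```python
# Sketch of the 2FA token exchange. Field names and response shapes are
# assumptions from the network inspector, not a documented API.
import json
import urllib.parse
import urllib.request


def extract_pre_auth_token(redirect_uri: str) -> str:
    """Pull the interim access token out of the redirect URL's query string."""
    query = urllib.parse.urlparse(redirect_uri).query
    return urllib.parse.parse_qs(query)["access_token"][0]


def verify_login(verify_url: str, pre_auth_token: str, otp_code: str) -> str:
    """POST the OTP code with the interim bearer token; return the final token."""
    req = urllib.request.Request(
        verify_url,
        data=json.dumps({"code": otp_code}).encode(),
        headers={
            "Authorization": f"Bearer {pre_auth_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]
```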
That’s it - exchange complete! The access token you got here is the bearer token you can use for requests against the Analytics API. By wrapping the logic above in a Python script, I am now able to store the data in a local SQLite file and slice-and-dice it in a Jupyter notebook, but that’s a topic for a future blog post.