Posts in data

Unlocking My Spotify Podcast Data

May 23, 2021

If you are a podcast owner, one of the things that can be a bit annoying is the multitude of different data points that are available for the show. The problem is not that there is too much data, but rather that the data is scattered across different providers, with different systems and different ways to manage it.

Get GitHub User Contributions With GitHub Skyline API

February 17, 2021

2021 turns out to be a good year for folks like myself who love collecting their own personal metrics. Earlier, I chatted about collecting air quality data and Twitter data - and now, GitHub contribution data. In this post I will describe a simple approach to grabbing your own GitHub contribution statistics without having to jump through too many hoops.

Running Scheduled Data Collection with Synology and Docker

February 11, 2021

I've used my Synology NAS for some time now - about two years and counting - and it's been a great tool to back up information locally (e.g., from my phones or shared computers). Then I got to thinking - it's pretty much a mini-computer. It has a quad-core 1.4 GHz CPU, a whopping 2 GB of RAM, and _plenty_ of storage. I can do more with it than just use it for occasional data dumps. That is - I could use it for frequent data dumps.

Building Your Own Twitter Analytics

April 13, 2020

TL;DR: Check out the project’s source code on GitHub. It’s a demonstration of how you can use simple components to build awesome tools. That’s right, you don’t need Kubernetes for this! I’m one of those people who needs data around the things that I do - there is just something fun about being able to quantify and analyze things.

Instead Of MAX Use LAST_VALUE For Time-Based Data

February 15, 2020

I was recently working on some data analysis that involved data snapshots. These are effectively point-in-time representations of specific data. For example, if you were tracking the number of items at a warehouse, you could imagine taking a snapshot every hour through the day of how many items you have available. That will give you a pretty good idea of the in-flow and out-flow of materials.
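To make the snapshot idea concrete, here is a minimal sketch in plain Python. It does not use the SQL `LAST_VALUE` window function from the title. The warehouse numbers and the `daily_max` and `daily_last` helpers are hypothetical, made up purely for illustration. It only shows that, for snapshot data, the largest value in a day and the most recent value in a day can be different numbers.

```python
from datetime import datetime

# Hypothetical snapshots: (timestamp, items on hand at that moment).
snapshots = [
    (datetime(2020, 2, 14, 9), 120),
    (datetime(2020, 2, 14, 13), 180),
    (datetime(2020, 2, 14, 17), 95),
    (datetime(2020, 2, 15, 9), 140),
    (datetime(2020, 2, 15, 17), 60),
]


def daily_max(rows):
    """Largest snapshot value seen on each day."""
    result = {}
    for ts, count in rows:
        day = ts.date()
        result[day] = max(result.get(day, count), count)
    return result


def daily_last(rows):
    """Most recent snapshot value on each day (later rows overwrite earlier ones)."""
    result = {}
    for ts, count in sorted(rows, key=lambda row: row[0]):
        result[ts.date()] = count
    return result


if __name__ == "__main__":
    for day, peak in daily_max(snapshots).items():
        print(day, "max:", peak, "last:", daily_last(snapshots)[day])
```

With this sample data, the peak for February 14 is 180 while the closing value is 95, so which aggregate is right depends on whether the question is "what was the high point" or "what did we end the day with".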

Watch Out For Modulo And Hashes

January 28, 2020

A colleague and I were working on some data analysis tools today, and they encountered a puzzling error when processing a large chunk of data. Due to the volume of unique users that they were analyzing telemetry for, they decided to use sampling by hashing the user IDs and then taking a slice of the group they wanted to investigate. Because they were using Kusto, they could rely on hash() - a function that returns a hash based on the input value (it uses the xxhash algorithm behind the scenes).
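The excerpt stops before naming the error, so the sketch below does not try to reproduce it. It only illustrates hash-based sampling in general, in Python rather than Kusto, and flags one well-known pitfall: when a hash is a signed integer, a remainder operator that keeps the sign of the dividend (as in C-style languages) can return negative buckets. The function names are hypothetical, and SHA-256 stands in for xxhash because it ships with the standard library.

```python
import hashlib


def signed_hash64(user_id: str) -> int:
    """Stand-in for a signed 64-bit hash (SHA-256 based, not xxhash)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def truncated_mod(value: int, modulus: int) -> int:
    """Remainder that keeps the sign of the dividend, as in C-style languages."""
    remainder = abs(value) % modulus
    return -remainder if value < 0 else remainder


def bucket(user_id: str, buckets: int = 100) -> int:
    """Python's % with a positive modulus always lands in [0, buckets)."""
    return signed_hash64(user_id) % buckets


def in_sample(user_id: str, percent: int = 10) -> bool:
    """Deterministically keep roughly `percent` percent of users."""
    return bucket(user_id) < percent


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    sampled = [u for u in users if in_sample(u)]
    print(len(sampled), "of", len(users), "users sampled")
    # The same negative input gives different remainders under the two conventions.
    print(truncated_mod(-7, 5), (-7) % 5)
```

Because the bucket is derived from the user ID alone, the same user always falls into the same slice, which is what makes this kind of sampling repeatable across queries.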

Product Managers And Data: Cohort Analysis

August 17, 2019

I am all about numbers when it comes to driving decisions. That’s the most accurate and tested way of ensuring that you are pursuing the right thing. Not to say that we should not focus on things like customer development, but numbers certainly can shed a lot of light on whether the product is on the right track. There are certain approaches to analyzing product data that universally apply to all projects, and can yield some interesting insights provided that you put some time into it.

You Too Should Be Data-proficient

April 24, 2019

As a PM, one of the most important skills that I learned in the past couple of years is the ability to collect, query and analyze data. No, really - data is fascinating to me almost more than any other part of the product. Data can inform any future decisions and either validate or invalidate your hypotheses around the direction of your work. Before joining Microsoft, I always thought that working with data is something that only data scientists and analysts do - a PM sets out the path for the product, the data science team provides the numbers and insights, and then engineering drives the implementation.

Quantitative And Qualitative Metrics

February 24, 2019

This question has probably come up for anyone who has ever worked as a PM - when should I use quantitative metrics, and when should I rely on qualitative metrics? It’s an important aspect of ensuring that the right data is used for the right aspect of product planning. Let’s start by distilling the terms. Quantitative data is data that is heavily grounded in numbers - how many users, what percentage of them are happy, what is the ratio of converted vs.

Pulling Ubnt Stats Locally

December 28, 2017

I love looking at my Ubnt graphs - how much traffic goes where, to what clients, and many other interesting indicators. But I am also the kind of person that loves having raw access to the data, so I started digging - how can I pull those statistics locally? The setup I run is described here - I have a cloud key that manages the network, composed of a router acting as a switch, access point and the security gateway.

Get Ahead In A Wait List, Or How To Never Trust The Client

April 16, 2017

As a developer, it is always important to keep in mind one thing - never trust the client. Ever. The client is neither a completely secure entrypoint nor the source of truth moving upstream to the service. NOTE: This issue has already been addressed and the fix is live. Shout out to Kyle Rankin for being on top of things and responding to my email. So that brings us to January 8, 2017, when I discovered getfinal.

In Between Usage And Engagement

October 23, 2015

In a world where everyone is trying to collect and analyze data about their product, it is easy to get excited about numbers. After all, getting hard numbers about what you shipped is, in a way, validating or invalidating the provided value.