I was recently working on some data analysis that involved data snapshots. These are effectively point-in-time representations of specific data. For example, if you were tracking the number of items at a warehouse, you could take a snapshot every hour of how many items you have available. Comparing consecutive snapshots will give you a pretty good idea of the in-flow and out-flow of materials.
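To make the warehouse example concrete, here is a minimal sketch of how hourly snapshots could be turned into hour-over-hour changes. The timestamps, counts, and the `net_flow` helper are all made up for illustration; they are not from any real dataset or library.

```python
from datetime import datetime

# Hypothetical hourly snapshots: (time of snapshot, items on hand).
snapshots = [
    (datetime(2024, 1, 1, 9), 120),
    (datetime(2024, 1, 1, 10), 135),
    (datetime(2024, 1, 1, 11), 110),
    (datetime(2024, 1, 1, 12), 110),
]


def net_flow(snapshots):
    """Return (timestamp, change) pairs between consecutive snapshots.

    A positive change means more items arrived than left during that
    interval; a negative change means the opposite.
    """
    ordered = sorted(snapshots)  # sort by timestamp
    return [
        (later_ts, later_count - earlier_count)
        for (_, earlier_count), (later_ts, later_count) in zip(ordered, ordered[1:])
    ]


for ts, change in net_flow(snapshots):
    print(ts.isoformat(), change)
```

One caveat: snapshots only capture the net change. If 20 items arrive and 20 leave within the same hour, the two snapshots look identical, so separating in-flow from out-flow would need event-level data.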
A colleague and I were working on some data analysis tools today, and they encountered a puzzling error when processing a large chunk of data. Because of the number of unique users they were analyzing telemetry for, they decided to sample by hashing the user IDs and then taking a slice of the group they wanted to investigate. Because they were using Kusto, they could rely on hash(), a function that returns a hash based on the input value (it uses the xxhash algorithm behind the scenes).
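The sampling idea itself is easy to sketch outside of Kusto. The snippet below is an illustration in Python, not the query my colleague ran: it uses SHA-256 from the standard library as a stand-in for Kusto's hash(), so the bucket values will not match what Kusto produces. The `bucket` and `in_sample` helpers and the user IDs are hypothetical.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in the range [0, buckets).

    SHA-256 is used here only as a stand-in for Kusto's hash(); Python's
    built-in hash() is avoided because it is randomized per process for
    strings, which would make the sample unstable between runs.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def in_sample(user_id: str, percent: int = 10) -> bool:
    """Return True if the user falls into the first `percent` of 100 buckets."""
    return bucket(user_id, 100) < percent


# Hypothetical user IDs, purely for illustration.
users = [f"user-{i}" for i in range(1000)]
sample = [u for u in users if in_sample(u, percent=10)]
print(f"{len(sample)} of {len(users)} users selected")
```

Because the bucket depends only on the user ID, the same users land in the sample on every run, and widening the slice from 10 to 20 percent keeps everyone who was already selected.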
One of the things that I am really curious about is the analysis of publicly available data. There is a lot of useful context in it that can shed light on important happenings and trends. I’ve started with one of the resources that has a lot of rich, user-created content: Reddit. I also wanted to focus on a local implementation that does not require me to sign up for a big data service, such as BigQuery.
I am all about numbers when it comes to driving decisions. That’s the most accurate and tested way of ensuring that you are pursuing the right thing. That’s not to say that we should not focus on things like customer development, but numbers certainly can shed a lot of light on whether the product is on the right track. There are certain approaches to analyzing product data that apply universally to all projects, and they can yield some interesting insights, provided that you put some time into them.
I recently had a chat with my team on metrics - one of the key topics a PM needs to be well-versed in. Metrics are at the core of helping you define what you’re truly after. Having a deep understanding of what you are measuring is the only way to make sure that, in the long run, you’re able to build a product that truly satisfies customer needs. Consider this post a basic intro to just that - getting a grasp on quantifiable measures of the performance of your product.
As a PM, one of the most important skills that I have learned in the past couple of years is the ability to collect, query, and analyze data. No, really - data fascinates me almost more than any other part of the product. Data can inform future decisions and either validate or invalidate your hypotheses about the direction of your work. Before joining Microsoft, I always thought that working with data was something that only data scientists and analysts did - a PM sets out the path for the product, the data science team provides the numbers and insights, and then engineering drives the implementation.