👋 Hi, I’m Geoff.
I work at the intersection of data science and product. I help teams to build narratives around user behaviour at scale using quantitative data. These are some of my notes around work, personal projects, and general learning. I write primarily as a way of clarifying my own thinking, but I hope you’ll find some value in here as well!
After years of using SSDs, I had almost entirely forgotten how annoying the sound of actual, spinning hard disk platters is. That is, until I bought a NAS earlier this year to set up as a home media server. Lockdown projects, yay! Below are some reflections from my noise reduction journey.
My main Black Friday purchase this year was a Tado° system (thermostat + smart radiator valves), which I bought with the goal of regulating the temperature in my bedroom by turning the heat down early enough for the room to be consistently cold at night, and turning it back up in the morning, ideally 1h before I get up, to make it easier to wake up. The first part is easy, but the second can’t quite be achieved, at least not out of the box.
A use case for templating your SQL queries
Suppose you have a table raw_events which contains events related to an email marketing campaign. You’d like to see the total number of each event type per day. This is a classic use case for a pivot table, but let’s suppose you are using an SQL engine such as Redshift / Postgres which does not have a built-in pivot function.
The quick-and-dirty solution here is to manually build the pivot table yourself, using a series of CASE WHEN expressions.
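As a rough illustration of the manual approach, here is a minimal sketch of a CASE WHEN pivot. It runs against an in-memory SQLite database purely so that it is self-contained; the column names (event_date, event_type), the event types, and the sample rows are assumptions made up for this example, not the real schema of raw_events.

```python
import sqlite3

# Hypothetical sample data: the columns (event_date, event_type) and the
# event types below are assumptions for illustration only.
rows = [
    ("2020-01-01", "sent"),
    ("2020-01-01", "sent"),
    ("2020-01-01", "opened"),
    ("2020-01-02", "sent"),
    ("2020-01-02", "clicked"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (event_date TEXT, event_type TEXT)")
conn.executemany("INSERT INTO raw_events VALUES (?, ?)", rows)

# One CASE WHEN expression per event type: each one counts the rows of that
# type, and GROUP BY collapses the result to one row per day.
PIVOT_SQL = """
SELECT
    event_date,
    SUM(CASE WHEN event_type = 'sent' THEN 1 ELSE 0 END) AS sent,
    SUM(CASE WHEN event_type = 'opened' THEN 1 ELSE 0 END) AS opened,
    SUM(CASE WHEN event_type = 'clicked' THEN 1 ELSE 0 END) AS clicked
FROM raw_events
GROUP BY event_date
ORDER BY event_date
"""

pivot = conn.execute(PIVOT_SQL).fetchall()
for row in pivot:
    print(row)
```

Every new event type needs another hand-written CASE WHEN line, which is the repetition that makes templating the query attractive.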
One feature of Lightroom I have not made much use of is the Maps view. While all my smartphone photos are automatically geotagged, I have historically neglected adding geotag info to the 80% of my photos shot on my dedicated camera, which does not have GPS. As a COVID lockdown project, I decided to try to use the location tracking data from Google Timeline to geo-tag my photos “automatically”.
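One way to think about the matching step is as a nearest-timestamp lookup: for each photo, find the location sample recorded closest to the capture time. The sketch below is only an illustration of that idea, not the exact method used in the post. The function name nearest_location, the (timestamp, lat, lon) tuple layout, and the 10-minute max_gap cutoff are all assumptions.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_location(timeline, taken_at, max_gap=timedelta(minutes=10)):
    """Return (lat, lon) of the timeline sample closest in time to taken_at.

    timeline is a list of (datetime, lat, lon) tuples sorted by time
    (a hypothetical layout). Returns None when the closest sample is
    further away than max_gap, so stale locations are not applied.
    """
    if not timeline:
        return None
    times = [t for t, _, _ in timeline]
    i = bisect_left(times, taken_at)
    # Only the samples just before and just after taken_at can be nearest.
    candidates = timeline[max(i - 1, 0): i + 1]
    best = min(candidates, key=lambda p: abs(p[0] - taken_at))
    if abs(best[0] - taken_at) > max_gap:
        return None
    return best[1], best[2]


# Made-up timeline samples for illustration.
timeline = [
    (datetime(2020, 5, 1, 12, 0), 51.50, -0.12),
    (datetime(2020, 5, 1, 12, 10), 51.51, -0.10),
    (datetime(2020, 5, 1, 12, 20), 51.52, -0.08),
]

print(nearest_location(timeline, datetime(2020, 5, 1, 12, 4)))
print(nearest_location(timeline, datetime(2020, 5, 1, 15, 0)))
```

The cutoff is a design choice: leaving a photo untagged is usually better than tagging it with a location recorded hours earlier.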
As a subscriber to the Farnam Street newsletter, I enjoy reading Shane’s articles about using various mental models from other disciplines to improve our decision-making. Reading about these mental models is fun, but I am cognizant of the fact that reading about them does not equate to learning them. I have been using Anki flashcards in language-learning and technical contexts for a few years now. So the question is: what is the best way to use Anki to facilitate learning mental models?
Despite being a relatively modern phone, my OnePlus 6T records video using the H.264 codec rather than the newer H.265 (HEVC) codec. A minute of 1080p video takes up ~150MB of storage, and double that for 60fps mode or 4K. Even though the phone has a decent amount of storage (64GB), it quickly fills up if you record a lot of video. The storage savings from HEVC are pretty astounding: HEVC typically requires 50% less bitrate (and hence storage space) to achieve the same level of quality as H.264.
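To put those numbers in perspective, here is a back-of-the-envelope calculation using only the figures quoted above. It assumes, unrealistically, that all 64GB is free for video, and it uses decimal units (1GB = 1000MB); the function name recording_minutes is made up for this sketch.

```python
MB_PER_GB = 1000  # decimal units, an assumption for this estimate


def recording_minutes(storage_gb, mb_per_minute):
    """Minutes of video that fit in storage_gb at a given data rate."""
    return storage_gb * MB_PER_GB / mb_per_minute


# ~150MB per minute for 1080p H.264, as quoted above.
h264_minutes = recording_minutes(64, 150)
# HEVC at roughly 50% of the H.264 bitrate for the same quality.
hevc_minutes = recording_minutes(64, 150 * 0.5)

print(f"H.264: {h264_minutes:.0f} min, HEVC: {hevc_minutes:.0f} min")
```

Halving the bitrate doubles the recording time, from roughly 7 hours to roughly 14 hours of 1080p footage under these assumptions.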
A few weeks ago while learning about Naive Bayes, I wrote a post about implementing Naive Bayes from scratch with Python. The exercise proved quite helpful for building intuition around the algorithm. So this is a post in the same spirit on the topic of AdaBoost.
While learning about Naive Bayes classifiers, I decided to implement the algorithm from scratch to help solidify my understanding of the math. So the goal of this notebook is to implement a simplified and easily interpretable version of the sklearn.naive_bayes.MultinomialNB estimator which produces identical results on a sample dataset.
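For a flavour of what such a from-scratch version might look like, here is a minimal multinomial Naive Bayes sketch in plain Python. It is not the notebook's code and makes no claim to match sklearn's output exactly; the function names fit and predict, the additive smoothing default alpha=1.0, and the toy data are assumptions for illustration.

```python
import math
from collections import defaultdict


def fit(X, y, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods per class.

    X is a list of feature-count vectors, y the matching list of labels.
    alpha is the additive (Laplace) smoothing constant.
    """
    n_features = len(X[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0.0] * n_features)
    for counts, label in zip(X, y):
        class_counts[label] += 1
        for j, c in enumerate(counts):
            feature_counts[label][j] += c

    model = {}
    for label, n_docs in class_counts.items():
        total = sum(feature_counts[label]) + alpha * n_features
        log_prior = math.log(n_docs / len(y))
        log_likelihood = [
            math.log((c + alpha) / total) for c in feature_counts[label]
        ]
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, counts):
    """Return the label with the highest posterior log score."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(c * ll for c, ll in zip(counts, log_likelihood))
    return max(model, key=score)


# Toy word-count vectors, made up for this example.
X = [[2, 1, 0], [3, 0, 0], [0, 1, 2], [0, 0, 3]]
y = ["spam", "spam", "ham", "ham"]
model = fit(X, y)

print(predict(model, [1, 0, 0]))
print(predict(model, [0, 0, 2]))
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on longer documents.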
This blog runs on Hugo, a publishing framework that processes Markdown text files into static web assets, which can be conveniently hosted on a server without a database. It is great for a number of reasons (speed, simplicity), but one area where I find it lacking is support for math typesetting.
I recently finished reading Scott Page’s wonderful book The Model Thinker. This book does a great job of spotlighting some more niche and technical models from the social sciences and explaining them in an ELI5 manner. He touches on 50 models in the book, but here is a quick summary of the big ideas that jumped out at me.