Geoff Ruddock

👋   Hi, I’m Geoff.

I work at the intersection of data science and product. I help teams to build narratives around user behaviour at scale using quantitative data. These are some of my notes around work, personal projects, and general learning. I write primarily as a way of clarifying my own thinking, but I hope you’ll find some value in here as well!

A use case for templating your SQL queries
Suppose you have a table raw_events which contains events related to an email marketing campaign. You’d like to see the total number of each event type per day. This is a classic use case for a pivot table, but let’s suppose you are using an SQL engine such as Redshift or Postgres, which does not have a built-in pivot function. The quick-and-dirty solution here is to manually build the pivot table yourself, using a series of CASE WHEN expressions.
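To make the manual pivot concrete, here is a minimal sketch of generating those CASE WHEN expressions from a list of event types. It uses plain Python string formatting rather than any particular templating library, and the column names event_type and event_date, as well as the event types themselves, are assumptions for illustration; only the table name raw_events comes from the excerpt.

```python
def build_pivot_query(event_types, table="raw_events"):
    """Render a manual pivot: one SUM(CASE WHEN ...) column per event type.

    Assumes hypothetical columns `event_type` and `event_date`.
    Values are interpolated directly into the SQL text, so only pass
    trusted, identifier-safe event type names.
    """
    columns = ",\n".join(
        f"    SUM(CASE WHEN event_type = '{e}' THEN 1 ELSE 0 END) AS {e}_count"
        for e in event_types
    )
    return (
        "SELECT\n"
        "    event_date,\n"
        f"{columns}\n"
        f"FROM {table}\n"
        "GROUP BY event_date\n"
        "ORDER BY event_date;"
    )


# Hypothetical event types for an email campaign.
query = build_pivot_query(["sent", "opened", "clicked"])
print(query)
```

Adding a new event type then only requires extending the list, rather than copying and editing another CASE WHEN line by hand.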
One feature of Lightroom I have not made much use of is the Maps view. While all my smartphone photos are automatically geotagged, I have historically neglected adding geotag info to the 80% of my photos shot on my dedicated camera, which does not have GPS. As a COVID lockdown project, I decided to try to use the location tracking data from Google Timeline to geo-tag my photos “automatically”.
As a subscriber to the Farnam Street newsletter, I enjoy reading Shane’s articles about using various mental models from other disciplines to improve our decision-making. Reading about these mental models is fun, but I am cognizant of the fact that reading about them does not equate to learning them. I have been using Anki flashcards in language-learning and technical contexts for a few years now. So the question is: what is the best way to use Anki to facilitate learning mental models?
Despite being a relatively modern phone, my OnePlus 6T records video using the H.264 codec rather than the newer H.265 HEVC codec. A minute of 1080p video takes up ~150MB of storage, and double that for 60fps mode or 4K. Even though the phone has a decent amount of storage (64GB) it quickly fills up if you record a lot of video. The storage savings from HEVC are pretty astounding. It typically requires 50% less bitrate (and hence storage space) to achieve the same level of quality as H.264.
A few weeks ago while learning about Naive Bayes, I wrote a post about implementing Naive Bayes from scratch with Python. The exercise proved quite helpful for building intuition around the algorithm. So this is a post in the same spirit on the topic of AdaBoost.
While learning about Naive Bayes classifiers, I decided to implement the algorithm from scratch to help solidify my understanding of the math. So the goal of this notebook is to implement a simplified and easily interpretable version of the sklearn.naive_bayes.MultinomialNB estimator which produces identical results on a sample dataset.
This blog runs on Hugo, a publishing framework which processes markdown text files into static web assets which can be conveniently hosted on a server without a database. It is great for a number of reasons (speed, simplicity) but one area where I find it lacking is in support for math typesetting.
I recently finished reading Scott Page’s wonderful book The Model Thinker. This book does a great job of spotlighting some more niche and technical models from the social sciences and explaining them in an ELI5 manner. He touches on 50 models in the book, but here is a quick summary of the big ideas that jumped out to me.
There are a number of good free sources for market data such as Yahoo Finance or Google Finance. It is easy to pull this data into Python using something like the yfinance package. But these sources generally only contain data for currently listed stocks.
I love Jupyter notebooks. As a data scientist, notebooks are probably the fundamental tool in my daily workflow. They fulfill multiple roles: documenting what I have tried in a lab notebook for the benefit of my future self, and also serving as a self-contained format for the final version of an analysis, which can be committed to our team git repo and then discovered or reproduced later by other members of the team.