June 17, 2019

Reflections on three years of spaced repetition with Anki

I was looking at my Anki deck stats the other day and realized that I have been using it for just over three years now. During that time I have added 20k cards and reviewed 140k. On average I spent 17 minutes each day reviewing 130 cards. Since this amounts to over 300 hours of my life at this point, I figured it would be worth reflecting on this habit and deciding whether it is a worthwhile investment of time going forward. Read more

May 13, 2019

Embed markdown documentation directly into your Airflow DAGs

Why you should do it

I recently discovered that Apache Airflow allows you to embed markdown documentation directly into the Web UI. This is a very neat feature, because it enables you to locate your documentation as close as possible to the thing itself, rather than hiding it away in some Google Doc or Confluence wiki. This, in turn, increases the chance that it is actually read, rather than being promptly forgotten and never discovered by new team members. Read more

April 15, 2019

Save entire webpages for reference with SingleFile

I’ve been reading through a lot of Tiago Forte’s writing on his members-only publication Praxis. Since reading through his series on progressive summarization, I have become more conscientious about saving the “work-in-progress” artifacts of my thinking process to Evernote. Often this involves a link to a piece of content, a couple of highlights, and a bullet point or two about key takeaways.

The problem

It’s pretty easy to surface relevant notes using the Search function if I’ve added enough contextual info to the note, but less so if it’s just a link. Read more

March 11, 2019

Calculating the bearing (angle) between coordinates in Redshift

I fielded an interesting request recently from our PR team, who wanted to generate a creative representation of our data based on the direction and distance of trips booked on our platform. Distance is a key attribute of interest for a travel business, so it is naturally easy to retrieve this data. However, the direction of a trip is something that had not been previously analyzed, and so it was not available off-the-shelf in our data warehouse. Read more
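For a rough idea of what a bearing calculation involves, here is a minimal Python sketch of the standard initial great-circle bearing formula on a sphere. It is not necessarily the approach taken in the full post, which works in Redshift SQL; the function name `initial_bearing` and its parameter names are made up for illustration.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Return the initial bearing in degrees (0 = north, 90 = east)
    from point 1 to point 2, assuming a spherical Earth.

    All inputs are latitudes/longitudes in decimal degrees.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dlon = math.radians(lon2 - lon1)

    # East-west and north-south components of the direction of travel.
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)

    # atan2 returns an angle in (-180, 180]; shift it into [0, 360).
    return (math.degrees(math.atan2(x, y)) + 360) % 360
```

The same expression can in principle be written in SQL with the equivalent trigonometric functions, assuming the warehouse stores origin and destination coordinates in decimal degrees.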

February 11, 2019

The best way to manage dependencies between DAGs in Airflow

Airflow provides a few different sensors and operators which enable you to coordinate scheduling between different DAGs, including:

ExternalTaskSensor
TriggerDagRunOperator
SubDagOperator

Which one is the best to use? I have previously written about how to use ExternalTaskSensor in Airflow but have since realized that this is not always the best tool for the job. Depending on your specific decision criteria, one of the other approaches may be better suited to your problem. Read more

© Geoff Ruddock 2019