Geoff Ruddock

Save entire webpages for reference with SingleFile

I’ve been reading through a lot of Tiago Forte’s writing on his members-only publication Praxis. Since reading his series on progressive summarization, I have become more conscientious about saving the “work-in-progress” artifacts of my thinking process to Evernote. Often this involves a link to a piece of content, a couple of highlights, and a bullet point or two about key takeaways.

The problem

It’s pretty easy to surface relevant notes using Evernote’s search if I’ve added enough contextual info to the note, but less so if it’s just a link. So I wanted to start saving the actual raw content of key articles, particularly if they come from a members-only publication to which I may not have permanent access and which I cannot surface through Google.

I initially tried using the Evernote web clipper to save entire articles, but quickly realized that this was cluttering up my Evernote search results. A few 10k-word articles add up quickly, and soon they dwarfed the amount of content in my otherwise relatively text-sparse notes. Searching for simple one- or two-word phrases related to everyday notes (e.g. shopping, home tech, etc.) would return a result set cluttered with barely relevant saved articles.

Criteria

A suitable solution should satisfy the following criteria, in decreasing order of importance:

Searchable – Can I perform a global search for text contained within pages without knowing which page to open?

Portable – Can I search and open the file on different devices using a static file, or must I launch a command-line tool on my laptop before opening some proprietary format or a web UI for a localhost database?

Readable – While true as-web formatting would be ideal, I would settle for being able to read the primary content (text) start-to-end. Lack of CSS and JavaScript is sometimes not just ugly; it can make the content unreadable.

Solution: SingleFile

I recently came across a neat Chrome extension called SingleFile, which saves a webpage as a single HTML file, but first waits for lazy-loaded JavaScript, images and CSS to render. It doesn’t work perfectly (it sometimes includes the blurry version of lazy-loaded photos unless you first scroll to the end of the page), but it works light-years better than anything else I’ve tried.

If you store your HTML files in a folder indexed by Alfred, you can instantly surface them using the “in” keyword, which searches inside file contents.
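On a machine without Alfred, a similar global search can be approximated with a few lines of Python. This is only a rough sketch, not anything SingleFile or Alfred provides: the helper names (page_text, search_saved_pages) and the folder layout are my own assumptions. It strips markup, ignores script and style contents, and does a case-insensitive substring match over every .html file in a folder.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def page_text(html):
    """Return the page's text content with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def search_saved_pages(folder, phrase):
    """Return paths of .html files under `folder` whose text contains `phrase`.

    `folder` is a hypothetical directory of saved pages; matching is
    case-insensitive and purely substring-based (no ranking).
    """
    needle = phrase.casefold()
    hits = []
    for path in sorted(Path(folder).rglob("*.html")):
        text = page_text(path.read_text(encoding="utf-8", errors="replace"))
        if needle in text.casefold():
            hits.append(path)
    return hits
```

Calling search_saved_pages("saved-pages", "progressive summarization") would list the matching files in a folder named saved-pages (a placeholder name). Because only text nodes are searched, long attribute values such as embedded image data do not produce false matches. One known limitation: text fragments are joined with spaces, so a word split across inline tags will not match.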

Other things I’ve tried (or considered)

Saving HTML locally via your browser

In Chrome you can achieve this with Right-click → Save as → “Webpage, Complete”, which writes an .html file plus a folder of assets. The main problem is that it doesn’t include CSS and JavaScript that were not present at the initial page load. Without this CSS, a lot of pages are impossible to decipher.
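To get a rough sense of how much a saved page still depends on files outside the .html itself, you can list the stylesheets, scripts and images it references. The sketch below is my own illustration (external_resources is a hypothetical helper, not part of Chrome or any extension), and it assumes that a self-contained page carries these resources inline or as data: URIs.

```python
from html.parser import HTMLParser


class _ResourceFinder(HTMLParser):
    """Records stylesheet, script and image URLs that are not embedded inline."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            url = attrs.get("href")
        elif tag in ("script", "img"):
            url = attrs.get("src")
        else:
            return
        # Assumption: data: URIs are embedded in the file, so they do not count.
        if url and not url.startswith("data:"):
            self.external.append((tag, url))


def external_resources(html):
    """Return (tag, url) pairs for resources the page loads from outside the file."""
    finder = _ResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.external
```

An empty result suggests the file can be opened on its own. Any entries point at a sidecar folder or a remote server that must remain available. This says nothing about styles that were never captured in the first place, which is the larger problem described above.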

This is the best Chrome extension I’ve found up until now. It only outputs PDFs, but it lets you interactively remove superfluous components (e.g. advertisements, banner images) before saving. This extension would work well for someone who either wants a print-ready format or likes PDFs (e.g. for highlighting in the Preview app on macOS). [chrome web store]

Web recorder

A tool called webrecorder.io came up in a few Hacker News threads. It seems to be a comprehensive roll-your-own alternative to something like Archive.org. It’s somewhat overkill for my purposes though, which largely amount to archiving articles for personal consumption.

