Tobias Alexander Franke

A read-it-later service for RSS purists

Leave no stylesheet alive

I read a lot. I read a lot of papers for instance. They are easy to read, because they have a common format, by which I mean their appearance. This is achieved in the academic community largely by separating markup, the language that describes what - in an article - constitutes a headline, a paragraph and so on, from the formatting of those marked-up properties, through a common LaTeX template such as the SIGGRAPH acmtog or using common formatters for simpler markup languages such as Markdown. This has one tremendous benefit: Reading a lot of articles is easy, because they all look the same.

I have rambled about the terrible state of the web before, twice actually. Most webpages have become unreadable, which describes a state where it is hard or impossible to find the actual content of a webpage amongst ads, banners, popups or other annoyances. This is especially disappointing if one opened said webpage with the expectation to just read some piece of text matching a headline.

It is ironic that HTML - being the markup language to differentiate content from style - has been abused by designers so much that a bunch of tools emerged to extract the content and throw away the style. Over a decade ago, when this problem became apparent, it was addressed with a bookmarklet called Readability, which reformatted webpages to reduce clutter, followed by browser implementations which added so-called reader views, for instance in Firefox and Brave.

Now Read-it-later services solve two problems at once and make articles offline-readable: Instapaper, Pocket and the FOSS self-hostable Wallabag download and store bookmarked articles after running them through a readability tool, so that eventually all articles have the same layout and same formatting.

Motivation

As someone who reads a lot for research (papers, blog posts, news articles etc.) I wanted to have the simplest possible version of such a read-it-later service: All my bookmarked articles look the same and are readable offline.

Failure 1

My first experiment was to manually store PDFs of the reader view in Firefox. That’s a lot of manual labor though, and syncing across devices requires storing all PDFs on some cloud service. It’s also not easy to use on the go.

Failure 2

My second experiment was trying out Archivebox, a neat FOSS project that streamlines archiving. Archivebox will not just download a copy of a webpage, but run it through several polishing tools. One of them is a readability library, which stores a PDF and HTML version alongside the original article. Archivebox ticks almost all the boxes, but I found it hard to deploy on a webhost with just PHP on it (basically the things you get for free), and it produces quite a bit of extra data I don’t need.

Failure 3

My third experiment ended in a personal frustration of mine: My workflow for reading anything I am interested in is to star an article in my RSS reader, which necessitates that anything I want to read is somehow a subscription or feed I can add to my RSS reader, and that isn’t true for individual articles I stumble across. Sometimes this isn’t even true for articles on blogs, where the blog is either too huge to subscribe to for just this one article, or (a sad occurrence these days) it doesn’t have RSS at all, even though that is trivial to add.

Before switching to the most excellent NetNewsWire reader, I was fond of Reeder, which includes the functionality to add read-it-later services like Instapaper as accounts, showing the same interface for both RSS and read-it-later subscriptions. This, essentially, made Reeder my one-stop reading application, indifferent to whether the article came from a single URL I wanted to read or from any of my RSS feeds. Whenever there was an article I wanted to read, I’d simply push it to an Instapaper or Pocket account and then fetch it via Reeder. However, this had one major downside: I’d use a centralized service that gradually became a huge data dump, as more and more articles ended up in those accounts.

Failure 4

My fourth experiment involved RSS itself. NetNewsWire, like most RSS readers, is a minimalist, pure RSS reader, which means anything you want to star inside the application has to come from an RSS feed you can subscribe to.

Both Pocket and Wallabag allow you to subscribe to your personal collection of articles via RSS, essentially treating your article collection as some kind of blog with new posts appearing whenever you add an article. However, this approach had the same downside as when I was using Reeder: The read-it-later accounts become a huge dump after a while, because I was not really interested in either Pocket or Wallabag, only in redirecting saved articles to my RSS reader. This solution however was extremely close to optimal, with just the exception of me logging into my accounts at regular intervals to delete everything.

Requirements

From the four failures I learned that what I wanted was actually this:

Send a URL to a script, which runs it through readability and attaches the resulting content directly to an RSS feed. Ideally, I can create different feeds for different collections, for example one for work and one for personal interests.

Specifically, I want to:

  • Store single articles in an RSS reader application
  • Avoid third-party read-it-later services such as Pocket, Instapaper or Wallabag
  • Minimize the amount of necessary apps for reading articles
  • Get rid of accounts and not sign up to anything
  • Read articles (offline) in a readable format, but not categorize or store them indefinitely
  • Synchronize stored articles to multiple devices
  • Optionally be able to self-host the whole architecture

The outcome of this is a project I call RSS-Librarian, a read-it-later service for RSS purists. RSS-Librarian solves all of these bullet points with a single, self-hostable PHP file that extracts content from URLs using a readability service and adds what remains as new entries into a personal RSS feed, without requiring special libraries, a database or user accounts.

How it works

You can drop librarian.php onto any host that supports PHP with no other requirements. RSS-Librarian has two parameters: librarian.php?id=HASH&url=SOMEPAGE.

  • id is a random ID for a personal feed. If this parameter is not supplied, RSS-Librarian will generate a new one and add a feed file corresponding to id into the subfolder feeds/.
  • url is a URL you submit to RSS-Librarian, whose content will be extracted and added to the feed corresponding to id.
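
To illustrate the two parameters, here is a minimal sketch of how a client could build such a request. It is written in Python rather than PHP and is not part of librarian.php; the function name librarian_request_url and the example.org host are made up for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def librarian_request_url(base: str, article_url: str, feed_id: Optional[str] = None) -> str:
    """Build a librarian.php request URL (hypothetical helper).

    Leaving feed_id out mirrors the first visit, where the instance
    generates a new id on its own.
    """
    params = {}
    if feed_id is not None:
        params["id"] = feed_id
    # urlencode percent-escapes the article URL so it survives as a single parameter
    params["url"] = article_url
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(librarian_request_url("http://example.org/librarian.php", "https://ohshitgit.com/"))
```

The article URL has to be escaped, otherwise any query string it carries would be mixed up with the id and url parameters.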

For each url posted to RSS-Librarian, the extracted content will be added to an RSS file derived from id - if it exists in the feeds/ folder. If it does not exist, it will be generated and written. The RSS file will store a maximum of 100 entries; once that limit is reached, the oldest entry is removed to make room for the new one.
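
The capped feed can be sketched roughly like this. This is an illustration in Python, not the code in librarian.php: the function name add_item is made up, and I assume here that new items are appended at the end of the channel, so the first items are the oldest ones.

```python
import xml.etree.ElementTree as ET

MAX_ITEMS = 100  # mirrors the 100-entry limit described above


def add_item(rss_xml: str, title: str, link: str, content: str,
             max_items: int = MAX_ITEMS) -> str:
    """Append one article to an RSS 2.0 document and drop the oldest overflow."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")

    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = content

    # Assumption: items are stored in order of arrival, oldest first
    items = channel.findall("item")
    overflow = max(0, len(items) - max_items)
    for old in items[:overflow]:
        channel.remove(old)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    feed = '<rss version="2.0"><channel><title>demo</title></channel></rss>'
    feed = add_item(feed, "Oh Shit, Git!?!", "https://ohshitgit.com/", "extracted text")
    print(feed)
```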

An example use-case

I will be using this demo instance of RSS-Librarian in combination with NetNewsWire on MacOS (MacOS App).

I can recommend NetNewsWire on iOS (App Store Link), FeedMe on Android (Play Store Link, APK) and FeedBro for various browsers (Firefox Addon, Brave/Chromium Addon). They all download offline copies of RSS feeds to your device, ready for access even when completely disconnected.

Step 1: Create your personal feed

Assume we have the following article we want to store for later reading. First go to the RSS-Librarian instance. You’ll be greeted by the following interface.

The view of RSS-Librarian on first visit

This instance is currently hosting 8 other feeds. Paste the article URL https://ohshitgit.com/ into the empty field and press Add to feed. Because you are a new user, RSS-Librarian will now generate a random, unique user ID for you.

RSS-Librarian after adding the first link

On the next page you will see two important things:

  1. Your personal URL: Store this URL in your bookmarks! This allows you to add more articles to your personal RSS feed.
  2. Your personal RSS feed: This is your feed that you can subscribe to with your RSS reader application. It is unique and can only be managed with the personal URL above.

It is important - after adding your first link - to bookmark your personal URL somewhere so you can keep adding links to your feed (instead of creating a new one accidentally)!

Step 2: Open your personal URL

For this demo, the personal URL now links to http://alternator.hstn.me/librarian.php?id=ff4d8b605c2cffacd19639af2a1d3ff8712021d66431de86f425e0279a1e768a. RSS-Librarian, on first use, will create a random hash (in our case ff4d...768a). This hash will be used to store new URLs to your personal RSS feed, which is derived from that hash. You can think of this as your user ID.
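
The id in the example is 64 hexadecimal characters long. The post does not spell out how librarian.php produces it, so the following Python sketch is only one plausible way to generate and check such an id; new_feed_id and is_valid_feed_id are hypothetical names.

```python
import re
import secrets

# 64 lowercase hex characters, the same shape as the id in the example above
ID_PATTERN = re.compile(r"[0-9a-f]{64}")


def new_feed_id() -> str:
    """Return 32 random bytes from the OS random source as a hex string."""
    return secrets.token_hex(32)


def is_valid_feed_id(candidate: str) -> bool:
    """Accept only well-formed ids, since the id ends up in a file name under feeds/."""
    return ID_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    feed_id = new_feed_id()
    print(feed_id, is_valid_feed_id(feed_id))
```

Because the id doubles as the only credential for a feed, it should be long and unpredictable, and checking its shape before touching the file system keeps things like ../ out of the feed path.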

Open the personal URL with a browser again.

RSS-Librarian personal URL

With this page you can add more articles to your personal RSS feed. Let’s add another article to the feed we just created.

Adding another article to the personal RSS feed

You can now point your RSS reader to your personal RSS feed and check out the result. At the bottom of the page you can find additional tools:

  • Feed bookmarklet: A bookmarklet you can use instead of the personal URL. This will add the currently open page to your personal RSS feed.
  • Feed preview: If you have no RSS viewer at hand you can preview your personal RSS feed with this page.

Step 3: Add it to an RSS reader of your choice

Let’s try this out with NetNewsWire on MacOS. Copy the personal RSS feed link, press the + symbol in NetNewsWire and add the URL.

Adding the personal RSS feed to NetNewsWire

After adding the feed, your stored articles appear in your RSS reader. Most RSS readers will automatically download all articles in a feed, meaning you also have a stored copy offline on the go, for instance when reading on a plane without WiFi.

A view of the stored articles

If you click the title of the article, you will end up on the original URL (in the above sample our first article). If you open the feed itself - RSS-Librarian (ff4d) - you will jump back to your personal URL where you can add more articles to the feed.

If you do not have an RSS reader around, you can also click on Feed preview in the tools section of your personal URL to get a quick web-based view of the feed.

Viewing a feed preview using feedreader.xyz

And that is pretty much it. By visiting your personal URL on an RSS-Librarian instance, you can add more articles to your personal RSS feed and they will become easy to read and storable offline by your reader software. You can subscribe to your personal RSS feed with as many readers as you want and even share that feed with others (which however will let them add articles to it too).

Deploying your own instance

Simply get a copy of librarian.php from the GitHub repository and put it on any VPS, Raspberry Pi or other webserver that has PHP. There are no other requirements. Create a directory called feeds right next to the file and give the webserver write access to that folder.

In librarian.php you can configure some globals:

  • $g_max_items is the maximum number of articles stored per feed. Adding a new article beyond that number will remove the oldest one in the feed. Reduce this number if you’re tight on space.
  • $g_dir_feeds is the directory where user feeds are stored. Rename this if you want to use a different directory on your instance.

Future work

There are a bunch of things I want to implement.

Protected shareable feeds

Currently the id given to RSS-Librarian is used to create a feed in the feeds/ directory that has the same name. This means that sharing this feed with others will allow them to access the personal URL as well and add articles. I’d like to change this behavior so that the feed’s file name is generated by hashing the id again. This way, the user id and the feed id are separated but correlated in one direction: The id can be used to generate the corresponding feed file name, but not vice versa. This comes at the cost of usability: The feed currently links back to its personal URL, which means you can quickly jump to the place where you can add more articles. This link would need to be removed to not give away that page.
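
A rough Python sketch of that one-way relation follows. This is a planned feature, so nothing here exists in librarian.php; the choice of SHA-256 and the name feed_file_name are my assumptions for the example.

```python
import hashlib


def feed_file_name(user_id: str) -> str:
    """Derive the public feed name from the private user id (hypothetical helper).

    SHA-256 is an assumption: any cryptographic hash gives the same
    one-directional property.
    """
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    private_id = "ff4d8b605c2cffacd19639af2a1d3ff8712021d66431de86f425e0279a1e768a"
    # The feed could be shared under this name without revealing private_id
    print(feed_file_name(private_id))
```

Anyone holding the id can recompute the feed name, but someone who only knows the feed name cannot feasibly work back to the id, provided the id is long and random.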

Remove FiveFilters dependency

RSS-Librarian sends articles to FiveFilters, which extracts each of them into a single RSS entry, and then appends that entry to the locally hosted feed. This is not optimal, because the feed generation is essentially centralized and not independent. Instead, RSS-Librarian needs to do the extraction locally via the FiveFilters Readability library for PHP.

Maintenance

Every user can add a new feed at any time and as many as they want. An instance can quickly create a huge dump inside the feeds folder full of garbage feeds that are abandoned because users forgot to bookmark their personal URL, or because they lost the link somehow later.

On every new article added, RSS-Librarian could run through the feeds/ folder and simply delete feeds that are too old, for instance those that had no new items added to them for more than 6 months. Deleting a feed is not an issue if its id gets used again later: RSS-Librarian recreates the file for an id when another article URL is added.
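
Such a cleanup pass could look roughly like the Python sketch below. It is not implemented in librarian.php; the name prune_stale_feeds is made up, and I assume that a feed file's modification time reflects the last time an article was added to it.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 183 * 24 * 60 * 60  # roughly six months


def prune_stale_feeds(feeds_dir, max_age_seconds=MAX_AGE_SECONDS, now=None):
    """Delete feed files that were not modified within max_age_seconds.

    Returns the sorted names of the removed files.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(feeds_dir).iterdir():
        # Assumption: the file is rewritten on every added article,
        # so its modification time marks the last activity in the feed
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```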

Share-feature for mobile

App based read-it-later services tie into the OS and let you share an article with the app, which is the default way to add an article quickly to the read-it-later list. RSS-Librarian however has no app and uses a webpage instead, which makes quickly adding an article to a personal RSS feed tedious.

Currently, one can get to the personal URL by simply opening the personal RSS feed URL. Ideally however I’d like a simple share button that adds some URL being shared from any app.

Main instance

An RSS-Librarian demo instance for testing is currently hosted here. The instance has a self-signed certificate and therefore many RSS readers will run into issues. Using HTTP instead is a no-go though. I want to eventually host a reliable main instance.

The Great Crusade against Web-Enshitification

I’ve been using RSS-Librarian just for myself for about half a year now and it’s been a delight so far, given how compact the code is. I created two feeds for myself: One for articles I’m interested in personally, and one for stuff I need for work. I subscribe to my work feed using my personal devices and with FeedBro on my work machine, which means I can easily add and read articles I find for work everywhere, whereas the feed containing articles I’m just interested in personally is only in my feed readers on my own devices. It’s all neatly separated and easy to manage.

Of course I wrote this for myself initially, so if you find it weird, confusing, chaotic, or you’d like to see a feature, please open an issue or let me know otherwise.

2024-04-28
RSS Notes
