Resurfacing RSS

Is RSS/Atom even the right term to use anymore? Apparently these legacy XML artifacts are also known as ‘feed files’, though this somewhat effaces the common pattern of feeds being generated dynamically by a web application. I would have gone with the more general term ‘feed’ all along if it were possible. In the early days of blogging, these things often were referred to as ‘newsfeeds’. I still think that makes far more direct and intuitive sense to present to readers than getting into acronyms and code words labelling specific formats and protocols.

Before I go too far down any path, I want to get a sense of where contemporary thinking is at on feeds and syndication and what alternatives are available.

Backstory

When this website first went online in 2004, the Atom standard had not been officially released and there had been a huge mailing list and blogging war going on for years about RSS 1.0 vs RSS 2.0. The web was smaller, slower and looser back then, more tight-knit socially. It was developed through an experimental public research culture governed by W3C and IETF and driven by individuals who were super-into these hyperspecific niches.

Why did I drop it from the site in the first place? I stopped writing constant journal entries. Started developing less frequent, more complex long-form essays and talk transcripts. I transferred the site from a database-driven CMS written in PHP to a plaintext-backed static site generator written in Ruby in 2013–2014. Google had only just killed off Reader. Usage of RSS/Atom was declining around the world and nobody seemed particularly interested in it. I had plenty of other stuff to do, so I never bothered to reimplement it on the new site to begin with, and then never got around to it later.

Current potential

I recently did a little survey and got some feedback on how people are currently using RSS/Atom (people still are) and it does seem like there’s a potential audience to grow there. When social platforms like Twitter and Facebook work well they work like nothing else in existence for delivering fresh and relevant updates, making everything else look cheugy and dated. But these platforms are complex and chaotic extractive beasts, ridden with human suffering, political ops and maniacal marketing campaigns. They cannot be relied on to deliver a consistent or controllable reading experience.

I think this is one of the main drivers of the recent interest in newsletters, where the content delivery happens as an explicit sender and receiver exchange with much more clearly defined social boundaries. It’s also a potential pitch for web feeds to make a comeback.

Formats and protocols

Atom is well designed in the sense that its content model still applies directly to websites today and its original purpose is still relevant. None of it is fundamentally out of date even if it feels that way aesthetically. It maps directly to the way the content entries are already structured, so it can be added to the site without major changes. Other more recent initiatives worth considering include JSON Feed and h-feed, which is a variation of Atom as an HTML Microformat.

I think there are two things Atom does particularly well: providing a well-documented specification for extending the format or reusing elements from Atom in other places, and defining standard ways to encode and escape HTML content embedded in feed documents. This thoroughness and attention to detail was a direct response to some of the problems that arose with RSS (anyone who used RSS in the early 2000s will remember all sorts of issues with character encoding, HTML entities and tag soup). Those problems led to a toxic plumbing debate which, in turn, led to a lot of time being wasted and a total lack of attention being paid to high level questions around usability, marketing and shared community resources to spur popular adoption. Two-Bit History has a very good summary of this history and its place in the technology landscape.
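To make the escaping point concrete, here is a minimal Python sketch of how an HTML fragment ends up inside an Atom content element marked type="html". The function name atom_html_content and the sample fragment are made up for illustration. The only parts taken from the Atom specification are the namespace URI and the type attribute.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def atom_html_content(html_fragment: str) -> str:
    """Serialize an HTML fragment as an Atom <content type="html"> element.

    Hypothetical helper for illustration, not part of any library.
    """
    content = ET.Element(f"{{{ATOM_NS}}}content", {"type": "html"})
    # ElementTree escapes &, < and > in text when serializing, so the
    # markup travels through the feed as escaped character data.
    content.text = html_fragment
    return ET.tostring(content, encoding="unicode")


# Sample fragment (an assumption): note it already contains an HTML entity.
fragment = "<p>Fish &amp; chips, <em>properly</em> encoded.</p>"
xml_text = atom_html_content(fragment)
print(xml_text)

# A feed reader parsing the XML gets the original fragment back unchanged.
assert ET.fromstring(xml_text).text == fragment
```

The type attribute is what tells a reader to unescape the text and treat it as HTML rather than display the angle brackets literally, which is the ambiguity that caused so much trouble in RSS.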

The IndieWeb wiki has a list of criticisms of feed files for Atom and RSS. They advocate for the main site HTML being used for content feeds rather than creating separate special format URLs for feeds. This does make a lot of sense but I’m not yet sure how widespread the support for this pattern actually is in feed readers.

The use of XML in both Atom and RSS is a product of that very particular time period. A specific format criticism is that XML is inefficient and bloated as a representation. This is technically true but I wonder how relevant it actually is in practice. Most websites ship vastly more wasted bytes by serving unoptimised high DPI images and by not swapping between alternative sets of responsive images depending on viewport size or connection speed. It’s also debatable whether special purpose XML feeds consume more bandwidth than HTML encoded h-feed blocks.

In doing this research, I also came across a recent initiative by Matt Webb to document flaws of existing formats and improve the experience of subscribing to feeds for the current generation of websites:

“My sense is that RSS is having a mini resurgence. People are getting wary of the social media platforms and their rapacious appetite for data. We’re getting fatigued from notifications; our inboxes are overflowing. And people are saying that maybe, just maybe, RSS can help. So I’m seeing RSS being discussed more in 2020 than I have done for years. There are signs of life in the ecosystem.”

One of these recommendations turns out to be exactly what I’d always thought: we should change the main noun to Feed and the main verb to Subscribe. Another thing I discovered was that as part of this effort, there’s now a community-supported XSLT stylesheet ensuring browsers do something sensible with feed XML links. This might tip the scales towards Atom for me. It’s coherent, simple and easy to get started with, and is much more focused on the value readers can get than comparable efforts with JSON Feed or h-feed.
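For reference, a stylesheet like this is attached with an xml-stylesheet processing instruction placed before the feed’s root element. The Python sketch below shows the idea. The with_stylesheet helper and the /feed.xsl path are hypothetical, and it assumes the feed markup passed in starts at the root element rather than with its own XML declaration.

```python
from xml.sax.saxutils import quoteattr


def with_stylesheet(feed_xml: str, xsl_href: str) -> str:
    """Prefix feed markup with an XML declaration and a stylesheet instruction.

    Hypothetical helper. Assumes feed_xml has no XML declaration of its own
    and that the result will be written out as UTF-8.
    """
    declaration = '<?xml version="1.0" encoding="utf-8"?>'
    # quoteattr escapes the href value and wraps it in quotes.
    instruction = f'<?xml-stylesheet href={quoteattr(xsl_href)} type="text/xsl"?>'
    return "\n".join([declaration, instruction, feed_xml])


# Stub feed and placeholder stylesheet path, for illustration only.
styled = with_stylesheet('<feed xmlns="http://www.w3.org/2005/Atom"/>', "/feed.xsl")
print(styled)
```

Feed readers ignore the instruction, so the same URL can serve both subscribers and people who click the link out of curiosity.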

Update: I didn’t realise for a few days, but at the exact same time as I was researching and writing this, the Google Chrome product team posted about experimental support for web feeds in the browser itself. Which is really something.

Planning next steps

Time to figure out what’s needed to get web feeds onto the site without needing ongoing maintenance and attention. I’ve resisted doing this sort of planning at the higher level for the site thus far, mostly because I don’t have large contiguous blocks of time to work on it so the effort of structured planning doesn’t really pay off. But it is helpful to have small lower level tasks ready that I can work on when time and energy permit.

Thinking about it from the perspective of a double-diamond design process, I feel like I’ve already gotten through the first phase of R&D, diverging to look at different formats and protocols and doing research on what people are currently thinking about in this space, then quickly converging on a single outcome of launching a set of web feeds powered by Atom XML with usability enhancements via the Pretty Feed XSLT.

Not a perfect answer or ideal solution but it is a concrete and clear objective to work towards.

What’s needed now:

Develop an Atom Feed template with good structure and semantics (this is something I can probably get mostly right from copying the patterns currently in use on other people’s personal websites)
Introduce new content graph queries or use the site manifest to generate lists of entries with field mappings that can be plugged directly into the Atom Feed template (I already have an extension to the site generator to build Google Sitemap XML—this can be reused to support web feeds too)
Do a bit of additional information design and organising work around what would be useful to subscribers—we can go much further on this site than a generic ‘Latest Posts’ list and look at specialised feeds for individual notebooks and topic groupings or surface a unique feed architecture supporting specific content types (talks, essays, projects)
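As a rough sketch of how the first two items could fit together, here is a list of entries mapped onto Atom elements, with an optional filter standing in for the specialised feeds in the third item. It is written in Python for brevity rather than the Ruby the site generator uses. The Entry fields, the build_feed function, the author name and the example.org URLs are all placeholders, not the site’s actual manifest structure.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


@dataclass
class Entry:
    """Hypothetical manifest record; the field names are assumptions."""
    title: str
    url: str
    updated: datetime  # expected to be timezone-aware
    summary: str
    kind: str  # e.g. "essay", "talk", "project"


def _add(parent, tag, text=None, **attrs):
    """Append a namespaced Atom child element and return it."""
    el = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
    el.text = text
    return el


def build_feed(title, feed_url, site_url, author, entries,
               kind: Optional[str] = None) -> str:
    """Render entries, optionally filtered by kind, as an Atom document."""
    selected = sorted(
        (e for e in entries if kind is None or e.kind == kind),
        key=lambda e: e.updated,
        reverse=True,  # newest first
    )
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _add(feed, "title", title)
    _add(feed, "id", feed_url)
    _add(feed, "link", rel="self", href=feed_url)
    _add(feed, "link", rel="alternate", href=site_url)
    # Atom requires a feed-level <updated>; use the newest entry if there is one.
    newest = selected[0].updated if selected else datetime.now(timezone.utc)
    _add(feed, "updated", newest.isoformat())
    _add(_add(feed, "author"), "name", author)
    for e in selected:
        entry = _add(feed, "entry")
        _add(entry, "title", e.title)
        _add(entry, "id", e.url)  # using the permalink as the id is an assumption
        _add(entry, "link", rel="alternate", href=e.url)
        _add(entry, "updated", e.updated.isoformat())
        _add(entry, "summary", e.summary)
    return ET.tostring(feed, encoding="unicode")


# Made-up entries for illustration.
entries = [
    Entry("An essay", "https://example.org/essays/one",
          datetime(2021, 5, 1, tzinfo=timezone.utc), "First essay.", "essay"),
    Entry("A talk", "https://example.org/talks/one",
          datetime(2021, 6, 1, tzinfo=timezone.utc), "Talk transcript.", "talk"),
    Entry("Another essay", "https://example.org/essays/two",
          datetime(2021, 7, 1, tzinfo=timezone.utc), "Second essay.", "essay"),
]
essays_xml = build_feed("Essays", "https://example.org/essays/feed.xml",
                        "https://example.org/essays/", "Site Author",
                        entries, kind="essay")
print(essays_xml)
```

Calling build_feed without kind would give the generic ‘Latest Posts’ feed, so one function could cover both the catch-all feed and the per-type ones.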

So the next diverge/converge cycle here is probably going to be around researching and sketching out a useful and usable information structure, then culling complexity and wrong turns and getting it all working in code.