Cutting off older feed entries with delta updates

Syndication feed delta updates help reduce feed subscription bandwidth costs

Feed subscriptions consume a lot of bandwidth. It can be reduced with compression and HTTP caching. However, whenever one new item is added, every feed reader will download the complete feed anew. Feed delta updates solve this by extending HTTP cache revalidation so that a server only returns the entries added since the last fetch.

In a delta update to a feed, all the redundant entries that a feed reader has already retrieved and cached locally are omitted from subsequent updates. Only new entries will be included. Along with other techniques such as HTTP cache revalidation and compression, delta updates can help reduce the bandwidth usage of a feed reader.

A quick history of feed deltas

Delta updates for syndication feeds have been around since 2004, when they were proposed under the name “RFC3229+feed” by Bob Wyman. Wyman worked on a feed aggregator at the time, and was probably very familiar with how inefficient feed delivery can be. He proposed taking the primarily binary-focused delta encoding aspects of RFC 3229, “Delta encoding in HTTP”, and repurposing them for feed delivery.

RFC 3229 provided a mechanism to expand the standard for other uses through keywords, so Wyman suggested a mechanism for the “feed” keyword. His proposal gained some traction, and saw adoption in places such as the Windows RSS Platform and the publishing system Textpattern. However, the original proposal was a bit broken as it built on what was believed to be a good standard: RFC 3229. Unfortunately, RFC 3229 hasn’t seen much adoption, and it breaks HTTP standards and conventions to achieve quick fixes for known issues in some HTTP implementations at the time of the draft. The result was a standard that diverged quite a bit from how every other bit of HTTP works, and a bunch of clients and publishers with various compatibility issues. The proposal died out with a lot of varying and incompatible implementations lying around.

Fast forward to today, and HTTP implementations are much more capable than they were back in 2004. Taking the original 2004 proposal and assuming that the HTTP implementations are now actually somewhat competent, delta updating feeds can now be made a reality.

How delta updates to feeds work

(Some familiarity with HTTP cache revalidation is assumed for this section.)

Feed deltas are now quite easy to implement, and they work with caches, reverse proxies, and the rest of our HTTP-aware infrastructure. A feed reader announces support for delta updates by sending an A-IM: feed request header (A-IM is short for ‘Accept-Instance-Manipulation’) plus at least one of the If-Modified-Since or If-None-Match request headers. These latter two headers echo back the value of the Last-Modified or ETag response header, respectively, from the last time the feed was pulled. Feed readers can cheat somewhat with the If-Modified-Since header, and send the time when they last pulled the feed instead.
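The client side of this can be sketched in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name delta_request_headers, its parameters, and the example.com URL are made up for illustration, and it assumes the reader has stored the ETag and/or Last-Modified values from its previous fetch.

```python
from email.utils import formatdate
import urllib.request


def delta_request_headers(etag=None, last_modified=None, last_fetch_epoch=None):
    """Build request headers for a delta-aware conditional feed fetch.

    etag / last_modified: values saved from the previous response (hypothetical storage).
    last_fetch_epoch: Unix time of the previous fetch, used as the "cheat"
    when the server never sent a Last-Modified header.
    """
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    elif last_fetch_epoch is not None:
        # Format the time of the last pull as an HTTP date in GMT.
        headers["If-Modified-Since"] = formatdate(last_fetch_epoch, usegmt=True)
    if headers:
        # Only announce delta support when a validator is sent along with it;
        # without one the server has nothing to compute a delta against.
        headers["A-IM"] = "feed"
    return headers


# Building the request does not touch the network; pass it to
# urllib.request.urlopen() to actually fetch the feed.
request = urllib.request.Request(
    "https://example.com/feed.xml",  # placeholder URL
    headers=delta_request_headers(etag='"abc123"'),
)
```

On the very first fetch there is no validator to send, so the sketch falls back to a plain request and the reader receives the full feed.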

Feed publishers in turn announce support by including the A-IM and If-Modified-Since keywords in their Vary response header. This instructs intermediary caches on how to be delta update aware. The trick here is that Vary: If-Modified-Since will instruct caches to validate incoming requests against the exact request header and limit their own validation to exact matches of this header.
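As a rough illustration of what that Vary header does, a cache can be modelled as keying its stored responses on the URL plus the value of every request header that Vary names. The sketch below is a simplified model under that assumption, not how any particular cache is implemented, and variant_key is a hypothetical helper.

```python
def variant_key(url, request_headers, vary):
    """Return a cache key made of the URL and each request header named in Vary."""
    # Header names are case-insensitive, so normalise them before comparing.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(name.strip().lower() for name in vary.split(","))
    return (url,) + tuple((name, lowered.get(name)) for name in names)


vary = "A-IM, If-Modified-Since"
monday = {"A-IM": "feed", "If-Modified-Since": "Mon, 05 Jan 2015 00:00:00 GMT"}
tuesday = {"A-IM": "feed", "If-Modified-Since": "Tue, 06 Jan 2015 00:00:00 GMT"}

# Different If-Modified-Since values map to different stored variants, so a
# delta computed for one client is not served to a client that is further behind.
assert variant_key("/feed", monday, vary) != variant_key("/feed", tuesday, vary)
```

Two clients that send exactly the same If-Modified-Since value share a key, which is what lets an intermediary cache reuse a delta response for them.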

The server looks at the request from the client, and performs normal cache revalidation checks against the If-Modified-Since header. An empty body message with a 304 Not Modified response status should be returned as usual if no new entries are available. When new entries are available, the behavior changes from how a feed request is normally returned:

The server removes all entries that were published before the time provided by the client in the If-Modified-Since header, and only returns the new entries that have been published since this time. In addition to new entries, entries that have been modified since the same time stamp should be returned as well, as long as they’re still “newsworthy” as determined by the publisher. An entry may no longer be newsworthy, for example, a week after its initial publication.

In addition to the Vary response header, the server should set the IM: feed response header and add the im keyword to its Cache-Control header. I’ve not found any HTTP implementation that is aware of these headers. They’ll be ignored but are required by RFC 3229. Likewise, the 226 IM Used status code is required by RFC 3229, but using a regular 200 OK response will make no difference.

It’s important that the server correctly sets the Last-Modified response header for every response, including for 304, 200, and 226 responses.
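The server-side behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a complete implementation: delta_response and the (updated, entry) tuple shape are hypothetical, the entry list is assumed to be non-empty with whole-second UTC timestamps, and the publisher's “newsworthy” cutoff for modified entries is left out.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def delta_response(entries, request_headers):
    """Decide status, headers, and entries for a feed request.

    entries: non-empty list of (updated, entry) tuples, where updated is a
    timezone-aware UTC datetime (hypothetical data shape).
    Returns (status_code, response_headers, entries_to_serialize).
    """
    last_modified = max(updated for updated, _ in entries)
    headers = {
        # Set Last-Modified on every response, including 304 and 226.
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Vary": "A-IM, If-Modified-Since",
    }
    try:
        since = parsedate_to_datetime(request_headers["If-Modified-Since"])
    except (KeyError, TypeError, ValueError):
        since = None  # header missing or malformed: treat as unconditional

    if since is not None and last_modified <= since:
        return 304, headers, []  # nothing new: empty body as usual

    im_tokens = [t.strip() for t in request_headers.get("A-IM", "").split(",")]
    if since is None or "feed" not in im_tokens:
        return 200, headers, [entry for _, entry in entries]  # full feed

    # Delta update: only entries published or modified after the client's time.
    headers["IM"] = "feed"
    headers["Cache-Control"] = "im"
    return 226, headers, [entry for updated, entry in entries if updated > since]


feed = [
    (datetime(2015, 1, 1, tzinfo=timezone.utc), "old entry"),
    (datetime(2015, 1, 3, tzinfo=timezone.utc), "new entry"),
]
status, headers, body = delta_response(
    feed, {"A-IM": "feed", "If-Modified-Since": "Fri, 02 Jan 2015 00:00:00 GMT"}
)
print(status, body)
```

Filtering on the updated time rather than the original publication time is a design choice in this sketch: it means modified entries are picked up by the same comparison as new ones.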

I’ve explained this section using the If-Modified-Since request header, but you can substitute the If-None-Match header for it if you prefer working with arbitrary strings as identifiers rather than specific request dates.

What are the data savings like with delta updates?

Data savings depend on how many entries are included in the full feed, how often new entries are published, and how often various clients pull for new entries.

Take a typical feed with headlines and short summaries of the latest 20 entries. The table below shows a snapshot of such a feed in two variations: one where all 20 entries are included, and another with just the most recent entry.

                          Size      Gzipped size
Full feed, 20 entries     66.5 kB   12.5 kB
Delta update, 1 entry     3.1 kB    1.3 kB

A delta update reduces the feed size by 95.3 % (or 90 % in the gzip-compressed variant). For a news site that publishes one entry per hour 24/7, and a feed reader that updates once per hour, that adds up to 1.6 MB per subscriber per day for a full feed versus 0.074 MB per subscriber per day for a one-entry delta update per hour.
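The arithmetic behind these figures is simple enough to check. The sketch below uses the sizes from the table and assumes 1 MB = 1000 kB and 24 pulls per day; the helper names are made up for illustration.

```python
# Sizes per pull in kB, taken from the table above.
FULL_KB, DELTA_KB = 66.5, 3.1
FULL_GZIP_KB, DELTA_GZIP_KB = 12.5, 1.3
PULLS_PER_DAY = 24  # one pull per hour


def reduction_percent(full_kb, delta_kb):
    """Percentage saved by sending the delta instead of the full feed."""
    return (1 - delta_kb / full_kb) * 100


def total_megabytes(kb_per_pull, pulls):
    """Total transfer in MB, assuming 1 MB = 1000 kB."""
    return kb_per_pull * pulls / 1000


print(f"uncompressed saving: {reduction_percent(FULL_KB, DELTA_KB):.1f} %")
print(f"gzipped saving:      {reduction_percent(FULL_GZIP_KB, DELTA_GZIP_KB):.1f} %")
print(f"full feed per day:   {total_megabytes(FULL_KB, PULLS_PER_DAY):.3f} MB")
print(f"delta per day:       {total_megabytes(DELTA_KB, PULLS_PER_DAY):.3f} MB")
```

Passing the gzipped sizes and 30 × 24 pulls to total_megabytes gives the corresponding monthly totals.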

Over a 30-day period, the gzipped variant would add up to 9 MB per subscriber for the full feed compared to just 0.936 MB for delta updates. The table below shows the monetary cost of this amount of data for a subscriber using a mobile data connection in three different markets around the world.

                                  United States   Canada      Germany
Full feed, 20 entries per pull    $0.55 USD       $1.0 CAD    €0.70 EUR
Delta update, 1 entry per pull    $0.06 USD       $0.1 CAD    €0.07 EUR

For a frequently updated feed, the bandwidth cost and usage are reduced by ~90 % for both subscribers and publishers. For a feed that publishes the full content in its feed, the savings could be significantly higher.

Flip these tables on their head and multiply them by a few thousand to work out the bandwidth savings and costs to feed publishers. In terms of bandwidth, delta feed updates are clearly more efficient and scale better than pushing the full feed with every update.

Feed reader, library, and publisher support

Delta feed updates are supported in a variety of feed clients and publishers. The below list is up to date as of .

Feed reader clients

  • Feed Headlines (Linux desktop/macOS)
  • FeedHQ (web)
  • FeedNotifier (macOS/Windows)
  • gPodder (Linux desktop/Windows)
  • Internet Explorer (Windows)
  • Jarr (web)
  • Liferea (Linux desktop)
  • Microsoft Outlook (Windows)
  • Miniflux (web)
  • Newsbeuter (Linux console)
  • Newsblur (web)
  • Vienna (macOS)

Publishing systems and libraries

  • Ellislab ExpressionEngine
  • Textpattern
  • picoFeed
  • Universal Feed Parser (py-feedparser)
  • Windows RSS Platform
  • Windows.Web.Syndication
  • WordPress (requires a plugin)

The situation on Windows is a bit interesting: the Windows RSS Platform offers feed retrieval and parsing and has been available since Windows XP SP2, yet very few (if any?) third-party feed readers have been built on it. In the Windows App Store you’ll find a few apps built on Windows.Web.Syndication, the modern app replacement for the RSS Platform, which all the good feed clients use. (Most of these are skinned versions of a modern app example feed reader published on GitHub by Microsoft themselves.) The badly reviewed feed readers in the Windows App Store all have their own issues, such as not supporting Atom-formatted feeds, not being able to fetch feeds over HTTPS, not being network/battery aware, or other serious technical problems. I presume developers simply don’t know about the infrastructure offered by Windows itself to handle all of this.

Podcasts are another interesting area. Many podcasts publish their entire catalog of old episodes in their feeds for that one new listener every week who wants to listen to all the old shows. These feeds can become quite enormous as time goes by. Delta updates solve the bandwidth problem without dropping discovery of old episodes. gPodder is the only supporting client right now, but that really ought to change!

Feed deltas are obviously easier to implement for clients than for publishers. When I started looking into feed delta updates, I quickly found that the implementations had diverged quite a bit from each other. A few quick patches here and there, and the implementations were all more aligned and the list of supported feed readers had doubled. Working with open source software can be quite fun sometimes!

I hope I got you interested enough in this technology to implement support for delta feed updates in your software and to provide delta updates to the feeds you publish! Feeds don’t have to suck [bandwidth]!