Fingerprinting feed reader users

Your unique subscription list can be used to deanonymize feed reader client users.

I’ve been thinking about how to count syndication feed (“RSS”) subscribers more accurately now that everyone has devices that roam across several networks every day, every home is assigned 18 quintillion IPv6 addresses, and devices periodically change their IP addresses. Historically, you’d just count the unique IP address/User-Agent pairs at the end of the day and be done with it.
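The historical method fits in a few lines of Python. The log record format, the `FeedRequest` and `naive_daily_subscribers` names, and the sample values are all hypothetical (the addresses come from the reserved documentation ranges). The sketch only shows how a single reader that roams between networks ends up being counted more than once.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple

class FeedRequest(NamedTuple):
    """One feed request from an access log (hypothetical record format)."""
    day: str         # calendar day of the request, e.g. "2024-05-01"
    ip: str          # client IP address
    user_agent: str  # User-Agent header sent by the feed reader

def naive_daily_subscribers(requests: Iterable[FeedRequest]) -> Dict[str, int]:
    """Count unique (IP address, User-Agent) pairs per day."""
    pairs_per_day: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for req in requests:
        pairs_per_day[req.day].add((req.ip, req.user_agent))
    return {day: len(pairs) for day, pairs in pairs_per_day.items()}

# One reader polling from home over IPv6 and later from a mobile network.
log = [
    FeedRequest("2024-05-01", "2001:db8::1", "ExampleReader/1.0"),
    FeedRequest("2024-05-01", "2001:db8::1", "ExampleReader/1.0"),
    FeedRequest("2024-05-01", "198.51.100.7", "ExampleReader/1.0"),
]
print(naive_daily_subscribers(log))  # {'2024-05-01': 2}
```

Every new temporary IPv6 address or network switch adds another pair, so the same person inflates the count.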

While pondering this question, I started wondering about how feed analytics services like FeedPress and the old Google FeedBurner service kept their subscriber counts accurate. I’ve more or less concluded that they simply can’t ever have had super-accurate counts and that they’re probably just doing the best they can.

However, I did realize that there is a way to uniquely fingerprint most feed reader users across networks and IP addresses, with varying degrees of accuracy depending on how many feeds the client is updating. This assumes that you’re a large-scale internet infrastructure service or feed server that handles a large number of feed requests, such as Cloudflare, Google Blogger, Medium, Squarespace, or WordPress.com, all of which serve a large number of feeds.

Everyone’s feed subscription list differs as much as their personal preferences and interests do. There will be some overlap, but if you follow more than a dozen feeds, your subscription list is probably unique. What’s more, the order of your subscription list is even more distinctive. Some feed reader clients sort feeds alphabetically, but as far as I can tell, most sort either by a custom user order or by the order in which the user added the subscriptions.

Feed readers normally update feeds from first to last in this ordered list, so a server that hosts several of your feeds sees their requests arrive in the same recognizable sequence on every update run. In other words, the subscription list and its order can serve as a fingerprint that uniquely identifies you and deanonymizes you across different networks.
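Here is a minimal Python sketch of the idea. It assumes the observer has already grouped one client’s requests into a single update run, for example a burst of requests from one IP address and User-Agent within a few minutes. `order_fingerprint` is a hypothetical name, and SHA-256 is just one convenient way to turn the ordered list into a comparable identifier. Two runs with the same feeds in the same order produce the same fingerprint no matter which network they came from. The same feeds in a different order do not.

```python
import hashlib
from typing import Iterable, List, Set

def order_fingerprint(feed_urls: Iterable[str]) -> str:
    """Hash the ordered list of feeds one client fetched during a single update run."""
    seen: Set[str] = set()
    ordered: List[str] = []
    for url in feed_urls:
        if url not in seen:  # keep only the first request per feed (ignore retries)
            seen.add(url)
            ordered.append(url)
    # Newlines can't appear in a URL, so distinct lists can't join into the same string.
    return hashlib.sha256("\n".join(ordered).encode("utf-8")).hexdigest()

home = ["https://a.example/feed", "https://b.example/feed", "https://c.example/feed"]
mobile = list(home)  # the same reader's next run, from a different IP address
other = ["https://c.example/feed", "https://a.example/feed", "https://b.example/feed"]

print(order_fingerprint(home) == order_fingerprint(mobile))  # True
print(order_fingerprint(home) == order_fingerprint(other))   # False
```

A reader that sorts feeds alphabetically leaks a little less, because the order is then fully implied by the set of feeds itself.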

This is only possible when the user is subscribed to multiple feeds hosted by the same feed publisher (Blogger, Medium, WordPress.com), feed processor (FeedPress, Google FeedBurner), or large internet infrastructure provider (Amazon AWS, Cloudflare).

A feed reader client that silently updates the user’s subscriptions in the background could therefore get the user deanonymized across networks. An internet user’s feed subscription list is a rich source of personal information, as people are more likely to subscribe to publishers, categories, and topics they’re deeply interested in. It could reveal personal data such as health conditions and sexual preferences, as well as interests and demographics.

Feed readers can mitigate this to some degree by distributing subscription updates across multiple network connections (when available) or by randomizing the schedule of subscription updates.
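Randomizing the schedule is simple to sketch. The Python below is a hypothetical illustration, not taken from any particular feed reader: `randomized_schedule` and its one-hour default window are assumptions. It gives each subscription a random offset from the start of the update run. That both shuffles the fetch order and spreads the requests out, so they don’t arrive as one recognizable burst.

```python
import random
from typing import List, Optional, Tuple

def randomized_schedule(
    subscriptions: List[str],
    window_seconds: float = 3600.0,
    rng: Optional[random.Random] = None,
) -> List[Tuple[float, str]]:
    """Give each feed a random offset within the window and return them in fetch order."""
    rng = rng if rng is not None else random.SystemRandom()  # OS-backed randomness by default
    # Each offset is measured from the start of the update run, not from the previous fetch.
    schedule = [(rng.uniform(0.0, window_seconds), url) for url in subscriptions]
    schedule.sort()  # sorting by random offset also shuffles the fetch order
    return schedule

subscriptions = ["https://a.example/feed", "https://b.example/feed", "https://c.example/feed"]
for offset, url in randomized_schedule(subscriptions, window_seconds=600.0):
    print(f"fetch {url} at +{offset:.0f} s")
```

The trade-off is freshness: a feed may be fetched up to `window_seconds` later than it otherwise would be.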

I’m not suggesting that any of this is actually happening. It’s a theoretical vector for deanonymization that is only possible because of the web’s growing centralization. A more practical way to fingerprint users would simply be to rely on HTTP cache revalidation “super cookies” (ETags), which most feed readers support. Heck, some feed reader clients even support persistent cookies.
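To see why ETags work as a super cookie, here is a toy model in Python. The `EtagTracker` class is hypothetical and ignores real-world details such as tag lists in `If-None-Match` and the actual feed content. It shows only the mechanism. The server gives each new client its own ETag for the same unchanged feed. A reader that dutifully revalidates then sends that tag back in its `If-None-Match` header on every later request, from any network.

```python
import secrets
from typing import Optional, Set, Tuple

class EtagTracker:
    """Toy feed server that hands every new client its own ETag (hypothetical)."""

    def __init__(self) -> None:
        self.issued: Set[str] = set()  # every ETag this server has handed out

    def handle(self, if_none_match: Optional[str]) -> Tuple[int, str]:
        """Return (HTTP status code, ETag header value) for one feed request."""
        if if_none_match is not None and if_none_match in self.issued:
            # The client echoed back a tag issued earlier, so it's a returning
            # client, whatever IP address it is connecting from now.
            return 304, if_none_match
        # First visit (or an unknown tag): issue a fresh, unique quoted tag.
        tag = '"' + secrets.token_hex(8) + '"'
        self.issued.add(tag)
        return 200, tag

server = EtagTracker()
status, tag = server.handle(None)              # first fetch, e.g. from home
print(status)                                  # 200
status_again, tag_again = server.handle(tag)   # later revalidation from a mobile network
print(status_again, tag_again == tag)          # 304 True
```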

There isn’t really anything everyday internet citizens can do to prevent this type of fingerprinting. You could technically try to avoid all software with a discernible, unique usage profile, not only feed reader clients.

Technology on its own can’t practically prevent large-scale tracking and profiling. If our society is going to continue valuing privacy, then there have to be laws to govern and protect it too. New privacy laws like the General Data Protection Regulation (GDPR) are needed to limit the kind of deanonymization and profiling described in this article. I’m personally in favor of stronger privacy regulations instead of letting every internet user fend for themselves. The ever-growing complexity and capability of web technology make it unreasonable to expect anyone to protect their own privacy against actors that can and will go to any lengths to gather just a little more data about you.