A technical and privacy review of Cloudflare Web Analytics

Last week, Cloudflare announced its new Web Analytics service: a cost-free stand-alone product for counting website visitors. In its announcement, Cloudflare promised that its new service puts privacy first, doesn’t collect any personal information, and doesn’t use persistent tracking such as cookies.

Cloudflare Web Analytics counts site visits, page views, page-load performance metrics, link-referrer data, the name of the browser and operating system, and what country the visitor is in. It hopes to compete with the likes of Google Analytics and StatCounter. Like the market leader, Google Analytics, Cloudflare offers its service cost-free without any website traffic volume restrictions. Most other competitors charge a fee based on the number of monthly page views.

The service is, however, limited to showing data for only the last seven days. It’s unclear whether older data is retained and will be available later, or if it’s deleted in the spirit of data minimization. I’m all for auto-deleting old data, but doing so after seven days seems far too aggressive. There’s no paid tier to upgrade to that would extend the retention period. I’m not aware of any other web analytics platform that stores historical data for less than sixty days (enough to compare the current month’s traffic to the previous month’s). Cloudflare hasn’t mentioned this limitation anywhere in its limited promotional material or documentation for its analytics service.

Cloudflare seems to have engineered its new service to comply with the “privacy by design” principle of the EU General Data Protection Regulation (GDPR). Cloudflare claims its new service doesn’t use any persistent client-side identifiers (like cookies or localStorage) or immutable identifiers (like device fingerprinting or IP addresses).

I’m always skeptical whenever any company makes a broad claim about not collecting any personal information. Merely recording a visit to a webpage that serves information on a privacy-sensitive topic could be considered to be personal information. If you operate such a website, you should probably not count visits using an external service regardless of its privacy claims.

Rather uniquely among this type of service, Cloudflare claims it won’t store any part of your visitors’ IP addresses. However, the Cloudflare Privacy Policy says it may log the IP addresses of “end users,” so it’s a bit unclear what its actual policy is.

Cloudflare also promises not to use cookies, but the Web Analytics beacon script drops a cookie with a unique identifier when your browser downloads it. Cloudflare has announced plans to deprecate this cookie for all its services. It feels like an oversight to launch a service and market it as cookie-free when it drops cookies in your visitors’ browsers.

Cloudflare Web Analytics uses a unique identifier per page-session called pageloadId. It’s a universally unique identifier (UUIDv4) that is generated locally each time a page is loaded. The identifier is used to merge multiple analytics data submissions for the same page view (I’ll get back to why it needs this later). The identifier isn’t stored persistently and can’t be used for tracking. Random UUID collisions can theoretically happen when relying on the uncoordinated end-user to generate this identifier. A UUID collision is far more likely to be caused by a session-replay attack against a visitor than by pure random chance. This opens interesting possibilities for Cloudflare to better detect this class of attacks across its network.
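To illustrate what generating such an identifier locally involves, here is a minimal sketch of my own (not Cloudflare’s code) that formats 16 random bytes as a version 4 UUID. The function name uuidV4FromBytes is hypothetical. In a browser, the bytes would typically come from crypto.getRandomValues(new Uint8Array(16)).

```typescript
// Hypothetical sketch, not taken from Cloudflare's beacon script.
// Formats 16 random bytes as an RFC 4122 version 4 UUID string.
function uuidV4FromBytes(random: Uint8Array): string {
  if (random.length !== 16) {
    throw new Error("expected exactly 16 random bytes");
  }
  // Work on a copy so the caller's buffer is left untouched.
  const bytes = Uint8Array.from(random);
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // set the version nibble to 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // set the RFC 4122 variant bits
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}
```

Only 122 of the 128 bits are random, because the version and variant bits are fixed. The identifier lives in a script variable for the lifetime of the page and is never written to storage.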

Cloudflare Web Analytics submits the data it has collected two times per page view. The two submissions contain almost identical data. It first sends it using the XMLHttpRequest API (XHR) when the page has loaded, and then a second time when the page is unloaded using either the more modern Beacon API or another XHR. Cloudflare merges the two submissions from the same page-session using the pageloadId identifier.
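A rough sketch of how a collector could merge the two submissions follows. This is my guess at the general technique, not Cloudflare’s implementation. Every field name other than pageloadId is hypothetical.

```typescript
// Hypothetical submission shape; only pageloadId comes from the beacon script.
interface Submission {
  pageloadId: string;
  path: string;
  receivedAt: number; // server receive time, milliseconds since epoch
  timings?: Record<string, number>;
}

interface PageView {
  pageloadId: string;
  path: string;
  firstSeen: number;
  lastSeen: number;
  timings: Record<string, number>;
  submissions: number;
}

// Collapses all submissions that share a pageloadId into one page view.
function mergeSubmissions(submissions: Submission[]): PageView[] {
  const views = new Map<string, PageView>();
  for (const s of submissions) {
    const existing = views.get(s.pageloadId);
    if (!existing) {
      views.set(s.pageloadId, {
        pageloadId: s.pageloadId,
        path: s.path,
        firstSeen: s.receivedAt,
        lastSeen: s.receivedAt,
        timings: { ...s.timings },
        submissions: 1,
      });
      continue;
    }
    existing.firstSeen = Math.min(existing.firstSeen, s.receivedAt);
    existing.lastSeen = Math.max(existing.lastSeen, s.receivedAt);
    // Values from submissions later in the input overwrite earlier ones.
    Object.assign(existing.timings, s.timings);
    existing.submissions += 1;
  }
  return Array.from(views.values());
}
```

In a design like this, the gap between firstSeen and lastSeen would give a rough server-side estimate of time-on-page, even if the submissions themselves carry no unload timestamp.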

However, why is it submitting the same data twice? It’s certainly using the XHR API first to work around a limitation with the Beacon API in older versions of Safari. The first submission also helps pick up any browsers that time out when submitting data during page unload. I can only speculate, but I suspect that Cloudflare wants to track, in the future, how long each visitor spends on each page. Time-on-page is a key page-quality metric that I’ve focused on in my custom web analytics solution. The first and second submissions both contain a timestamp for when the page loaded, but the second submission doesn’t include the unload time.

Cloudflare waits for the standard pagehide and beforeunload events to send the second submission. As discussed before, using the beforeunload event evicts the page from the history navigation cache in Chrome, Firefox, and Safari. Simply put, it means these browsers can’t cache rendered pages and have to reload them if a visitor uses the Back and Forward buttons in their browser to navigate your website. Had Cloudflare not broken the history navigation cache, the cache would have preserved the pageloadId identifier and helped Cloudflare merge duplicate visits from visitors using these two buttons. Cloudflare’s script seems to have been written with no awareness of the history navigation cache, though.
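For comparison, here is a minimal sketch of a cache-friendlier approach that listens only for pagehide and never registers a beforeunload handler. It’s my own illustration, not a patch for Cloudflare’s script. The names sendOnPageHide, WindowLike, and PageHideLike are hypothetical. The small interfaces stand in for the browser’s window and PageTransitionEvent so the sketch also runs outside a browser.

```typescript
// Hypothetical stand-ins for the browser's PageTransitionEvent and window.
interface PageHideLike {
  persisted: boolean;
}
interface WindowLike {
  addEventListener(type: "pagehide", listener: (event: PageHideLike) => void): void;
}

// Registers a single pagehide listener that sends the final submission.
function sendOnPageHide(
  target: WindowLike,
  send: (body: string) => void,
  payload: () => Record<string, unknown>,
): void {
  target.addEventListener("pagehide", (event) => {
    // event.persisted is true when the browser may keep the page in its
    // history navigation cache instead of discarding it.
    send(JSON.stringify({ ...payload(), persisted: event.persisted }));
  });
}

// In a browser, the call might look like this (the "/collect" URL is made up):
// sendOnPageHide(window, (body) => navigator.sendBeacon("/collect", body),
//   () => ({ pageloadId }));
```

A page restored from the cache keeps its script state, so the same pageloadId would still be in memory when the visitor returns with the Back button.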

Cloudflare Web Analytics has an interesting way of counting the number of visitors. The industry-standard methods are either to count unique IP addresses or to assign each visitor an identifier and count those. Instead, Cloudflare Web Analytics counts any page view referred from another website or from nowhere (using the Referer request header) as both a page view and a visitor. Internal traffic on your website is referred from itself and is therefore only counted as a page view.
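The counting rule described above can be sketched in a few lines. This is my reading of the observed behavior, not Cloudflare’s code. The function name classifyHit and the Counted type are hypothetical.

```typescript
interface Counted {
  pageView: boolean;
  visit: boolean;
}

// Every hit is a page view. It is also a visit unless the referrer
// points back at the same host as the page itself.
function classifyHit(pageUrl: string, referrer: string): Counted {
  const pageHost = new URL(pageUrl).hostname;
  let referrerHost: string | null = null;
  if (referrer !== "") {
    try {
      referrerHost = new URL(referrer).hostname;
    } catch {
      referrerHost = null; // an unparseable referrer is treated like a missing one
    }
  }
  const internal = referrerHost !== null && referrerHost === pageHost;
  return { pageView: true, visit: !internal };
}
```

This sketch compares exact hostnames, so www.example.com and example.com would count as different websites. A real implementation would need to decide how to treat subdomains.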

However, it’s unclear how it handles situations such as page reloads, which cause the browser to resubmit the Referer (sic) header and generate a new pageloadId identifier. It’s a difficult problem to solve without using a persistent identifier, and I found no evidence that Cloudflare has solved this problem. The data that Cloudflare collects does include the navigation type, which can identify whether a page load was the result of clicking on a link or reloading the current page.
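One way the navigation type could be used to avoid double-counting reloads is sketched below. I found no evidence that Cloudflare does this, so treat it purely as an illustration. The Hit shape and the countVisits function are hypothetical. The four navigation type values are the ones defined for the type property of the browser’s navigation timing entry, which a script can read from performance.getEntriesByType("navigation").

```typescript
// Values of PerformanceNavigationTiming.type in the browser.
type NavigationType = "navigate" | "reload" | "back_forward" | "prerender";

// Hypothetical per-hit record as a collector might store it.
interface Hit {
  externalReferrer: boolean; // referred from another website or from nowhere
  navigationType: NavigationType;
}

// Counts a hit as a visit only when it arrived through a regular navigation,
// so reloads and Back/Forward navigations don't inflate the visitor count.
function countVisits(hits: Hit[]): number {
  return hits.filter(
    (hit) => hit.externalReferrer && hit.navigationType === "navigate",
  ).length;
}
```

This is only a partial fix: a visitor who opens the same external link twice would still be counted as two visits.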

Cloudflare Web Analytics collects everything available from the browser Performance API, including precise and detailed timing information for individual assets such as images and scripts. Cloudflare doesn’t currently use this information for anything, but Cloudflare’s launch announcement mentioned that it intends to build out page-load performance analytics over time. I’d appreciate seeing more client-side data minimization instead of having it pull in such detailed data about each page and how it’s loaded.
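As an example of what client-side data minimization could look like, the sketch below reduces per-asset timing entries to one coarse summary per asset type before anything is submitted. It’s my own suggestion, not something Cloudflare’s script does. The ResourceEntry fields mirror properties of the browser’s resource timing entries (initiatorType, duration, transferSize). The function and summary type names are hypothetical.

```typescript
// Subset of the fields found on the browser's resource timing entries.
interface ResourceEntry {
  initiatorType: string; // e.g. "img", "script", "css"
  duration: number; // milliseconds
  transferSize: number; // bytes
}

interface ResourceSummary {
  count: number;
  totalDurationMs: number;
  totalBytes: number;
}

// Aggregates entries per asset type, dropping URLs and per-asset detail.
function summarizeResources(entries: ResourceEntry[]): Map<string, ResourceSummary> {
  const summary = new Map<string, ResourceSummary>();
  for (const entry of entries) {
    let bucket = summary.get(entry.initiatorType);
    if (!bucket) {
      bucket = { count: 0, totalDurationMs: 0, totalBytes: 0 };
      summary.set(entry.initiatorType, bucket);
    }
    bucket.count += 1;
    bucket.totalDurationMs += entry.duration;
    bucket.totalBytes += entry.transferSize;
  }
  // Round to whole milliseconds to discard needlessly precise timings.
  summary.forEach((bucket) => {
    bucket.totalDurationMs = Math.round(bucket.totalDurationMs);
  });
  return summary;
}
```

A summary like this still shows whether images or scripts dominate the load time without revealing which individual files a page requested.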

Update: Cloudflare has finally begun using all the collected performance data in its Web Analytics dashboard. It has focused on translating the raw measurements into the Google Core Web Vitals. You can learn more about it in the announcement blog post.

The Web Analytics beacon script is a 4.31 KB compressed download (10.27 KB of uncompressed code to execute). It’s relatively lightweight compared to market-leader Google Analytics’ 19.01 KB compressed download (45.95 KB uncompressed). Cloudflare’s gains are somewhat undermined by the script being served without a cache policy. The script is still lightweight and shouldn’t negatively impact your page-load performance. The script is loaded from static.cloudflareinsights.com and submits data to cloudflareinsights.com.

The script has been code-minified to make the download smaller, but Cloudflare doesn’t publish its source map. A source map is a special file that turns machine-readable minified code back into something that can be read by humans. Publishing source maps helps increase code and algorithmic transparency, and makes writing reviews like this one less time-consuming.

In my testing of the service, I found that Cloudflare Web Analytics misidentifies almost all page views as visits to / (the root path/home page). This completely undermines its ability to track which pages on your website receive the most visitors. Cloudflare’s script collects the correct path, so this seems to be a bug in how Cloudflare processes the submissions.

Every web browser except Chrome now has some anti-tracking blocklists built in by default. These blocklists either restrict the listed services’ ability to set cookies (irrelevant for Cloudflare) or block the browser from communicating with them. I doubt Cloudflare’s privacy stance will make much of a difference with anti-tracking and ad-blocking browser extensions and similar services. The Cloudflare Web Analytics domain will likely (and undeservedly) end up on the same blocklists as trackers that are a hundred times worse than Cloudflare when it comes to respecting people’s privacy. The goal of the anti-trackers is to block as much tracking as possible, with no regard for promoting or allow-listing less privacy-invasive alternatives.

Cloudflare meets the requirements in the Electronic Frontier Foundation’s (EFF) Do Not Track (DNT) Policy (even for non-DNT-signaling visitors). It could get its Web Analytics script allow-listed with the roughly 2.8 million users of the EFF’s Privacy Badger extension by publicly committing to the EFF’s anti-tracking policy. All it would take is to republish the policy document verbatim on its *.cloudflareinsights.com domains.

At the close, I still have this mantra playing on repeat in the back of my mind: “if you’re not paying for a service then you’re the product.” How come Cloudflare is offering this service for free? It surely costs money to maintain, develop, and operate the service. Some website owners might be convinced by performance data showing that their slow websites can be made faster by moving to Cloudflare’s enterprise products. I doubt it will lead to an enormous number of sales, however.

I guess that what Cloudflare wants out of its service is the page-loading performance data. This data has previously only been available to Google, which gets it through its free Chrome web browser. Cloudflare’s business is delivering websites as fast as technically possible. The Web Analytics service will help it gather extensive and precise “business intelligence” about what causes slow loading times on real-world websites on real-world devices for real-world visitors.

Cloudflare Web Analytics is a good option for website owners who want just the essential information about how many people visit their website and which pages are popular (assuming it fixes the page-path bug). It’s not for you if you want to track your success and trends over time. The service has plenty of room for improvement, but I think that Cloudflare is off to a good start.
