Comparing DNS caching/TTL of content delivery networks

Choosing the right content delivery network (CDN) can help your visitors avoid paying a performance cost for something that almost never happens.

The time-to-live (TTL) is the number of seconds that client and intermediary DNS resolvers can cache and reuse the same DNS response without having to make another lookup. Longer cache times mean fewer repeated (and, most of the time, redundant) lookups, and fewer lookups mean better performance.
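To make the caching behavior concrete, here is a minimal sketch of a TTL-based cache in Python. It is a toy model, not how any particular resolver is implemented: the `TtlCache` class, the `count_lookups` helper, the hostname, the address and the request schedule are all made up for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple


class TtlCache:
    """Toy DNS cache: reuse an answer until its TTL runs out."""

    def __init__(self, lookup: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup   # returns (address, ttl_seconds)
        self._clock = clock     # injectable so the example can fake time
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.lookups = 0        # number of real lookups performed

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]    # still fresh: no network round trip
        address, ttl = self._lookup(name)
        self.lookups += 1
        self._entries[name] = (address, now + ttl)
        return address


def count_lookups(ttl: int, request_times: List[float]) -> int:
    """Count the real lookups needed for requests made at the given times."""
    current = [0.0]  # fake clock, in seconds
    cache = TtlCache(lambda name: ("192.0.2.1", ttl), clock=lambda: current[0])
    for t in request_times:
        current[0] = t
        cache.resolve("cdn.example.com")
    return cache.lookups


# Hypothetical visit: one request every 20 seconds for three minutes.
visit = [float(t) for t in range(0, 181, 20)]
print("TTL 30 s: ", count_lookups(30, visit), "lookups")
print("TTL 300 s:", count_lookups(300, visit), "lookups")
```

With the short TTL the cache has to go back to the network several times during the same visit; with the long TTL only the very first request pays for a lookup.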

I’ve compared budget CDN providers before, and you can find lots of single-connection performance testing metrics for every CDN provider on the market. However, I couldn’t find any published metrics comparing the DNS time-to-live values used by budget and market-leading CDN providers.

I’ve collected the TTL (in seconds) returned by a selection of the most popular and some budget CDN providers as of :

Name              Time-to-live (seconds)
BelugaCDN         600
BunnyCDN          35
CDN77             1
CDNSun            180
Cloudflare        300
Fastly            30
KeyCDN            60
Incapsula         30
Microsoft Azure   60

As I explored when looking into client-side DNS caching, web browsers can be assumed to cache a DNS query for a minimum of 15 seconds. Most will cache it for one to two minutes, however. The minimum TTL should ideally last at least as long as most visitors’ sessions, ensuring that they won’t need to perform more than one DNS lookup for the duration of their visit to your website.
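As a rough back-of-the-envelope illustration, the sketch below estimates the worst-case number of DNS lookups during a single visit for each TTL in the table above. It assumes a hypothetical three-minute session, treats the 15-second browser minimum as a floor on the effective cache time, and assumes a request lands right after every expiry. Real browsers, operating systems and intermediary resolvers behave in more varied ways, so treat the numbers as an upper bound under these assumptions only.

```python
BROWSER_MIN_CACHE = 15   # seconds; browser minimum mentioned above
SESSION_SECONDS = 180    # hypothetical visit length (an assumption)

# TTLs in seconds, copied from the table above.
TTLS = {
    "BelugaCDN": 600,
    "BunnyCDN": 35,
    "CDN77": 1,
    "CDNSun": 180,
    "Cloudflare": 300,
    "Fastly": 30,
    "KeyCDN": 60,
    "Incapsula": 30,
    "Microsoft Azure": 60,
}


def max_lookups(ttl: int, session_seconds: int = SESSION_SECONDS) -> int:
    """Worst-case lookups in one session: the first one, plus one per expiry."""
    effective_ttl = max(ttl, BROWSER_MIN_CACHE)
    return 1 + session_seconds // effective_ttl


# Print providers from fewest to most worst-case lookups.
for name, ttl in sorted(TTLS.items(), key=lambda item: max_lookups(item[1])):
    print(f"{name:<16} TTL {ttl:>3} s -> up to {max_lookups(ttl)} lookups")
```

Under these assumptions only the providers whose TTL exceeds the session length keep a visitor to a single lookup.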

When deploying a regular pull CDN (a reverse proxy), website owners redirect their DNS using a CNAME record that points to another domain operated by the CDN provider. Websites can optimize their end of the lookup process (the CNAME record) by making sure it has a long TTL. However, the domain returned by the CNAME is controlled by the CDN provider, which returns the actual A and AAAA records and sets their TTLs.
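The split in responsibility can be sketched as a two-record chain. Every record is cached under its own TTL, so the whole chain can only be answered from cache for as long as its shortest-lived record is still valid. The hostnames, addresses and TTL values below are placeholders chosen for illustration, not values from any real provider.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    name: str
    rtype: str
    value: str
    ttl: int  # seconds


def chain_lifetime(chain: List[Record]) -> int:
    """Seconds for which the whole chain can be answered purely from cache."""
    return min(record.ttl for record in chain)


def expired_records(chain: List[Record], age: int) -> List[Record]:
    """Records whose TTL has run out `age` seconds after they were cached."""
    return [record for record in chain if age >= record.ttl]


chain = [
    # Controlled by the website owner: can be given a long TTL.
    Record("www.example.com", "CNAME", "customer.cdn.example.net", 86400),
    # Controlled by the CDN provider: the website owner cannot change this TTL.
    Record("customer.cdn.example.net", "A", "192.0.2.10", 30),
]

print("Chain is fully cacheable for", chain_lifetime(chain), "seconds")
for record in expired_records(chain, age=60):
    print("Needs a new lookup after 60 s:", record.name, record.rtype)
```

In this sketch the long CNAME TTL saves one of the two lookups on repeat visits, but the provider's short A record TTL still forces a fresh query.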

The TTL affects not only subsequent requests but also the initial load time: the longer the TTL, the more likely it is that a given DNS cache already holds the records from an earlier lookup. A quick cost–benefit analysis of increasing the TTL from half a minute to ten minutes: you risk up to ten minutes of downtime in rare circumstances, but gain a 100–400 ms reduction in everyday page load times.

So why do most providers set such short TTLs? Their service-level agreements (SLAs) may guarantee a certain level of uptime, they may not trust their own infrastructure’s ability to withstand a distributed denial-of-service (DDoS) attack, they may rely too heavily on DNS-based load balancing, or they may not care that much about performance. In other words, an unbelievably good SLA may come at the expense of network performance.

A TTL shorter than a couple of minutes should be a red flag, both for performance and for the expected reliability of the service provider’s offerings.

"Lots and lots of websites and Internet services use pretty short lifetimes on their DNS records. And they do this because if a data center goes down, they want to be able to update the DNS and very rapidly direct traffic to a different data center.

The problem with this approach is you’re paying a performance cost for something that almost never happens. Data centers very rarely go down."

― Stuart Cheshire, Session 714, Apple WWDC 2018