The AppleNewsBot fetches syndication feeds, webpages, and images for the Apple News service. Unfortunately, some sloppy programming in the bot turns it into an unintentional web server load testing tool. Here is what goes wrong when the bot visits websites that use encryption certificates from the Let’s Encrypt certificate authority.
The brief summary of what happens is as follows:
- AppleNewsBot requests http://example.com/robots.txt
- The server returns a redirect to https://example.com/robots.txt
- AppleNewsBot then requests https://example.com/robots.txt as instructed, but doesn’t recognize Let’s Encrypt’s root certificate as valid and breaks the connection. This isn’t recognized correctly as a fail condition, and the bot loops back to step 1.
This results in a continuous 3–5 requests per second from the bot. After a few hours, another AppleNewsBot visits because your website hasn’t been marked as updated in a while. The second bot joins the first, and you now have 6–10 requests every second. Every few hours thereafter, another AppleNewsBot joins the others.
This repeats until you have a small swarm of 32 AppleNewsBots (which seems to be the arbitrary maximum number of bots Apple will send at the same time) sending a total of 96–160 requests per second. By this time, you’ll have used up all the request-serving capacity of a cheap and underpowered server. On my poor little server, the swarm consumed ¾ of the capacity and slowed down responses to other visitors.
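The arithmetic behind those numbers is simple enough to check. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and the per-bot rate of 3–5 requests per second is the one observed above.

```python
def swarm_request_rate(bots, per_bot_low=3, per_bot_high=5):
    """Return (low, high) total requests per second for a number of looping bots."""
    return bots * per_bot_low, bots * per_bot_high


# One bot, two bots, and the apparent maximum of 32 bots.
for bots in (1, 2, 32):
    low, high = swarm_request_rate(bots)
    print(f"{bots:>2} bots: {low}-{high} requests per second")
```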
I temporarily swapped the Let’s Encrypt certificate for another certificate, and the swarm of AppleNewsBots successfully grabbed the files they were after and then left. I’ve also confirmed the issue with other webmasters, all of whom have had the same problem and all of whom have had Let’s Encrypt certificates on their websites.
I’ve contacted Apple regarding the issue, and was reassured that their engineers are working on the problem. However, it has been three months since I first contacted them and the issue still persists. I’ve tried following up, but haven’t received a reply since that first email exchange.
As Apple wouldn’t resolve the issue(s) with their bot, I needed to start blocking them to free up resources for actual visitors. Initially, I repurposed my Fail2Ban solution for blocking repeated 403 requests to target excessive 301 Moved Permanently redirects using the same method. Even though Fail2Ban only needed to keep track of redirects for 10 seconds to identify and block the badly behaved bots from Apple, this led to quite an increase in Fail2Ban’s memory usage, as there are far more legitimate 301 redirects than there are 403 requests in normal operation.
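To illustrate the idea of counting redirects per client over a 10-second window, here is a minimal sketch in plain Python. It is not how Fail2Ban works internally and not my actual filter configuration; the class name, the threshold of 20 redirects, and the example address are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class RedirectFloodDetector:
    """Flag clients that receive too many 301 responses within a short window."""

    def __init__(self, window_seconds=10.0, max_redirects=20):
        self.window_seconds = window_seconds  # how long each redirect is remembered
        self.max_redirects = max_redirects    # hypothetical threshold, tune to taste
        self._hits = defaultdict(deque)       # client IP -> timestamps of recent 301s

    def observe(self, ip, status, timestamp):
        """Record one response; return True if this client is over the limit."""
        if status != 301:
            return False
        hits = self._hits[ip]
        hits.append(timestamp)
        # Forget redirects that have fallen out of the window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_redirects


detector = RedirectFloodDetector()
# A client redirected five times per second trips the limit within seconds.
flags = [detector.observe("192.0.2.10", 301, i * 0.2) for i in range(25)]
print(flags.index(True))  # position of the first response that exceeded the limit
```

Every client that receives even a single redirect gets its own entry in the table, which hints at why tracking a common response like 301 costs more memory than tracking a rare one like 403.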
I ended up blocking Apple’s 18.104.22.168/16 IP range, which is where all their bot traffic has originated from. If you’re experiencing badly behaved or aggressive AppleBot or AppleNewsBot traffic on your website, I recommend blocking the IP ranges directly. Normally, you could have created entries for AppleBot and AppleNewsBot in your /robots.txt file, but as it’s the very act of retrieving this file that triggers the bad behavior, this doesn’t work.
Reader Joel Risberg suggested another approach where the robots file would be served directly rather than redirected. The drawback is that the problem resurfaces as soon as Apple’s bots start requesting other resources from the server. It can also lead to URL canonicalization issues with other bots when the robots file is served from a different origin than the rest of the website.
If you want to work around the issue with AppleNewsBot, you can skip the redirect from HTTP to HTTPS entirely when the User-Agent matches “AppleNewsBot”. This will also require you to change all the links in the syndication feed to plain HTTP rather than HTTPS for the AppleNewsBot User-Agent. This may require a lot of setup depending on your environment, so I would recommend either getting a certificate from a different authority or just blocking AppleNewsBot outright for the time being.
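The decision logic for that work-around is small, even if wiring it into a real web server and feed generator is not. A rough sketch, assuming the bot can be identified by the substring “AppleNewsBot” in its User-Agent header; the function names and the sample User-Agent string below are hypothetical.

```python
BOT_TOKEN = "AppleNewsBot"


def is_apple_news_bot(user_agent):
    """True if the User-Agent header contains the AppleNewsBot token."""
    return BOT_TOKEN in (user_agent or "")


def should_redirect_to_https(scheme, user_agent):
    """Redirect plain-HTTP requests to HTTPS, except for AppleNewsBot."""
    if scheme == "https":
        return False  # already encrypted, nothing to redirect
    return not is_apple_news_bot(user_agent)


def feed_link(host, path, user_agent):
    """Build a feed link, using plain HTTP only when AppleNewsBot is asking."""
    scheme = "http" if is_apple_news_bot(user_agent) else "https"
    return f"{scheme}://{host}{path}"


sample_bot_agent = "ExampleFetcher/1.0 (AppleNewsBot)"  # made-up User-Agent
print(should_redirect_to_https("http", sample_bot_agent))
print(feed_link("example.com", "/article", sample_bot_agent))
```

Note that any response that varies by User-Agent also needs a matching Vary header if there are caches in front of the server.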
If you’re using certificates from a relatively new or small certificate authority, be aware that some online services and bots may misbehave. It’s the price you pay for being an early adopter.
Oh, and lastly I’d like to mention that AppleNewsBot sends If-Modified-Since as Unix timestamps rather than in the RFC 1123 date format that HTTP requires. Servers are supposed to ignore a date they can’t parse, so the bot gets a cache miss and a full response from every server, every time. Good job, Apple. Just terrific work on this one.
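For comparison, here is what the two formats look like using Python’s standard library. The helper names are mine, and the parser only sketches how a server might treat the header; it is not taken from any particular server’s code.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def http_date(unix_seconds):
    """Format a Unix timestamp as an HTTP date in the RFC 1123 style."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return format_datetime(moment, usegmt=True)


def parse_if_modified_since(value):
    """Return the parsed date, or None if the header value is not a valid date."""
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable value: treat the request as unconditional


print(http_date(0))                           # Thu, 01 Jan 1970 00:00:00 GMT
print(parse_if_modified_since("1457000000"))  # None
```

A bare Unix timestamp fails to parse, so a server following this logic answers with the full resource instead of a 304 Not Modified.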