Apache bug leads to websites dropping out of Yandex search

An innocent-looking implementation bug in Apache’s new HTTP/2 module (mod_http2) causes the server to incorrectly announce HTTP/2 support using the Upgrade: h2 header. This simple mistake makes websites inaccessible to YandexBot as well as to services built on Node.js.

Apache 2.4.17 introduced a new module for HTTP/2 support (mod_http2). On servers where this module is enabled, Apache issues an extra “Upgrade: h2” HTTP header to advertise HTTP/2 support. The only problem is that this header is intended as a request header from clients and not as a response header. This non-standard use of a request header in a response causes issues for some HTTP libraries, such as those used by Yandex and by services built on Node.js. I contacted Yandex Support and they confirmed the issue.

Yandex is a search engine that controls 45 % of the Russian and 20 % of the Ukrainian search market. When their YandexBot fails to retrieve pages from your site more than a handful of times, your site is dropped from search results. The problem manifests itself in Yandex.Webmaster tools as “Extraction error: Your server is configured to transfer compressed data using gzip or deflate. The compressed file has been corrupted and can’t be unpacked by our indexing robot.” The error message is misleading, as the real issue has nothing to do with compression.

The problem is tracked in the Apache project as bug #59311 and has been resolved on the master branch. The fix will be included in the next major release. A backport has also been proposed for the next maintenance release, Apache 2.4.21, which is expected at the beginning of July.

Until the official fix is released, you can work around the issue in the Apache configuration by removing the problematic header entirely. This shouldn’t have any adverse side effects and won’t break HTTP/2 or HTTP/1.1 support on your server. The Header directive requires the mod_headers module to be enabled.

Header unset Upgrade

As of 2016-05-19, the problem affects 0.04 % of the top one million websites. Affected websites were identified by crawling the Alexa Top 1 Million websites and inspecting their response headers. This measurement might not be representative, as any affected website would see less traffic and could drop out of the top one million list as a result.
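The same kind of header check can be run against a single server, for example to see whether it sends the header before and after applying the workaround. The following is only a minimal Python sketch and not the crawler used for the measurement above. The function names advertises_h2_upgrade and fetch_headers are my own, and the check assumes the header value is a comma-separated list of protocol tokens.

```python
import http.client


def advertises_h2_upgrade(headers):
    """Return True if any Upgrade response header lists the "h2" token.

    `headers` is a list of (name, value) pairs, as returned by
    http.client.HTTPResponse.getheaders().
    """
    for name, value in headers:
        if name.lower() != "upgrade":
            continue
        # Assumption: the value is a comma-separated list of protocol tokens.
        tokens = [token.strip().lower() for token in value.split(",")]
        if "h2" in tokens:
            return True
    return False


def fetch_headers(host, path="/", timeout=10):
    """Fetch the response headers over HTTPS with an HTTP/1.1 HEAD request."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().getheaders()
    finally:
        conn.close()
```

Calling advertises_h2_upgrade(fetch_headers("example.com")), with example.com replaced by your own hostname, should tell you whether the header is being sent. The check itself is kept separate from the network call so it can also be run on headers collected by other means.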

The bug affects Apache versions 2.4.17 (released in October 2015), 2.4.18, and 2.4.20. These versions will remain in circulation for quite some time, as servers are updated slowly and many popular Linux distributions take a long time to ship package updates. HTTP/2 isn’t enabled by default, so the issue only shows up when the mod_http2 module is loaded and HTTP/2 is enabled with the Protocols h2 http/1.1 directive.
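As a rough first check, the version a server reports in its Server response header can be compared against the releases named above. This is only a sketch under a few assumptions: the names AFFECTED_VERSIONS, apache_version and may_be_affected are hypothetical, the set only contains releases that ship mod_http2 (the module’s documentation lists it as available from 2.4.17), and many servers hide or alter their version string. A match only means the server might be affected, as mod_http2 still has to be enabled.

```python
import re

# Assumption: the releases named in this article that ship mod_http2.
AFFECTED_VERSIONS = {"2.4.17", "2.4.18", "2.4.20"}


def apache_version(server_header):
    """Extract "x.y.z" from a Server header such as "Apache/2.4.18 (Ubuntu)".

    Returns None when the header carries no Apache version number.
    """
    match = re.search(r"Apache/(\d+\.\d+\.\d+)", server_header)
    return match.group(1) if match else None


def may_be_affected(server_header):
    """Return True if the reported version is one of the listed releases."""
    return apache_version(server_header) in AFFECTED_VERSIONS
```

Distributions sometimes backport fixes without changing the reported version number, so looking at the actual response headers remains the more reliable test.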

I’ve mentioned this bug before in my review of three uptime monitoring services. However, at the time I didn’t consider it to be much of an issue. When I gradually started losing all traffic from Russia (primarily referred from Yandex Search), I had to take action and apply the previously mentioned workaround. As usual, getting reindexed is much harder than getting deindexed.

If you’re running an affected version, consider either upgrading or deploying the aforementioned workaround to ensure other services can talk to your Apache web server.