2017-12-15

The web is an open place where all sorts of broken client implementations can roam free. In this article, I’ll walk you through how you can help out clients (mostly feed readers and bots) that don’t know that URL #fragments should be dropped from request target addresses.

The fragment part of a URL, that is, everything after the first # character, should be omitted when forming a valid request target in HTTP. The request target is the page address that an HTTP client requests from a web server. However, going through my server logs, I found a noticeable number of syndication feed readers and bots that have either percent-encoded the fragment portion of URLs or even sent it as-is in their GET requests.
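As a rough sketch of what a well-behaved client does, the following Python snippet splits off the fragment with the standard library’s urllib.parse.urldefrag before building the request target. The helper name request_target and the example URL are made up for illustration; they are not taken from any particular feed reader.

```python
from urllib.parse import urldefrag, urlsplit


def request_target(url: str) -> str:
    """Return the request target (path plus query) for a URL.

    Hypothetical helper for illustration only. The fragment is dropped
    because it is meant for the client and is never sent to the server.
    """
    without_fragment, _fragment = urldefrag(url)
    parts = urlsplit(without_fragment)
    target = parts.path or "/"  # an empty path is requested as "/"
    if parts.query:
        target += "?" + parts.query
    return target


if __name__ == "__main__":
    # The example URL is an assumption, not one from the server logs.
    print(request_target("https://www.example.com/article.html?page=2#comments"))
    # prints "/article.html?page=2"
```

A broken client skips the urldefrag step and ends up requesting /article.html?page=2#comments, or the same address with # encoded as %23.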

These requests end up with a 404 Not Found error, as an HTTP server will interpret the fragment to be part of the URL’s path. It’s not Hurricane Katrina, but it’s still a problem. I’ve sent some patches around to fix some of the broken implementations, but I can’t possibly fix all of them. You can still recover traffic from clients and bots that make this mistake using some redirect magic.

The following Apache web server configuration example finds the first literal or URL-escaped # character, drops it and anything after it from the URL, and finally redirects the client to the corrected address with the fragment removed.

RewriteEngine On
RewriteCond %{REQUEST_URI} "(.*?)(#|\%23)"
RewriteRule "(.*)" "%1" [last,redirect=302]

Notably, this redirect entirely discards the fragment portion from the URL. By triggering this rewrite rule, a client has already demonstrated that it’s incapable of properly handling URL fragments.
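To make the matching logic easier to follow, here is a small Python sketch that applies the same regular expression to a request path and returns the address the client would be redirected to. It only illustrates the pattern and does not replace the server configuration; the function name redirect_location and the sample paths are hypothetical.

```python
import re
from typing import Optional

# Same pattern as in the RewriteCond: lazily capture everything before
# the first literal "#" or percent-encoded "%23".
FRAGMENT_PATTERN = re.compile(r"(.*?)(#|%23)")


def redirect_location(request_path: str) -> Optional[str]:
    """Return the corrected path, or None when no fragment was sent.

    Hypothetical helper mirroring the rewrite rule: the first capture
    group plays the role of the %1 back-reference.
    """
    match = FRAGMENT_PATTERN.search(request_path)
    if match is None:
        return None  # rule does not trigger; serve the request as usual
    return match.group(1)


if __name__ == "__main__":
    # Sample paths are assumptions for illustration.
    for path in ("/article.html#comments",
                 "/article.html%23comments",
                 "/article.html"):
        print(path, "->", redirect_location(path))
```

Because the first group is lazy, the match stops at the earliest # or %23, so everything from that point on is dropped.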

I’ve had no luck constructing a similar redirect rule for NGINX. It seems NGINX, understandably, refuses to believe a client would ever submit the fragment portion of a URL as part of a request and just refuses to match against it.

