An origin server should use the same hostname as any [caching] reverse-proxy gateways that are made available to the public web. Otherwise, you can end up leaking the hostname or IP address of the origin server, and get unexpected results in error pages and interpreted output.
The solution discussed in this article is specifically for the Apache Web Server’s mod_proxy module, but the concepts apply to other reverse proxy servers such as Nginx and Traffic Server as well. The configuration discussed in this article can be applied in the main server configuration or in a VirtualHost block in your Apache configuration file.
A tale of two hosts responding to different names
Let us assume you’ve got a reverse proxy setup in front of your backend origin server. Both are running Apache, and the reverse proxy gateway responds to the hostnames example.com and www.example.com, and the backend origin server responds to wphost1.example.com. A simple configuration for this could look like:
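As a minimal sketch of the setup just described, the two servers could be configured roughly as follows. The hostnames come from the example above. The port, the DocumentRoot path, and the use of ProxyPassReverse are assumptions added for illustration, and mod_proxy and mod_proxy_http need to be loaded on the proxy.

```apache
# --- Reverse proxy gateway (public-facing) ---
<VirtualHost *:80>
    ServerName  www.example.com
    ServerAlias example.com

    # Relay every request to the backend origin server.
    ProxyPass        "/" "http://wphost1.example.com/"
    # Rewrite Location headers in backend redirects to point at the proxy.
    ProxyPassReverse "/" "http://wphost1.example.com/"
</VirtualHost>

# --- Backend origin server (separate machine) ---
<VirtualHost *:80>
    ServerName wphost1.example.com
    # Assumed path, adjust to your installation.
    DocumentRoot "/var/www/html"
</VirtualHost>
```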
In this setup, requests to the server at www.example.com would be transparently relayed to the server at wphost1.example.com. This simple setup is the starting point for a caching proxy that can provide load balancing and geographic distribution of server load; see the mod_proxy documentation for details.
Incoming requests to the proxy will arrive with the HTTP Host header set to either Host: example.com or Host: www.example.com. The relayed requests sent from the proxy to the origin server will have the header rewritten to Host: wphost1.example.com, and will also get an additional header carrying the originally requested hostname, e.g. X-Forwarded-Host: www.example.com.
The origin server is thus not aware that the preferred hostname is www.example.com, and the request will be processed as if it had been sent to wphost1.example.com instead. Error pages and interpreters (like PHP) will use the processed hostname and be unaware of the X-Forwarded-Host header.
Same name, an alias, and name canonicalization
Let us next look at a slightly more complex setup where both the proxy and the origin treat requests as if they were sent to www.example.com:
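A sketch of what such a configuration might look like is given here, using the three options this section covers: ServerAlias, UseCanonicalName, and ProxyPreserveHost. The port, the DocumentRoot path, and the extra example.com alias on the origin are assumptions made for illustration.

```apache
# --- Reverse proxy gateway (public-facing) ---
<VirtualHost *:80>
    ServerName  www.example.com
    ServerAlias example.com
    # Use ServerName for self-referential URLs and SERVER_NAME.
    UseCanonicalName On

    # Relay the client's original Host header to the backend.
    ProxyPreserveHost On
    ProxyPass        "/" "http://wphost1.example.com/"
    ProxyPassReverse "/" "http://wphost1.example.com/"
</VirtualHost>

# --- Backend origin server (separate machine) ---
<VirtualHost *:80>
    # The public-facing name is the canonical name on the origin too.
    ServerName  www.example.com
    # The actual hostname becomes an alias. example.com is added as an
    # assumption, in case the proxy relays that Host value unchanged.
    ServerAlias wphost1.example.com example.com
    UseCanonicalName On
    # Assumed path, adjust to your installation.
    DocumentRoot "/var/www/html"
</VirtualHost>
```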
The option names are fairly self-explanatory, but I’ll go through each of them in turn to explain how they work together.
The reverse proxy server is now aware that the ServerName is the canonical name of the server, and all requests, even those made to the server’s IP address, are to be treated as if they were made to the canonical name. This behavior is managed by the UseCanonicalName option.
Next, the ProxyPreserveHost option tells Apache not to put the backend hostname from the ProxyPass directive into relayed requests, but rather to connect to that host and relay the original hostname in the HTTP Host header. The UseCanonicalName option makes Apache use the canonical ServerName for self-referential URLs and the SERVER_NAME variable, meaning that requests to the example.com alias are treated as www.example.com as well.
The origin server is altered similarly, setting the actual hostname as an alias and the canonical name to the public-facing hostname. The UseCanonicalName option is used to ensure error pages and interpreters will always use the preferred hostname.
Alternatively, fix it on the origin
You can always fix this on the origin by rewriting all incoming requests so that they appear to have the correct Host header.
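One way this could be done, offered as an assumption rather than the only option, is to overwrite the Host request header with mod_headers on the origin. The sketch below assumes the origin still answers to wphost1.example.com and that mod_headers is loaded.

```apache
# --- Backend origin server ---
<VirtualHost *:80>
    ServerName wphost1.example.com

    <IfModule mod_headers.c>
        # Make every incoming request appear to target the public hostname.
        RequestHeader set Host "www.example.com"
    </IfModule>
</VirtualHost>
```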
This approach can cause issues for the rewrite module, caching module, GnuTLS module, as well as other modules that work with the actual hostname submitted to the server. This requires a lot of testing to verify that it doesn’t cause any problems.
Leaking information about the origin server
A blog post by Chris Knight reminded me of the importance of hiding the hostname and IP of the origin server. He wrote about it in the context of Cloudflare, which is a commercial caching reverse-proxy service provider, but the principle also applies to non-Cloudflare proxies.
For example, if this website came under attack, I could turn off the forward-facing reverse proxies that handle incoming requests and change my DNS to hide behind a commercial proxy service. For this to work, it’s important to keep any potential attack directed at the proxy servers so that the origin server can remain relatively isolated. Chris Knight goes through other measures you can take to hide your origin server in his blog post.
Apache’s mod_cache will by default expose the origin server’s IP or hostname on any error page. It’s thus important not to serve any error pages from the origin server with its actual hostname exposed. Hostname canonicalization, as discussed above, can hide the hostname on any 4xx client errors. However, 5xx server errors can still leak the IP address, as these kinds of errors can appear when Apache isn’t correctly configured and your options thus aren’t in effect.
You can mitigate this somewhat if you use mod_cache with mod_proxy on your reverse proxies. With the CacheStaleOnError option set to on, Apache will serve a stale cached copy of a response, when one is available, if the origin server responds with a 5xx error code.
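A minimal sketch of such a caching proxy follows, assuming mod_cache and mod_cache_disk are loaded. The cache directory is a hypothetical path, and CacheStaleOnError is spelled out here only to make the intent explicit.

```apache
# --- Reverse proxy gateway with a disk cache ---
<VirtualHost *:80>
    ServerName  www.example.com
    ServerAlias example.com

    ProxyPreserveHost On
    ProxyPass        "/" "http://wphost1.example.com/"
    ProxyPassReverse "/" "http://wphost1.example.com/"

    # Cache proxied responses on disk (assumed cache location).
    CacheRoot   "/var/cache/apache2/mod_cache_disk"
    CacheEnable disk "/"

    # Answer with stale cached data instead of a backend 5xx response.
    CacheStaleOnError on
</VirtualHost>
```

Stale content can only be served for URLs that were cached earlier, so this reduces rather than eliminates the chance of an origin error page reaching visitors.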