Fun with Apache HTTPD and subrequest variable inconsistencies

The Apache HTTP Server (HTTPD) is extremely configurable and powerful. However, its modules are a collection of disparate and inconsistent tools. In this article, I’ll walk you through a problem I recently encountered when I wanted to combine two of them.

This article assumes some basic familiarity with HyperText Transfer Protocol (HTTP) requests and the HTTPD server configuration syntax.

The problem I wanted to solve: the REQUEST_URI variable is supposed to contain the request path as it was supplied in the HTTP request (unless it has been rewritten by mod_rewrite). However, when a directory (such as /) was requested, the value mod_headers saw had “index.html” appended to it.

Early on, I realized that the suffix came from the DirectoryIndex directive. When you request a directory, HTTPD tries to fulfill the request using an internal subrequest that fetches the file specified in that directive. I understood the underlying problem, but not how to retrieve the original REQUEST_URI value.

I tried solving this using mod_setenvif. This was the wrong approach, but I want to walk you through it:

DirectoryIndex /index.html
# check if the requested file exists
<If "-f '%{DOCUMENT_ROOT}%{REQUEST_URI}'">
  # copy the REQUEST_URI variable to PATH
  SetEnvIf REQUEST_URI .* PATH=$0
  Header set Test1 %{REQUEST_URI}e
  Header set Test2 %{PATH}e
</If>

When requesting /, the Test1 response header was set to /index.html and Test2 was set to (null). Neither of these was the expected value (/).

A value of (null) means that the variable hasn’t been set (as opposed to being empty). The code worked as expected when requesting files but failed when requesting directories.

The inconsistencies in how mod_setenvif and mod_headers handle the REQUEST_URI variable are frustrating. They make it difficult to use the former module to copy and correct the variable for use with the latter.

If you’re a seasoned HTTPD admin (or an attentive reader), you may have spotted the problem, or at least noticed that something is wrong. I’m expecting a response header to appear when requesting directories, but the header is gated by a conditional that only acts when the request is for a file! But wait: why is the response header being sent at all?

There are a lot of different moving parts here. I’ve actually already explained what went wrong, but I’ll walk you through it step by step.

First of all, the condition isn’t met for the initial request. That request is for a directory, so no headers are added. However, to fulfill the request, HTTPD issues a subrequest for the DirectoryIndex file. The subrequest is for a file, so it meets the condition, and the headers are added to the subrequest. Those headers are then attached to the response to the initial request.
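One way to watch the subrequest happen is to gate a throwaway debug header on IS_SUBREQ. This variable is provided by HTTPD’s expression parser and evaluates to "true" inside a subrequest and "false" otherwise. The following is a minimal sketch, and the X-Debug-Subrequest header name is hypothetical and meant only for testing. If the explanation above holds, requesting / should return the header, while requesting /index.html directly should not.

DirectoryIndex /index.html
# Hypothetical debug header: only added while a subrequest is being processed
<If "%{IS_SUBREQ} == 'true'">
  Header set X-Debug-Subrequest "true"
</If>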

So, why are the two test headers different, though? The second paragraph in the mod_setenvif documentation has the answer:

When the server looks up a path via an internal subrequest such as looking for a DirectoryIndex or generating a directory listing with mod_autoindex, per-request environment variables are not inherited in the subrequest. Additionally, SetEnvIf directives are not separately evaluated in the subrequest due to the API phases mod_setenvif takes action in.

Knowing all of this, we can construct a better workaround. This version differs from the previous example in three places: the -e file test replaces -f, and both Header directives now carry an ENV=PATH condition.

DirectoryIndex /index.html
# check if the requested file or directory exists
<If "-e '%{DOCUMENT_ROOT}%{REQUEST_URI}'">
  # copy the REQUEST_URI variable to PATH
  SetEnvIf REQUEST_URI .* PATH=$0
  # set header only if the variable PATH is set
  Header set Test1 %{REQUEST_URI}e ENV=PATH
  Header set Test2 %{PATH}e ENV=PATH
</If>

This time around, requesting / gives different results. The Test1 response header still contains /index.html, but the Test2 response header is now / as desired!

So, what is different this time around? The condition is now met by both the initial request (for the directory) and the subrequest for the DirectoryIndex file. The SetEnvIf directive therefore sets PATH correctly for the initial request. The subrequest doesn’t overwrite that value because, as the documentation quoted above explains, SetEnvIf isn’t evaluated in the subrequest.

I said I wanted the REQUEST_URI variable without the index.html suffix. However, the above code example still includes it when /index.html is requested explicitly. You can strip it from the variable in this case using a regular expression (RegEx). Replace the SetEnvIf line in the above code example with the following (or adjust it to match your DirectoryIndex directive as needed):

# Strip an index.html that directly follows a slash (so /myindex.html is left alone)
SetEnvIf REQUEST_URI ^(.*?)((?<=/)index\.html)?$ PATH=$1
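If your DirectoryIndex lists more than one file, the optional group in the pattern has to cover each of them. The sketch below assumes a hypothetical setup where the index file can be either index.html or index.php. The DirectoryIndex line replaces the existing one, and the SetEnvIf line goes inside the <If> block. The (?<=/) lookbehind ensures that only a whole index file name directly after a slash is stripped, and $1 is everything before it.

# Hypothetical: two candidate index files
DirectoryIndex index.html index.php
# Strip a trailing index.html or index.php that directly follows a slash
SetEnvIf REQUEST_URI ^(.*?)((?<=/)index\.(html|php))?$ PATH=$1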

There are many types of subrequests, though. You’ll run into the same problem if you set custom error pages with the ErrorDocument directive: the REQUEST_URI variable will disclose the location of the error document. You can work around this case specifically by checking the request’s status code before sending the response header.

The following example combines the method above (checking that the PATH variable is set and not empty) with a check that the request succeeded (that it produced an HTTP 200 OK response):

Header set Test2 %{PATH}e "expr=%{REQUEST_STATUS} == 200 && -n %{ENV:PATH}"
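Putting the pieces together, a complete version might look like the following. Treat it as an untested sketch rather than a drop-in configuration. The ErrorDocument path /errors/404.html is a hypothetical placeholder, and the rest combines the -e condition, the index.html-stripping SetEnvIf, and the status-code check described above.

DirectoryIndex /index.html
# Hypothetical custom error page; adjust the path to your setup
ErrorDocument 404 /errors/404.html
# Run for requests (and subrequests) whose file or directory exists
<If "-e '%{DOCUMENT_ROOT}%{REQUEST_URI}'">
  # Copy the path to PATH, dropping an index.html that directly follows a slash
  SetEnvIf REQUEST_URI ^(.*?)((?<=/)index\.html)?$ PATH=$1
  # Send the header only for 200 OK responses where PATH is non-empty
  Header set Test2 %{PATH}e "expr=%{REQUEST_STATUS} == 200 && -n %{ENV:PATH}"
</If>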

These aren’t the only cases where subrequests can overwrite variables, but they should be the most common ones. You must read the HTTPD module documentation from top to bottom, along with all the associated documentation: the information you need is usually there. HTTPD can be a pain to configure because it doesn’t always behave the way you’d expect, and the inconsistencies between modules make it hard to apply what you’ve learned about one module to another.

Sources