Sorcerer's Tower

Getting the Original URL in Apache

There are various situations where one might want to know the full URL sent over HTTP by the user agent, before any rewriting has occurred.

Depending on the situation and setup, it can be as simple as using CGI variables such as path_info, redirect_url or request_uri, and within a JVM servlet getRequestURL() may prove useful. However, none of those are guaranteed to be the URL which Apache received, nor are any of Apache's other documented variables.

Fortunately there is a workaround, because one variable provided (THE_REQUEST) is the first line of the HTTP request, which contains the desired request URL nestled between the method and protocol, e.g. "GET /url HTTP/1.1" - meaning all that needs doing is to chop the ends off.

It is relatively simple to extract the URL, and at the same time provide it to later scripts, by using the RequestHeader directive from mod_headers to set and modify a header, like so:

RequestHeader set X-Original-URL "expr=%{THE_REQUEST}"
RequestHeader edit* X-Original-URL ^[A-Z]+\s|\sHTTP/1\.\d$ ""

The first line creates a header named X-Original-URL with the full value of the variable.

The second line performs a regex replace on the specified header, matching both the request method and its trailing space (^[A-Z]+\s) then the protocol plus its preceding space (\sHTTP/1\.\d$) and replacing with an empty string to leave just the URL.

The * after edit is what makes the replace occur multiple times - without it only the first match would be replaced (i.e. the * is equivalent to a g/global flag).
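To see what the two alternatives in the pattern do, and why the * matters, the same regex can be tried outside Apache. The following is only a rough Python sketch: the function names and the sample request line are made up for illustration, and Python's re module stands in for the regex engine Apache uses, which should behave the same for a pattern this simple.

```python
import re

# Same pattern as in the RequestHeader edit* line.
PATTERN = re.compile(r"^[A-Z]+\s|\sHTTP/1\.\d$")

def strip_all(request_line: str) -> str:
    """Replace every match, as edit* does."""
    return PATTERN.sub("", request_line)

def strip_first(request_line: str) -> str:
    """Replace only the first match, as plain edit does."""
    return PATTERN.sub("", request_line, count=1)

# Hypothetical request line, used only for this demonstration.
sample = "GET /some/page?x=1&y=a%20b HTTP/1.1"

print(strip_all(sample))    # /some/page?x=1&y=a%20b
print(strip_first(sample))  # /some/page?x=1&y=a%20b HTTP/1.1
```

Note that the pattern as written only matches an HTTP/1.x protocol string, so a request line ending in any other version would keep its protocol suffix.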

The name X-Original-URL is used for compatibility with the equivalent header set by the IIS URL Rewrite Module. Both that module and the above solution provide the full request URL, including query string, and encoded in whatever manner the user agent sent it. One difference is that the above config always sets the header, whilst the IIS version only sets it when the URL has been rewritten.
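As for reading the header in a later script, here is a minimal sketch assuming a Python script run via CGI behind the above config. CGI passes request headers as environment variables prefixed with HTTP_, upper-cased and with hyphens turned into underscores, so the header should appear as HTTP_X_ORIGINAL_URL. The function name original_url and the fallback text are hypothetical, chosen only for this example.

```python
import os

def original_url(environ=os.environ, default=None):
    """Return the X-Original-URL request header value, or default if absent.

    Assumes a CGI-style environment, where request headers appear as
    HTTP_* variables (upper-cased, hyphens replaced by underscores).
    """
    return environ.get("HTTP_X_ORIGINAL_URL", default)

if __name__ == "__main__":
    # Minimal CGI response that echoes the original URL back as plain text.
    print("Content-Type: text/plain")
    print()
    print(original_url(default="(header not set)"))
```

The same lookup should also work against a WSGI environ dictionary, since WSGI follows the CGI naming convention for request headers.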