The purpose of this module, as its name suggests, is to perform URL rewriting. This mechanism lets you get rid of ugly URLs containing multiple parameters, for instance http://website.com/article.php?id=1234&comment=32; such URLs are uninformative and meaningless to a regular visitor. Instead, links to your website can contain useful information that indicates the nature of the page you are about to visit. The URL given in the example becomes http://website.com/article-1234-32-US-economy-strengthens.html. This solution is more interesting not only for your visitors but also for search engines: URL rewriting is a key element of Search Engine Optimization (SEO).
The principle behind this mechanism is simple: the URI of the client request is rewritten after it is received and before the file is served. Once rewritten, the URI is matched against location blocks to find the configuration that applies to the request. The technique is detailed in the coming sections.
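As a minimal sketch of the idea, the friendly URL from the introduction could be mapped back to the parameterized script as follows. The script name article.php and its id and comment parameters come from the example URL above; the exact pattern and the use of the last flag are assumptions made for illustration, not a prescribed setup.

```nginx
server {
    server_name website.com;

    # Hypothetical mapping:
    #   /article-1234-32-US-economy-strengthens.html
    # is served internally by
    #   /article.php?id=1234&comment=32
    # $1 and $2 hold the two numeric groups captured by the pattern.
    # "last" stops rewriting and matches the new URI against location blocks.
    rewrite ^/article-([0-9]+)-([0-9]+)-.*\.html$ /article.php?id=$1&comment=$2 last;
}
```

The visitor keeps seeing the descriptive URL; only the URI used internally by Nginx changes.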
Reminder on Regular Expressions
First and foremost, this module requires a certain understanding of regular expressions, also known as regexes or regexps. Indeed, URL rewriting is performed by the rewrite directive, which accepts a pattern followed by the replacement URI.
Purpose
The first question we must answer is: What's the purpose of regular expressions? To put it simply, the main purpose is to verify that a string matches a pattern. The said pattern is written in a particular language that allows defining extremely complex and accurate rules.
String | Pattern | Matches? | Explanation |
hello | ^hello$ | Yes | The string begins with the character h (^h), followed by e, l, l, and ends with o (o$). |
hell | ^hello$ | No | The string begins with the character h (^h), followed by e, l, l, but does not end with o. |
Hello | ^hello$ | Depends | If the engine performing the match is case-sensitive, the string doesn't match the pattern. |
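In Nginx, the "depends" case from the table is decided by the location modifier: ~ performs a case-sensitive match and ~* a case-insensitive one. The sketch below illustrates the difference; the URIs and response bodies are made up for the demonstration, and the patterns start with / because request URIs always do.

```nginx
server {
    server_name website.com;

    # "~" is case-sensitive: /hello matches, /Hello does not
    location ~ ^/hello$ {
        return 200 "case-sensitive match\n";
    }

    # "~*" is case-insensitive: /greetings, /Greetings and /GREETINGS all match
    location ~* ^/greetings$ {
        return 200 "case-insensitive match\n";
    }
}
```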
This concept becomes a lot more interesting when complex patterns are employed, such as one that validates an e-mail address (intended for case-insensitive matching): ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$. Checking programmatically that an e-mail address is well formed would require a great deal of code, whereas all of the work can be done with a single regular expression.
PCRE Syntax
The syntax that Nginx employs originates from the Perl Compatible Regular Expressions (PCRE) library. It's the most commonly used form of regular expression, and nearly everything you learn here remains valid for other language variations.
In its simplest form, a pattern is composed of one character, for example, x. We can match strings against this pattern. Does example match the pattern x? Yes, because example contains the character x. A pattern can also accept more than one specific character: [a-z] matches any character between a and z, and [a-z0-9] matches any lowercase letter or digit. Consequently, the pattern hell[a-z0-9] validates the strings hello and hell4, but not hell or hell!.
You probably noticed that we employed the characters [ and ]. These are called metacharacters and have a special effect on the pattern. There are a total of 11 metacharacters, and each plays a different role. If you want a pattern to contain one of these characters literally, you need to escape it with the \ character.
Metacharacter | Description |
^ (Beginning) | The entity after this character must be found at the beginning. Example pattern: ^h. Matching strings: hello, h, hh. Non-matching strings: character, ssh. |
$ (End) | The entity before this character must be found at the end. Example pattern: e$. Matching strings: sample, e, file. Non-matching strings: extra, shell. |
. (Any) | Matches any character. Example pattern: hell. (the trailing dot is part of the pattern). Matching strings: hello, hellx, hell5, hell!. Non-matching strings: hell, helo. |
[ ] (Set) | Matches any character within the specified set. Syntax: [a-z] for a range, [abcd] for a set, and [a-z0-9] for two ranges. Note that if you want to include the - character in a set, you need to insert it right after the [ or just before the ]. Example pattern: hell[a-y123-]. Matching strings: hello, hell1, hell2, hell3, hell-. Non-matching strings: hellz, hell4, heloo, he-llo. |
[^ ] (Negate set) | Matches any character that is not within the specified set. Example pattern: hell[^a-np-z0-9]. Matching strings: hello, hell;. Non-matching strings: hella, hell5. |
| (Alternation) | Matches the entity placed either before or after the |. Example pattern: hello|welcome. Matching strings: hello, welcome, helloes, awelcome. Non-matching strings: hell, ellow, owelcom. |
( ) (Grouping) | Groups a set of entities, often to be used in conjunction with |. Example pattern: ^(hello|hi) there$. Matching strings: hello there, hi there. Non-matching strings: hey there, ahoy there. |
\ (Escape) | Allows you to escape special characters. Example pattern: Hello\. (the escaped dot is part of the pattern). Matching strings: "Hello.", "Hello. How are you?", "Hi! Hello...". Non-matching strings: "Hello", "Hello, how are you?". |
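To see several of these metacharacters working together in a configuration file, consider the following sketch. It combines the escape character, grouping, alternation, and the end anchor to select requests by file extension. The extension list and the 30-day expiration are arbitrary values chosen for illustration.

```nginx
server {
    server_name website.com;

    # \.  matches a literal dot (an unescaped . would match any character)
    # (gif|jpg|png) accepts any one of the three extensions
    # $   requires the extension to be at the very end of the URI
    location ~ \.(gif|jpg|png)$ {
        expires 30d;
    }
}
```

Without the backslash, a URI such as /reports/xpng would also be accepted, because the unescaped dot would match the x.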
Quantifiers
So far, you are able to express simple patterns with a limited number of characters. Quantifiers allow you to extend the amount of accepted entities:
Quantifier | Description |
* (0 or more times) | The entity preceding * must be found 0 or more times. Example pattern: he*llo. Matching strings: hllo, hello, heeeello. Non-matching strings: hallo, ello. |
+ (1 or more times) | The entity preceding + must be found 1 or more times. Example pattern: he+llo. Matching strings: hello, heeeello. Non-matching strings: hllo, helo. |
? (0 or 1 time) | The entity preceding ? must be found 0 or 1 time. Example pattern: he?llo. Matching strings: hello, hllo. Non-matching strings: heello, heeeello. |
{x} (x times) | The entity preceding {x} must be found x times. Example pattern: he{3}llo. Matching strings: heeello, oh heeello there!. Non-matching strings: hello, heello, heeeello. |
{x,} (At least x times) | The entity preceding {x,} must be found at least x times. Example pattern: he{3,}llo. Matching strings: heeello, heeeeeeello. Non-matching strings: hllo, hello, heello. |
{x,y} (x to y times) | The entity preceding {x,y} must be found between x and y times. Example pattern: he{2,4}llo. Matching strings: heello, heeello, heeeello. Non-matching strings: hello, heeeeello. |
As you probably noticed, the { and } characters in regular expressions conflict with the block delimiters of the Nginx configuration file syntax. If you want to write a regular expression pattern that includes curly brackets, you need to place the pattern between quotes (single or double):
rewrite hel{2,}o /hello.php; # invalid
rewrite "hel{2,}o" /hello.php; # valid
rewrite 'hel{2,}o' /hello.php; # valid
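The same quoting rule applies wherever a regular expression appears, not only in rewrite. As a sketch, a location block matching a hypothetical /archive/YYYY/MM/ URI layout (the layout and the response body are assumptions for the example) would be written like this:

```nginx
server {
    server_name website.com;

    # Quotes are required because the pattern contains { and }
    # Matches /archive/2024/05/ but not /archive/24/5/
    location ~ "^/archive/[0-9]{4}/[0-9]{2}/$" {
        return 200 "monthly archive\n";
    }
}
```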
Captures
One last feature of the regular expression mechanism is the ability to capture sub-expressions. Whatever text is placed between parentheses ( ) is captured and can be used after the matching process.
Here are a couple of examples to illustrate the principle:
Pattern | String | Captured |
^(hello|hi) (sir|mister)$ | hello sir | $1 = hello, $2 = sir |
^(hello (sir))$ | hello sir | $1 = hello sir, $2 = sir |
^(.*)$ | nginx rocks | $1 = nginx rocks |
^(.{1,3})([0-9]{1,4})([?!]{1,2})$ | abc1234!? | $1 = abc, $2 = 1234, $3 = !? |
Named captures are also supported: ^/(?<folder>[^/]*)/(?<file>.*)$ | /admin/doc | $folder = admin, $file = doc |
When you use a regular expression in Nginx, for example, in the context of a location block, the buffers that you capture can be employed in later directives:
server {
    server_name website.com;
    location ~* ^/(downloads|files)/(.*)$ {
        add_header Capture1 $1;
        add_header Capture2 $2;
    }
}
In the preceding example, the location block will match the request URI against a regular expression. A couple of URIs that would apply here: /downloads/file.txt, /files/archive.zip, or even /files/docs/report.doc. Two parts are captured: $1 will contain either downloads or files and $2 will contain whatever comes after /downloads/ or /files/. Note that the add_header directive is employed here to append arbitrary headers to the client response for the sole purpose of demonstration.
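Named captures can be used in the same way, which often makes a configuration easier to read than numbered buffers. The following sketch rewrites the previous example with names; the names folder and file and the X-Folder and X-File headers are illustrative choices, again used only to make the captured values visible.

```nginx
server {
    server_name website.com;

    # (?<folder>...) and (?<file>...) create the variables $folder and $file
    location ~* ^/(?<folder>downloads|files)/(?<file>.*)$ {
        add_header X-Folder $folder;
        add_header X-File $file;
    }
}
```

For a request to /files/docs/report.doc, $folder would contain files and $file would contain docs/report.doc.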
Internal requests
Nginx differentiates external and internal requests. External requests directly originate from the client; the URI is then matched against possible location blocks: