实操 Web Cache
http://netkiller.github.io/journal/cache.html
Mr. Neo Chen (陈景峰), netkiller, BG7NYT
中国广东省深圳市龙华新区民治街道溪山美地
518131
+86 13113668890
+86 755 29812080
<[email protected]>
$Id
版权声明
转载请与作者联系,转载时请务必标明文章原始出处和作者信息及本声明。
|
|
2015-08-27
摘要
写这篇文章的原因,是我看到网上很多谈这类的文章,多是人云亦云,不求实事,误导读者。
下面文中我会一个一个做实验,并展示给你,说明为什么会这样。只有自己亲自尝试才能拿出有说服力的真凭实据。
2014-03-12 首次发布
2015-08-27 修改,增加特殊数据缓存
我的系列文档
目录
- 1. 测试环境
- 2. 文件修改日期 If-Modified-Since / Last-Modified
- 3. ETag / If-None-Match
- 4. Expires / Cache-Control
- 5. FastCGI 缓存相关
- 6. HTML META 与 Cache
- 7. gzip
- 8. 反向代理与缓存
- 9. 特殊数据缓存
- 10. 总结
1. 测试环境
CentOS 6.5
Nginx安装脚本 https://github.com/oscm/shell/blob/master/nginx/nginx.sh
php安装脚本 https://github.com/oscm/shell/blob/master/php/5.5.8.sh
2. 文件修改日期 If-Modified-Since / Last-Modified
If-Modified-Since 小于 Last-Modified 返回 HTTP/1.1 200 OK, 否则返回 HTTP/1.0 304 Not Modified
每次浏览器请求文件会携带 If-Modified-Since 头,将当前时间发送给服务器,与服务器的Last-Modified时间对对比,如果大于Last-Modified时间,返回HTTP/1.0 304 Not Modified不会重新打开文件,否则重新读取文件并返回内容
2.1. 静态文件
nginx/1.0.15 静态文件自动产生 Last-Modified 头
# nginx -v nginx version: nginx/1.0.15 # curl -I http://192.168.6.9/index.html HTTP/1.1 200 OK Server: nginx/1.0.15 Date: Thu, 27 Feb 2014 07:36:03 GMT Content-Type: text/html Content-Length: 6 Last-Modified: Thu, 27 Feb 2014 07:29:50 GMT Connection: keep-alive Accept-Ranges: bytes
图片文件
# curl -I http://192.168.6.9/image.png HTTP/1.1 200 OK Server: nginx/1.0.15 Date: Thu, 27 Feb 2014 07:37:18 GMT Content-Type: image/png Content-Length: 41516 Last-Modified: Thu, 27 Feb 2014 07:36:59 GMT Connection: keep-alive Accept-Ranges: bytes
提示
疑问 nginx/1.4.5 默认没有 Last-Modified
# nginx -v nginx version: nginx/1.4.5 # curl -I http://192.168.2.15/index.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 02:13:44 GMT Content-Type: text/html Connection: keep-alive
经过一番周折最终找到答案 Nginx 如果开启 ssi 会禁用Last-Modified 关闭 ssi 后输出如下
# curl -I http://localhost/index.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 05:44:29 GMT Content-Type: text/html Content-Length: 6 Last-Modified: Wed, 25 Dec 2013 03:18:16 GMT Connection: keep-alive ETag: "52ba4e78-6" Accept-Ranges: bytes
再测试一次
# curl -H "If-Modified-Since: Fir, 28 Feb 2014 07:42:55 GMT" -I http://192.168.2.15/test.html HTTP/1.1 304 Not Modified Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 02:34:54 GMT Last-Modified: Fri, 28 Feb 2014 01:55:50 GMT Connection: keep-alive ETag: "530feca6-8b"
测试结果成功返回 HTTP/1.1 304 Not Modified, 但又莫名其妙的出现了 ETag。 这就是Nignx本版差异,非常混乱。
既然出现了ETag我们也顺便测试一下
# curl -H ‘If-None-Match: "530feca6-8b"‘ -I http://192.168.2.15/test.html HTTP/1.1 304 Not Modified Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 02:39:18 GMT Last-Modified: Fri, 28 Feb 2014 01:55:50 GMT Connection: keep-alive ETag: "530feca6-8b"
也是成功的
测试图片
# curl -I http://localhost/logo.jpg HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 02:59:04 GMT Content-Type: image/jpeg Content-Length: 10103 Last-Modified: Fri, 28 Feb 2014 02:56:37 GMT Connection: keep-alive ETag: "530ffae5-2777" Accept-Ranges: bytes # curl -H ‘If-None-Match: "530ffae5-2777"‘ -I http://localhost/logo.jpg HTTP/1.1 304 Not Modified Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 03:03:33 GMT Last-Modified: Fri, 28 Feb 2014 02:56:37 GMT Connection: keep-alive ETag: "530ffae5-2777" # curl -H "If-Modified-Since: Fri, 28 Feb 2014 12:04:18 GMT" -I http://localhost/logo.jpg HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 03:04:45 GMT Content-Type: image/jpeg Content-Length: 10103 Last-Modified: Fri, 28 Feb 2014 02:56:37 GMT Connection: keep-alive ETag: "530ffae5-2777" Accept-Ranges: bytes
测试结果,ETag通过测试,If-Modified-Since无论如何也无法返回 304 可能还需要其他的HTTP头,浏览器测试都通过返回 HTTP/1.1 304 Not Modified
现在换成浏览器测试 Chrome Firefox成功, 因为浏览器不会主动发送If-Modified-Since, 浏览器只有发现Last-Modified后,第二次请求才会推送 If-Modified-Since 需要刷新两次页面。
2.1.1. if_modified_since
在开启ssi的情况下,通过参数 if_modified_since 可以开启 Last-Modified
server { listen 80; server_name 192.168.2.15; if_modified_since before; }
测试结果看不到 Last-Modified, 因为 Nginx 的 if_modified_since before;参数只有接收到浏览器发过来的If-Modified-Since头才会发送Last-Modified
# curl -I http://192.168.2.15/test.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 02:39:42 GMT Content-Type: text/html Connection: keep-alive
最终 if_modified_since before; 数没有起到作用
参数设置为 if_modified_since exact;
# curl -I http://192.168.2.15/test.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 02:45:40 GMT Content-Type: text/html Connection: keep-alive # curl -H ‘If-None-Match: "530feca6-8b"‘ -I http://192.168.2.15/test.html HTTP/1.1 304 Not Modified Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 02:45:44 GMT Last-Modified: Fri, 28 Feb 2014 01:55:50 GMT Connection: keep-alive ETag: "530feca6-8b" # curl -H "If-Modified-Since: Fir, 28 Feb 2014 07:42:55 GMT" -I http://192.168.2.15/test.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 02:45:50 GMT Content-Type: text/html Connection: keep-alive
测试失败,浏览器也是实测失败,ETag却成功
2.2. 通过rewrite伪静态处理
index.php仍然是上面的那个php文件,我们只是做了伪静态
location / { root /www; index index.html index.htm; rewrite ^/test.html$ /index.php last; }
现在我们分别通过curl有chrome/firefox进行测试
# curl -H "If-Modified-Since: Fri, 28 Feb 2014 08:42:55 GMT" -I http://192.168.6.9/test.html HTTP/1.1 200 OK Server: nginx/1.0.15 Date: Thu, 27 Feb 2014 08:55:19 GMT Content-Type: text/html Connection: keep-alive Last-Modified: Thu, 26 Feb 2014 08:39:35 GMT
经过测试无论是 curl 还是 chrome/firefox 均无法返回304.
下面是我的分析,仅供参考。用户请求index.html Nginx 会找到该文件读取 mtime 与 If-Modified-Since 匹配,如果If-Modified-Since大于 Last-Modified返回 304否则返回200.
为什么同样操作经过伪静态的test.html就不行呢? 我分析当用户请求test.html Nginx 首先做Rewrite处理,然后跳转到index.php 整个过程nginx 并没有访问实际物理文件test.html也就没有mtime, 所以Nginx 返回200.
如果 Nginx 按预想的返回304,nginx 需要读取程序返回的HTTP头,Nginx 并没有这样的处理逻辑。
2.3. 动态文件
动态文件没有 Last-Modified 头,我们可以伪造一个
# curl -I http://192.168.6.9/index.php HTTP/1.1 200 OK Server: nginx/1.0.15 Date: Thu, 27 Feb 2014 07:57:59 GMT Content-Type: text/html Connection: keep-alive
在程序中加入HTTP头推送操作,Last-Modified时间是27号,当前时间是28号,我们要让Last-Modified 小于当前时间才行。
# cat index.php <?php header(‘Last-Modified: Thu, 27 Feb 2014 08:39:35 GMT‘ ); //header(‘Last-Modified: ‘ .gmdate(‘D, d M Y H:i:s‘) . ‘ GMT‘ ); ?> Hello
现在你将看到 Last-Modified
# curl -I http://localhost/modified.php HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 05:59:28 GMT Content-Type: text/html Connection: keep-alive Last-Modified: Fri, 28 Feb 2014 10:04:18 GMT
注意
虽然我们让动态程序返回了 Last-Modified ,但浏览器不认,经过测试 Chrome / Firefox 均不会承认.php文件,并缓存其内容。
# curl -I http://localhost/modified.php HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 05:59:28 GMT Content-Type: text/html Connection: keep-alive Last-Modified: Fri, 28 Feb 2014 10:04:18 GMT # curl -H "If-Modified-Since: Fri, 28 Feb 2014 08:42:55 GMT" -I http://localhost/modified.php HTTP/1.1 200 OK Server: nginx/1.0.15 Date: Fri, 28 Feb 2014 05:32:30 GMT Content-Type: text/html Connection: keep-alive Last-Modified: Thu, 26 Feb 2014 08:39:35 GMT
Last-Modified 对动态程序来说没有起到实际作用
Last-Modified是程序产生的,Nginx无法读到,让程序去处理状态返回是可行的,下面我们修改程序如下。
# cat modified.php <?php $mtime = ‘Fri, 28 Feb 2014 12:04:18 GMT‘; cache($mtime); function cache($mtime) { $http_if_modified_since = null; if(array_key_exists (‘HTTP_IF_MODIFIED_SINCE‘,$_SERVER)){ $http_if_modified_since = $_SERVER[‘HTTP_IF_MODIFIED_SINCE‘]; } echo $http_if_modified_since; if ($http_if_modified_since >= $mtime) { header(‘Last-Modified: ‘.$mtime, true, 304); exit; } else { header(‘Last-Modified: ‘ . $mtime ); } } print_r($_SERVER); echo date("Y-m-d H:i:s"); ?>
测试效果
# curl -I http://localhost/modified.php HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 05:22:28 GMT Content-Type: text/html Connection: keep-alive
伪造一个 If-Modified-Since 日期小于我们指定的日期程序返回HTTP/1.1 200 OK
# curl -H "If-Modified-Since: Fri, 28 Feb 2014 10:04:18 GMT" -I http://localhost/modified.php HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 05:22:13 GMT Content-Type: text/html Connection: keep-alive
伪造一个 If-Modified-Since 日期大于我们指定的日期程序返回HTTP/1.1 304 Not Modified
# curl -H "If-Modified-Since: Fri, 28 Feb 2014 20:04:18 GMT" -I http://localhost/modified.php HTTP/1.1 304 Not Modified Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 05:21:31 GMT Connection: keep-alive Last-Modified: Fri, 28 Feb 2014 12:04:18 GMT
测试成功,并且在浏览器端也测试成功 HTTP/1.1 304 Not Modified
将modified.php伪静态处理
location / { root /www; index index.html index.htm; rewrite ^/modified.html$ /modified.php last; }
测试
# curl -I http://localhost/modified.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 06:21:10 GMT Content-Type: text/html Connection: keep-alive Last-Modified: Fri, 28 Feb 2014 10:04:18 GMT # curl -H "If-Modified-Since: Fri, 28 Feb 2014 12:04:18 GMT" -I http://localhost/modified.html HTTP/1.1 304 Not Modified Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 06:21:22 GMT Connection: keep-alive Last-Modified: Fri, 28 Feb 2014 10:04:18 GMT
达到预期效果
3. ETag / If-None-Match
上面的Last-Modified测试中发现ETag虽然不限制,但是暗中还是可用的:)
etag on; 开启Nginx etag支持,lighttpd 默认开启
server { listen 80; server_name phalcon; charset utf-8; access_log /var/log/nginx/host.access.log main; etag on; location / { root /www/phalcon/public; index index.html index.php; } }
检查ETag输出
# curl -I http://localhost/index.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 03:08:28 GMT Content-Type: text/html Connection: keep-alive # curl -I http://phalcon/img/css.png HTTP/1.1 200 OK Server: nginx Date: Thu, 27 Feb 2014 09:20:49 GMT Content-Type: image/png Content-Length: 1133 Last-Modified: Fri, 14 Feb 2014 08:05:03 GMT Connection: keep-alive ETag: "52fdce2f-46d" Accept-Ranges: bytes3
即使你开启了 ETag Nginx 对 HTML、CSS文件也不做处理。最终在一个外国网站是找到一个nginx-static-etags模块,有兴趣自己尝试,这里就不讲了。
3.1. 静态文件
首先查询etag值
# curl -I http://phalcon/img/css.png HTTP/1.1 200 OK Server: nginx Date: Thu, 27 Feb 2014 09:25:41 GMT Content-Type: image/png Content-Length: 1133 Last-Modified: Fri, 14 Feb 2014 08:05:03 GMT Connection: keep-alive ETag: "52fdce2f-46d" Accept-Ranges: bytes
然后向服务器发送If-None-Match HTTP头
# curl -H ‘If-None-Match: "52fdce2f-46d"‘ -I http://phalcon/img/css.png HTTP/1.1 304 Not Modified Server: nginx Date: Thu, 27 Feb 2014 09:25:44 GMT Last-Modified: Fri, 14 Feb 2014 08:05:03 GMT Connection: keep-alive ETag: "52fdce2f-46d"
这次比较顺利,成功返回HTTP/1.1 304 Not Modified
3.2. 动态程序
默认情况输出如下
# curl -I http://192.168.6.9/index.php HTTP/1.1 200 OK Server: nginx Date: Thu, 27 Feb 2014 09:29:13 GMT Content-Type: text/html; charset=utf-8 Connection: keep-alive
测试程序
<?php header(‘Last-Modified: Thu, 26 Feb 2014 08:39:35 GMT‘ ); header(‘Etag: "abcdefg"‘); #header(‘Last-Modified: ‘ .gmdate(‘D, d M Y H:i:s‘) . ‘ GMT‘ ); ?> Hello
测试效果
# curl -I http://192.168.6.9/index.php HTTP/1.1 200 OK Server: nginx/1.0.15 Date: Thu, 27 Feb 2014 09:41:06 GMT Content-Type: text/html Connection: keep-alive Last-Modified: Thu, 26 Feb 2014 08:39:35 GMT Etag: "abcdefg" [[email protected] ~]# curl -H ‘If-None-Match: "abcdefg"‘ -I http://192.168.6.9/index.php HTTP/1.1 200 OK Server: nginx/1.0.15 Date: Thu, 27 Feb 2014 09:41:42 GMT Content-Type: text/html Connection: keep-alive Last-Modified: Thu, 26 Feb 2014 08:39:35 GMT Etag: "abcdefg"
测试情况与之前的Last-Modified结果一样
动态程序返回Etag真的就没有用了吗?
答案是:非也, 有一个方法可以让动态程序返回的 Etag 也能发挥作用,程序修改如下
<?php $etag = md5(‘http://netkiller.github.io‘); cache($etag); function cache($etag) { $http_if_none_match = null; if(array_key_exists (‘HTTP_IF_NONE_MATCH‘,$_SERVER)){ $http_if_none_match = $_SERVER[‘HTTP_IF_NONE_MATCH‘]; } if ($http_if_none_match == $etag) { header(‘Etag: ‘.$etag, true, 304); exit; } else { header(‘Etag: ‘.$etag); } } print_r($_SERVER); echo date("Y-m-d H:i:s"); ?>
首先查看Etag值
# curl -I http://192.168.6.9/test.php HTTP/1.1 200 OK Server: nginx/1.0.15 Date: Thu, 27 Feb 2014 10:07:19 GMT Content-Type: text/html Connection: keep-alive Etag: 7467675324d0f7a3e01ce5151848fedb
发送If-None-Match头
# curl -H ‘If-None-Match: 7467675324d0f7a3e01ce5151848fedb‘ -I http://192.168.6.9/test.php HTTP/1.1 304 Not Modified Server: nginx/1.0.15 Date: Thu, 27 Feb 2014 10:07:39 GMT Connection: keep-alive Etag: 7467675324d0f7a3e01ce5151848fedb
达成预计效果,此种方法同样可以用于 Last-Modified,伪静态后效果更好
Etag 值的运算技巧,我习惯上采用URL同时配合伪静态例如
$etag = $_SERVER[‘REQUEST_URI‘]
URL类似 http://www.example.com/news/100/1000.html 一次请求便缓存页面,这样带来一个更新的问题,于是又做了这样的处理
http://www.example.com/news/100/1000.1.html
.1.是版本号,每次修改后+1操作,.1.没有人格意义rewrite操作是会丢弃这个参数,仅仅是为了始终有新的URL对应内容
4. Expires / Cache-Control
前面所讲 Last-Modified 与 Etag 主要用于分辨文件是否修改过, 无法控制页面在浏览器端缓存的时间。Expires / Cache-Control 可以控制缓存的时间段
Expires 是 HTTP/1.0标准,Cache-Control是 HTTP/1.1标准。都能正常工作,HTTP/1.1规范中max-age优先级高于Expires,有些浏览器会联动设置,例如你设置了Cache-Control随之自动生成Expires,仅仅为了兼容。
4.1. 静态文件
首先配置nginx设置html与png文件缓存1天
location ~ .*\.(html|png)$ { expires 1d; }
当前情况
# curl -I http://192.168.6.9/index.html HTTP/1.1 200 OK Server: nginx/1.0.15 Date: Thu, 27 Feb 2014 10:47:08 GMT Content-Type: text/html Content-Length: 6 Last-Modified: Thu, 27 Feb 2014 07:29:50 GMT Connection: keep-alive Accept-Ranges: bytes
重启Nginx后的HTTP协议头多出Expires与Cache-Control
# curl -I http://192.168.6.9/index.html HTTP/1.1 200 OK Server: nginx/1.0.15 Date: Thu, 27 Feb 2014 10:42:09 GMT Content-Type: text/html Content-Length: 3698 Last-Modified: Fri, 26 Apr 2013 20:36:51 GMT Connection: keep-alive Expires: Fri, 28 Feb 2014 10:42:09 GMT Cache-Control: max-age=86400 Accept-Ranges: bytes
4.2. 动态文件
默认返回
# curl -I http://192.168.6.9/index.php HTTP/1.1 200 OK Server: nginx/1.0.15 Date: Thu, 27 Feb 2014 11:45:05 GMT Content-Type: text/html Connection: keep-alive
index.php 增加 Cache-Control 输出控制
header(‘Cache-Control: max-age=259200‘);
再次查看
# curl -I http://192.168.6.9/index.php HTTP/1.1 200 OK Server: nginx/1.0.15 Date: Thu, 27 Feb 2014 11:53:48 GMT Content-Type: text/html Connection: keep-alive Cache-Control: max-age=259200
现在使用 Chrome 、Firefox 测试,你会发现始终返回200,并且max-age=259200数值不会改变。
原因是Cache-Control程序输出的,Nginx并不知道,所以Nginx 不会给你返回304
header(‘Last-Modified: ‘ .gmdate(‘D, d M Y H:i:s‘) . ‘ GMT‘ ); $offset = 60 * 60 * 24; header(‘Expires: ‘ . gmdate(‘D, d M Y H:i:s‘, time() + $offset) . ‘ GMT‘); $ttl=3600; header("Cache-Control: max-age=$ttl, must-revalidate");
这种方法不能实现缓存的目的
5. FastCGI 缓存相关
我们做个尝试将 expires 1d;加到location ~ \.php$中,看看能不能实现缓存的目的。
location ~ \.php$ { root /www; fastcgi_pass 127.0.0.1:9000; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME /www$fastcgi_script_name; include fastcgi_params; expires 1d; }
测试程序
# cat expires.php <?php echo date("Y-m-d H:i:s"); ?>
测试结果
# curl -I http://localhost/expires.php HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 04:39:57 GMT Content-Type: text/html Connection: keep-alive Expires: Sat, 01 Mar 2014 04:39:57 GMT Cache-Control: max-age=86400
虽然推送 Cache-Control: max-age=86400 但是 IE Chrome Firefox 仍不能缓存页面
6. HTML META 与 Cache
创建一个测试文件如下
<html> <head> <title>Hello</title> <meta http-equiv="Cache-Control" content="max-age=7200" /> <meta http-equiv="expires" content="Fri, 28 Feb 2014 12:04:18 GMT" /> </head> <body> Helloworld </body> </html>
测试HTML页面
# curl -i http://localhost/test.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 03:30:45 GMT Content-Type: text/html Transfer-Encoding: chunked Connection: keep-alive <html> <head> <title>Hello</title> <meta http-equiv="Cache-Control" content="max-age=7200" /> <meta http-equiv="expires" content="Fri, 28 Feb 2014 12:04:18 GMT" /> </head> <body> Helloworld </body> </html>
我们可以看到HTML页面中meta设置缓存对Nginx并不起作用, 很多人会说对浏览器起作用!
这次我测试了 IE11, Chrome, Firefox 发现都无法缓存页面,可能对IE5什么的还有用,我没有环境测试,因为10年前我们在B/S开发经常这样使用
<meta http-equiv="cache-control" content="max-age=0" /> <meta http-equiv="cache-control" content="no-cache" /> <meta http-equiv="expires" content="0" /> <meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT" /> <meta http-equiv="pragma" content="no-cache" />
至少在当年IE是认这些Meta的,进入HTML5时代很多都发生了变化,所以不能一概而论
7. gzip
defalte 是 Apache httpd 的标准这里只谈gzip
首先创建一个 gzip.html
# curl -I http://localhost/gzip.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Mon, 03 Mar 2014 01:49:45 GMT Content-Type: text/html Content-Length: 19644 Last-Modified: Mon, 03 Mar 2014 01:49:02 GMT Connection: keep-alive ETag: "5313df8e-4cbc" Accept-Ranges: bytes
开启 gzip on;
server { listen 80; server_name localhost; #charset utf-8; #access_log /var/log/nginx/log/host.access.log main; #etag on; #ssi on; gzip on;
现在看看效果
# curl -I http://localhost/gzip.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Mon, 03 Mar 2014 01:51:56 GMT Content-Type: text/html Content-Length: 19644 Last-Modified: Mon, 03 Mar 2014 01:49:02 GMT Connection: keep-alive ETag: "5313df8e-4cbc" Accept-Ranges: bytes
并没有什么不同,现在增加HTTP头Accept-Encoding:gzip,defalte看看
# curl -H Accept-Encoding:gzip,defalte http://localhost/gzip.html
如果你能看到非文本内容(俗称乱码)就表示成功了。输入内容就是gzip压缩后二进制数据,我们使用gunzip可以解压缩
# curl -H Accept-Encoding:gzip,defalte http://localhost/gzip.html | gunzip
如果能正常看到html输出,表示压缩无误。
7.1. gzip 总结
gzip on; 开启后默认支持 text/html 不能在 gzip_types 再次定义,否则会提示重复MIME类型
Starting nginx: nginx: [warn] duplicate MIME type "text/html" in /etc/nginx/conf.d/localhost.conf:16
高级配置参考
gzip on; gzip_http_version 1.0; gzip_types text/plain text/xml text/css application/xml application/xhtml+xml application/rss+xml application/atom_xml application/javascript application/x-javascript application/json; gzip_disable "MSIE [1-6]\."; gzip_disable "Mozilla/4"; gzip_comp_level 6; gzip_proxied any; gzip_vary on; gzip_buffers 4 8k; gzip_min_length 1000;
8. 反向代理与缓存
反向代理服务器缓存方式分为:
强制缓存,指定文件,扩展名,URL设置缓存时间
遵循HTTP协议头标准进行缓存
默认配置,只进行代理,不进行缓存
server { listen 80; server_name 192.168.2.15; #access_log /var/log/nginx/log/host.access.log main; location / { proxy_pass http://localhost:80; proxy_set_header X-Real-IP $remote_addr; } }
反向代理会产生两条日志(access_log 写入一个文件中,如果分开写,则会分开写入日志)
192.168.2.15 - - [28/Feb/2014:18:09:33 +0800] "HEAD /modified.html HTTP/1.1" 200 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" 127.0.0.1 - - [28/Feb/2014:18:09:33 +0800] "HEAD /modified.html HTTP/1.0" 200 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-"
Last-Modified 与 ETag 会透传过去
# curl -H "If-Modified-Since: Fri, 28 Feb 2014 12:04:18 GMT" -I http://192.168.2.15/modified.html HTTP/1.1 304 Not Modified Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 10:17:30 GMT Connection: keep-alive Last-Modified: Fri, 28 Feb 2014 10:04:18 GMT
我们可以看到两条日志都返回304
192.168.2.15 - - [28/Feb/2014:18:17:30 +0800] "HEAD /modified.html HTTP/1.1" 304 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" 127.0.0.1 - - [28/Feb/2014:18:17:30 +0800] "HEAD /modified.html HTTP/1.0" 304 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-"
下面为反向代理增加缓存功能
proxy_temp_path /tmp/proxy_temp_dir; proxy_cache_path /tmp/proxy_cache_dir levels=1:2 keys_zone=nginx_cache:200m inactive=3d max_size=30g; server { listen 80; server_name 192.168.2.15; location / { proxy_cache nginx_cache; proxy_cache_key $host$uri$is_args$args; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_cache_valid 200 10m; proxy_pass http://localhost; } location ~ .*\.(php|jsp|cgi)?$ { proxy_set_header Host $host; proxy_set_header X-Forwarded-For $remote_addr; proxy_pass http://backend_server; } }
# curl -I http://192.168.2.15/index.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 10:57:35 GMT Content-Type: text/html Content-Length: 12 Connection: keep-alive Last-Modified: Fri, 28 Feb 2014 06:54:45 GMT ETag: "531032b5-c" Expires: Sat, 01 Mar 2014 10:57:35 GMT Cache-Control: max-age=86400 Accept-Ranges: bytes # curl -I http://192.168.2.15/index.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 10:57:41 GMT Content-Type: text/html Content-Length: 12 Connection: keep-alive Last-Modified: Fri, 28 Feb 2014 06:54:45 GMT ETag: "531032b5-c" Expires: Sat, 01 Mar 2014 10:57:35 GMT Cache-Control: max-age=86400 Accept-Ranges: bytes # curl -I http://192.168.2.15/index.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Fri, 28 Feb 2014 10:57:46 GMT Content-Type: text/html Content-Length: 12 Connection: keep-alive Last-Modified: Fri, 28 Feb 2014 06:54:45 GMT ETag: "531032b5-c" Expires: Sat, 01 Mar 2014 10:57:35 GMT Cache-Control: max-age=86400 Accept-Ranges: bytes
上面共请求了3次服务器
192.168.2.15 - - [28/Feb/2014:18:57:35 +0800] "HEAD /index.html HTTP/1.1" 200 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" 127.0.0.1 - - [28/Feb/2014:18:57:35 +0800] "GET /index.html HTTP/1.0" 200 12 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "192.168.2.15" 192.168.2.15 - - [28/Feb/2014:18:57:41 +0800] "HEAD /index.html HTTP/1.1" 200 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-" 192.168.2.15 - - [28/Feb/2014:18:57:46 +0800] "HEAD /index.html HTTP/1.1" 200 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-"
第一次连接192.168.2.15然后转发给127.0.0.1 返回 HTTP/1.1 200 OK
后面两次连接192.168.2.15没有转发给127.0.0.1 直接返回 HTTP/1.1 200 OK
查看缓存目录,我们可以看到生成的缓存文件
# find /tmp/proxy_* /tmp/proxy_cache_dir /tmp/proxy_cache_dir/1 /tmp/proxy_cache_dir/1/79 /tmp/proxy_cache_dir/1/79/b47a0009c531900de2a15ba80c0e3791 /tmp/proxy_temp_dir
8.1. gzip 处理
http://localhost/gzip.html 是支持压缩的,192.168.2.15 proxy_pass http://localhost
# curl -H Accept-Encoding:gzip,defalte http://localhost/gzip.html
运行后输出乱码
# curl -H Accept-Encoding:gzip,defalte http://192.168.2.15/gzip.html
现在透过反向代理请求试试,你会发现gzip压缩无效,输出的是HTML,这是怎么回事呢?这是因为反向代理不清楚后面的服务器是否支持gzip,所以一律按照正常html请求。现在我们开启 gzip_vary on; 每次返回数据会携带Vary: Accept-Encoding 头。
gzip on; gzip_vary on;
reload nginx 后查看Vary: Accept-Encoding输出
# curl -I http://localhost/gzip.html HTTP/1.1 200 OK Server: nginx/1.4.5 Date: Mon, 03 Mar 2014 02:09:16 GMT Content-Type: text/html Content-Length: 19644 Last-Modified: Mon, 03 Mar 2014 01:49:02 GMT Connection: keep-alive Vary: Accept-Encoding ETag: "5313df8e-4cbc" Accept-Ranges: bytes
有 Vary: Accept-Encoding 头,现在再测试一次
# curl -H "Accept-Encoding: gzip" http://192.168.2.15/gzip.html <html> <head> <title>Hello</title>
测试失败,并没有出现预期效果,于是到网站找答案,中文与英文资料都看个遍,没有解决.
最后只能让反向代理取到数据后再压缩一次,配置开启 gzip on;
proxy_temp_path /tmp/proxy_temp_dir; proxy_cache_path /tmp/proxy_cache_dir levels=1:2 keys_zone=nginx_cache:200m inactive=3d max_size=30g; server { listen 80; server_name 192.168.2.15; gzip on; location / { proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; # proxy_set_header Accept-Encoding "gzip"; 没有任何效果 proxy_pass http://localhost; } }
Nginx 反向代理作为代理绰绰有余,如果做缓存服务器,还是使用squid, varnish吧。
9. 特殊数据缓存
缓存并非只能缓存静态内容,HTML,CSS,JS以及图片意外的数据一样可以缓存。
只要处理好HTTP头即可。例如Ajax动态内容缓存,JSON数据缓存。
9.1. json
当用户请求json地址时,我们将 json 数据附加HTTP头(Cache-Control, Expires, ETag),然后返回给用户,用户的设备会遵循HTTP的声明,进行缓存操作。
curl -I http://api.example.com/article/json/2/20/0.html HTTP/1.1 200 OK Expires: Wed, 26 Aug 2015 05:40:57 GMT Date: Wed, 26 Aug 2015 05:39:57 GMT Server: nginx Content-Type: application/json; charset=utf-8 Transfer-Encoding: chunked Cache-Control: max-age=60 ETag: 4238111283 Age: 69475 X-Via: 1.1 kaifeng45:3 (Cdn Cache Server V2.0) Connection: keep-alive
注意这里使用了伪静态 /article/json/2/20/0.html 伪静态与缓存没有关系,实际起作用的是HTTP头。
我们可以看到 Content-Type: application/json; charset=utf-8 声明,表明这是json数据,而不是HTML。
9.2. XML
这里是指动态生成的XML,处理方式与 JSON一样,XML数据附加HTTP头(Cache-Control, Expires, ETag)后返回给用户。
10. 总结
经过详细的测试我们发现不同的浏览器,不同的Web服务器,甚至每个版本都有所差异。
测试总结 Apache HTTPD 最完善 Lighttpd 其次, Nignx仍在快速发展中,Nignx每个版本差异很大,对HTTP协议实现标准也不太严谨,因为Nignx在大陆是趋势,所以下面给出的例子都是nginx
我比较看好Lighttpd,FastCGI 部分我一般是用php-fpm替代Lighttpd的spawn-fcgi
切记使用Nginx要注意每个本版细微变化,否则升级后会有影响。我习惯使用yum 安装 nginx 随时 yum update 升级。
另外FastCGI 与 mod_php也有所区别
延伸阅读《 Netkiller Web 手札》http://netkiller.github.io/www/index.html