In the previous post, Let's do our own full blown HTTP server with Netty 4, you and I had the fun of building our own web server. So far so good. But how good?
Given an ordinary notebook:
cat /proc/cpuinfo | grep model\ name
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz

cat /proc/meminfo | grep MemTotal
MemTotal: 3956836 kB
(Ok, there are only 4 cores with hyperthreading.)
We get the following numbers:
build/default/weighttp -n 1000000 -k -c 100 http://localhost:9999/
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
progress:  10% done
progress:  20% done
progress:  30% done
progress:  40% done
progress:  50% done
progress:  60% done
progress:  70% done
progress:  80% done
progress:  90% done
progress: 100% done

finished in 20 sec, 195 millisec and 236 microsec, 49516 req/s, 7251 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 149967800 bytes total, 145967800 bytes http, 4000000 bytes data
49516 req/s. Is this good? I'd say very good. But we need a baseline to compare against. As that baseline we'll use nginx serving a small static file. 'Why is serving a file faster than serving a piece of memory?' you ask. With sendfile turned on, once the response header is sent nginx just instructs the kernel to send the file to the socket; after the first hit the file sits in the Linux buffer cache, so the kernel sends a page straight from memory to the Ethernet card. Zero-copy. If you send a dynamically generated response from memory instead, the kernel has to copy that data from userspace to kernelspace. Thus the example below should be fast.
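To make the zero-copy idea concrete, here is a minimal sketch (my own illustration, not nginx's code) that answers one connection using the sendfile(2) syscall; the kernel pushes pages from the buffer cache directly into the socket, and user space never touches the file bytes:

```python
import os
import socket

def serve_file_once(port, path):
    # Accept a single connection and answer it with sendfile(2):
    # after the hand-written header, the kernel copies the file
    # from the page cache straight into the socket buffer.
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    conn, _ = srv.accept()
    size = os.path.getsize(path)
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % size)
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            offset += os.sendfile(conn.fileno(), f.fileno(), offset, size - offset)
    conn.close()
    srv.close()
```

os.sendfile is available on Linux; a real server would of course loop over connections and handle errors, but the data path is the same one nginx uses.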
Nginx config:
worker_processes 4;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 65;

    server {
        listen 6666;
        server_name localhost;

        location / {
            root html;
            index index.html index.htm;
        }
    }
}
The same test:
build/default/weighttp -n 1000000 -k -c 100 http://localhost:6666/
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
...
progress: 100% done

finished in 11 sec, 402 millisec and 913 microsec, 87696 req/s, 72705 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 848950295 bytes total, 236950295 bytes http, 612000000 bytes data
87696 req/s. Very well; we are not that far behind.
Let's draw another baseline. Python is considered slow, but it is not as slow as you might think. Add this to your nginx config:
http {
    upstream test {
        server unix:///home/adolgarev/uwsgi.sock;
    }
    ...
    server {
        location /test {
            uwsgi_pass test;
            include uwsgi_params;
        }
        ...
    }
}
The uwsgi protocol is far more efficient than HTTP, which is why we use uwsgi_pass instead of proxy_pass.
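Why is it cheaper? Instead of a text request line and headers that the backend must re-parse, uwsgi frames the already-parsed CGI variables in a compact binary packet. A simplified sketch of that framing (my own illustration of the wire format, not code from this setup):

```python
import struct

def pack_uwsgi_vars(env):
    # uwsgi request = 4-byte header (modifier1, little-endian body
    # size, modifier2) followed by length-prefixed key/value strings.
    body = b""
    for key, value in env.items():
        k = key.encode("latin-1")
        v = value.encode("latin-1")
        body += struct.pack("<H", len(k)) + k
        body += struct.pack("<H", len(v)) + v
    # modifier1=0 selects the WSGI request handler; modifier2 is unused
    return struct.pack("<BHB", 0, len(body), 0) + body
```

The backend can walk the packet with a handful of length reads, no tokenizing or header-name matching needed.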
Start uwsgi with a small WSGI test program:
cat test.py
def application(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return ["Hello World"]

uwsgi --socket uwsgi.sock --wsgi-file test.py --master --processes 4 --threads 2
The same test:
build/default/weighttp -n 100000 -k -c 100 http://localhost:6666/test
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 100000 total requests
...
progress: 100% done

finished in 9 sec, 264 millisec and 26 microsec, 10794 req/s, 1844 kbyte/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 17495280 bytes total, 15395280 bytes http, 2100000 bytes data
10794 req/s. I'd say good enough for Python.
Another thing to compare against is GlassFish Server Open Source Edition 4.0 with Servlets 3.1, powered by Grizzly, a competitor of Netty.
A small Servlet:
package test;

import java.io.IOException;
import java.io.PrintWriter;
import javax.json.Json;
import javax.json.stream.JsonGenerator;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/test")
public class TestServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/plain");
        PrintWriter writer = resp.getWriter();
        try (JsonGenerator gen = Json.createGenerator(writer)) {
            gen.writeStartObject().write("res", "Ok").writeEnd();
        }
    }
}
And the same test:
build/default/weighttp -n 1000000 -k -c 100 http://localhost:8080/WebApplication1/test
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
...
progress: 100% done

finished in 41 sec, 745 millisec and 917 microsec, 23954 req/s, 6855 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 293074157 bytes total, 281074157 bytes http, 12000000 bytes data
23954 req/s, somewhere in between. But surprisingly, if we remove the JSON dependency it runs blazingly fast.
@WebServlet("/test")
public class TestServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/plain");
        try (PrintWriter writer = resp.getWriter()) {
            writer.print("Ok");
        }
    }
}
build/default/weighttp -n 1000000 -k -c 100 http://localhost:8080/WebApplication1/test
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
...
progress: 100% done

finished in 10 sec, 944 millisec and 231 microsec, 91372 req/s, 25169 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 282074100 bytes total, 280074100 bytes http, 2000000 bytes data
An amazing 91372 req/s, faster than nginx (because the whole response is small enough to fit into one memory page, and even into one packet with an MTU of 1500).
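A quick sanity check of that claim from the traffic numbers in the run above: headers plus body average out to roughly 282 bytes per response, comfortably below the ~1460-byte TCP payload of a 1500-MTU Ethernet frame.

```python
# Figures taken from the weighttp output above
total_http_bytes = 280074100   # all response headers
total_data_bytes = 2000000     # all response bodies ("Ok" = 2 bytes each)
requests = 1000000

per_response = (total_http_bytes + total_data_bytes) / requests
print(per_response)  # ~282 bytes

# A 1500-byte MTU leaves ~1460 bytes of TCP payload
# (20 bytes IP header + 20 bytes TCP header)
assert per_response < 1460
```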
I'm still not satisfied. Let's profile our server and eliminate the most obvious hot spots.
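The profile won't be reproduced here, but to give a flavor: one classic hot spot in servers like this is formatting the Date header for every single request. A hypothetical fix (my illustration, not necessarily what was changed above) is to cache the rendered string and refresh it at most once per second:

```python
import time
from email.utils import formatdate

# Cached (timestamp, rendered header value) pair
_cached_date = (0.0, b"")

def http_date():
    # Re-render the RFC 1123 date at most once per second;
    # every request in between reuses the cached bytes.
    global _cached_date
    now = time.time()
    if now - _cached_date[0] >= 1.0:
        _cached_date = (now, formatdate(now, usegmt=True).encode("ascii"))
    return _cached_date[1]
```

At 100k req/s that turns 100000 date-formatting calls per second into one, a typical example of the kind of per-request work profiling uncovers.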
build/default/weighttp -n 1000000 -k -c 100 http://localhost:9999/
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
...
progress: 100% done

finished in 9 sec, 298 millisec and 269 microsec, 107546 req/s, 10292 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 98000000 bytes total, 94000000 bytes http, 4000000 bytes data
107546 req/s, and we can do even better.
Conclusion? Our server is fast. But Servlets are fast too; taking into account asynchronous support (Servlets 3.0) and non-blocking I/O (Servlets 3.1), they are a good choice for HTTP. Our server has one obvious advantage: you can change any piece of the pipeline and even add support for other protocols.