The SO_REUSEPORT socket option

One of the features merged in the 3.9 development cycle was TCP and UDP support for the SO_REUSEPORT socket option; that support was implemented in a series of patches by Tom Herbert. The new socket option allows multiple sockets on the same host to bind to the same port, and is intended to improve the performance of multithreaded network server applications running on top of multicore systems.

The basic concept of SO_REUSEPORT is simple enough. Multiple servers (processes or threads) can bind to the same port if they each set the option as follows:

    int sfd = socket(domain, socktype, 0);

    int optval = 1;
    setsockopt(sfd, SOL_SOCKET, SO_REUSEPORT, &optval, sizeof(optval));

    bind(sfd, (struct sockaddr *) &addr, addrlen);

So long as the first server sets this option before binding its socket, then any number of other servers can also bind to the same port if they also set the option beforehand. The requirement that the first server must specify this option prevents port hijacking—the possibility that a rogue application binds to a port already used by an existing server in order to capture (some of) its incoming connections or datagrams. To prevent unwanted processes from hijacking a port that has already been bound by a server using SO_REUSEPORT, all of the servers that later bind to that port must have an effective user ID that matches the effective user ID used to perform the first bind on the socket.

SO_REUSEPORT can be used with both TCP and UDP sockets. With TCP sockets, it allows multiple listening sockets—normally each in a different thread—to be bound to the same port. Each thread can then accept incoming connections on the port by calling accept(). This presents an alternative to the traditional approaches used by multithreaded servers that accept incoming connections on a single socket.

The first of the traditional approaches is to have a single listener thread that accepts all incoming connections and then passes these off to other threads for processing. The problem with this approach is that the listening thread can become a bottleneck in extreme cases. In early discussions on SO_REUSEPORT, Tom noted that he was dealing with applications that accepted 40,000 connections per second. Given that sort of number, it's unsurprising to learn that Tom works at Google.

The second of the traditional approaches used by multithreaded servers operating on a single port is to have all of the threads (or processes) perform an accept() call on a single listening socket in a simple event loop of the form:

    while (1) {
        new_fd = accept(...);
        process_connection(new_fd);
    }

The problem with this technique, as Tom pointed out, is that when multiple threads are waiting in the accept() call, wake-ups are not fair, so that, under high load, incoming connections may be distributed across threads in a very unbalanced fashion. At Google, they have seen a factor-of-three difference between the thread accepting the most connections and the thread accepting the fewest connections; that sort of imbalance can lead to underutilization of CPU cores. By contrast, the SO_REUSEPORT implementation distributes connections evenly across all of the threads (or processes) that are blocked in accept() on the same port.

As with TCP, SO_REUSEPORT allows multiple UDP sockets to be bound to the same port. This facility could, for example, be useful in a DNS server operating over UDP. With SO_REUSEPORT, each thread could use recv() on its own socket to accept datagrams arriving on the port. The traditional approach is that all threads would compete to perform recv() calls on a single shared socket. As with the second of the traditional TCP scenarios described above, this can lead to unbalanced loads across the threads. By contrast, SO_REUSEPORT distributes datagrams evenly across all of the receiving threads.

Tom noted that the traditional SO_REUSEADDR socket option already allows multiple UDP sockets to be bound to, and accept datagrams on, the same UDP port. However, by contrast with SO_REUSEPORT, SO_REUSEADDR does not prevent port hijacking and does not distribute datagrams evenly across the receiving threads.

There are two other noteworthy points about Tom's patches. The first of these is a useful aspect of the implementation. Incoming connections and datagrams are distributed to the server sockets using a hash based on the 4-tuple of the connection, that is, the peer IP address and port plus the local IP address and port. This means, for example, that if a client uses the same socket to send a series of datagrams to the server port, then those datagrams will all be directed to the same receiving server (as long as it continues to exist). This eases the task of conducting stateful conversations between the client and server.

The other noteworthy point is that there is a defect in the current implementation of TCP SO_REUSEPORT. If the number of listening sockets bound to a port changes because new servers are started or existing servers terminate, it is possible that incoming connections can be dropped during the three-way handshake. The problem is that connection requests are tied to a specific listening socket when the initial SYN packet is received during the handshake. If the number of servers bound to the port changes, then the SO_REUSEPORT logic might not route the final ACK of the handshake to the correct listening socket. In this case, the client connection will be reset, and the server is left with an orphaned request structure. A solution to the problem is still being worked on, and may consist of implementing a connection request table that can be shared among multiple listening sockets.

The SO_REUSEPORT option is non-standard, but available in a similar form on a number of other UNIX systems (notably, the BSDs, where the idea originated). It seems to offer a useful alternative for squeezing the maximum performance out of network applications running on multicore systems, and thus is likely to be a welcome addition for some application developers.

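The idea is to distribute load at the socket layer: when several threads listen on the same port, each thread owns its own socket fd instead of sharing a single one, which improves performance. This is also why the SO_REUSEPORT socket option was introduced.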

Time: 2024-10-10 19:03:42
