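1. Creating ns_g_socketmgr:
First, there is exactly one socket manager for the whole process, no matter how many network interfaces the machine has. The global variable is defined in bin/named/include/named/globals.h: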
EXTERN isc_socketmgr_t * ns_g_socketmgr INIT(NULL);
Call stack when it is created:
#0 isc__socketmgr_create2 (mctx=0x8742d0, managerp=0x8701f8, maxsocks=0) at socket.c:4143
#1 0x000000000041919e in create_managers () at ./main.c:604
#2 0x0000000000419727 in setup () at ./main.c:850
#3 0x0000000000419a2b in main (argc=4, argv=0x7fffffffe5c8) at ./main.c:1058
In a threaded build, isc__socketmgr_create2 creates a pipe and a select/epoll watcher thread. Worker threads control what the watcher thread does by writing to that pipe.
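The arrangement above has one pipe, and its read end sits in the watcher's epoll set from the start. It can be sketched with plain Linux calls. This is a minimal sketch under assumptions, not BIND's code: watcher_t and watcher_setup() are hypothetical names, and only pipe(), epoll_create1() and epoll_ctl() are real APIs.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical, stripped-down stand-in for the socket manager's watcher state. */
typedef struct {
	int epoll_fd;     /* epoll instance the watcher thread waits on */
	int pipe_fds[2];  /* [0]: read end, watched by epoll; [1]: write end, used by workers */
} watcher_t;

/* Create the control pipe and the epoll instance, and register the pipe's
 * read end immediately. Returns 0 on success, -1 on failure. */
static int watcher_setup(watcher_t *w)
{
	struct epoll_event ev = { .events = EPOLLIN };

	if (pipe(w->pipe_fds) != 0)
		return -1;
	w->epoll_fd = epoll_create1(0);
	if (w->epoll_fd < 0)
		goto fail_pipe;
	ev.data.fd = w->pipe_fds[0];
	if (epoll_ctl(w->epoll_fd, EPOLL_CTL_ADD, w->pipe_fds[0], &ev) != 0)
		goto fail_epoll;
	return 0;

fail_epoll:
	close(w->epoll_fd);
fail_pipe:
	close(w->pipe_fds[0]);
	close(w->pipe_fds[1]);
	return -1;
}
```

After this, a threaded build would start the watcher thread (for example with pthread_create()), which loops on epoll_wait(). Any write to pipe_fds[1] then wakes it up.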
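2. Scanning network interfaces:
BIND9 scans the network interfaces at startup and rescans them periodically while running. The rescan is driven by a timer whose period is configurable (the interface-interval option in named.conf), so BIND9 notices changes in the network environment promptly. For each network interface BIND9 creates two listening sockets, one UDP and one TCP, and it also creates an rndc control socket. So on a machine with a single physical NIC, startup creates the three sockets traced below.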
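A single scan pass, "one UDP and one TCP listener per interface address", can be approximated with getifaddrs(). This is a hypothetical sketch, not what ns_interfacemgr_scan() does internally. The function names are made up and only IPv4 is handled. The caller chooses the port: a real server would use 53, which needs privileges, while 0 picks an ephemeral port.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP and a TCP listener bound to *addr. On success stores the
 * descriptors in out[0] (UDP) and out[1] (TCP) and returns 0; on failure
 * closes whatever it opened and returns -1. */
static int open_listener_pair(const struct sockaddr_in *addr, int out[2])
{
	const int types[2] = { SOCK_DGRAM, SOCK_STREAM };

	for (int i = 0; i < 2; i++) {
		out[i] = socket(AF_INET, types[i], 0);
		if (out[i] < 0 ||
		    bind(out[i], (const struct sockaddr *)addr, sizeof(*addr)) != 0 ||
		    (types[i] == SOCK_STREAM && listen(out[i], 16) != 0)) {
			if (out[i] >= 0)
				close(out[i]);
			if (i == 1)
				close(out[0]);
			return -1;
		}
	}
	return 0;
}

/* One hypothetical scan pass: for every IPv4 address on an interface that is
 * up, open a UDP + TCP listener pair on the given port. Descriptors go into
 * fds (two per address, at most max_fds). Returns how many were opened. */
static int scan_interfaces_once(unsigned short port, int *fds, int max_fds)
{
	struct ifaddrs *ifs, *ifa;
	int n = 0;

	if (getifaddrs(&ifs) != 0)
		return 0;
	for (ifa = ifs; ifa != NULL && n + 2 <= max_fds; ifa = ifa->ifa_next) {
		struct sockaddr_in addr;

		if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET ||
		    (ifa->ifa_flags & IFF_UP) == 0)
			continue;
		memcpy(&addr, ifa->ifa_addr, sizeof(addr));
		addr.sin_port = htons(port);
		if (open_listener_pair(&addr, &fds[n]) == 0)
			n += 2;
	}
	freeifaddrs(ifs);
	return n;
}
```

A periodic rescan would repeat this pass on a timer and compare the result with the listeners it already has. It would then open sockets only for new addresses and close those whose address vanished. That bookkeeping is omitted here.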
UDP listening socket:
#0 isc__socket_create (manager0=0x7ffff7fa9010, pf=2, type=isc_sockettype_udp, socketp=0x7fffec7870c8) at socket.c:2580
#1 0x00000000004861cc in open_socket (mgr=0x7ffff7fa9010, local=0x7ffff7fbe290, options=1, sockp=0x7fffec7872f8) at dispatch.c:1797
#2 0x0000000000489b27 in get_udpsocket (mgr=0x7ffff7fae270, sockmgr=0x7ffff7fa9010, taskmgr=0x7ffff7fa5010, localaddr=0x7ffff7fbe290, maxrequests=<value optimized out>, attributes=44, dispp=0x7fffec787418) at dispatch.c:2792
#3 dispatch_createudp (mgr=0x7ffff7fae270, sockmgr=0x7ffff7fa9010, taskmgr=0x7ffff7fa5010, localaddr=0x7ffff7fbe290, maxrequests=<value optimized out>, attributes=44, dispp=0x7fffec787418) at dispatch.c:2860
#4 0x000000000048a042 in dns_dispatch_getudp (mgr=0x7ffff7fae270, sockmgr=0x7ffff7fa9010, taskmgr=0x7ffff7fa5010, localaddr=0x7ffff7fbe290, buffersize=<value optimized out>, maxbuffers=<value optimized out>, maxrequests=32768, buckets=8219, increment=8237, attributes=44, mask=30, dispp=0x7ffff7fbe340) at dispatch.c:2714
#5 0x000000000041520b in ns_interface_listenudp (ifp=0x7ffff7fbe250) at interfacemgr.c:261
#6 0x00000000004155e5 in ns_interface_setup (mgr=0x7ffff7fb6f70, addr=0x7fffec787700, name=0x7fffec787570 "eth0", ifpret=0x7fffec787878, accept_tcp=isc_boolean_true) at interfacemgr.c:365
#7 0x0000000000416a16 in do_scan (mgr=0x7ffff7fb6f70, ext_listen=0x0, verbose=isc_boolean_true) at interfacemgr.c:844
#8 0x0000000000416bf2 in ns_interfacemgr_scan0 (mgr=0x7ffff7fb6f70, ext_listen=0x0, verbose=isc_boolean_true) at interfacemgr.c:897
#9 0x0000000000416c92 in ns_interfacemgr_scan (mgr=0x7ffff7fb6f70, verbose=isc_boolean_true) at interfacemgr.c:923
#10 0x0000000000435107 in scan_interfaces (server=0x7ffff7fae010, verbose=isc_boolean_true) at server.c:3604
#11 0x0000000000437d60 in load_configuration (filename=0x7fffffffe850 "/var/named/named.conf", server=0x7ffff7fae010, first_time=isc_boolean_true) at server.c:4638
#12 0x0000000000439fdf in run_server (task=0x7ffff7fba010, event=0x0) at server.c:5268
#13 0x00000000005b3b15 in dispatch (manager=0x7ffff7fa5010) at task.c:1012
#14 0x00000000005b3da1 in run (uap=0x7ffff7fa5010) at task.c:1157
#15 0x0000003817a07a51 in start_thread () from /lib64/libpthread.so.0
#16 0x00000038176e896d in clone () from /lib64/libc.so.6
TCP listening socket:
#0 isc__socket_create (manager0=0x7ffff7fa9010, pf=2, type=isc_sockettype_tcp, socketp=0x7ffff7fbe348) at socket.c:2580
#1 0x0000000000415344 in ns_interface_accepttcp (ifp=0x7ffff7fbe250) at interfacemgr.c:297
#2 0x0000000000415600 in ns_interface_setup (mgr=0x7ffff7fb6f70, addr=0x7fffec787700, name=0x7fffec787570 "eth0", ifpret=0x7fffec787878, accept_tcp=isc_boolean_true) at interfacemgr.c:370
#3 0x0000000000416a16 in do_scan (mgr=0x7ffff7fb6f70, ext_listen=0x0, verbose=isc_boolean_true) at interfacemgr.c:844
#4 0x0000000000416bf2 in ns_interfacemgr_scan0 (mgr=0x7ffff7fb6f70, ext_listen=0x0, verbose=isc_boolean_true) at interfacemgr.c:897
#5 0x0000000000416c92 in ns_interfacemgr_scan (mgr=0x7ffff7fb6f70, verbose=isc_boolean_true) at interfacemgr.c:923
#6 0x0000000000435107 in scan_interfaces (server=0x7ffff7fae010, verbose=isc_boolean_true) at server.c:3604
#7 0x0000000000437d60 in load_configuration (filename=0x7fffffffe850 "/var/named/named.conf", server=0x7ffff7fae010, first_time=isc_boolean_true) at server.c:4638
#8 0x0000000000439fdf in run_server (task=0x7ffff7fba010, event=0x0) at server.c:5268
#9 0x00000000005b3b15 in dispatch (manager=0x7ffff7fa5010) at task.c:1012
#10 0x00000000005b3da1 in run (uap=0x7ffff7fa5010) at task.c:1157
#11 0x0000003817a07a51 in start_thread () from /lib64/libpthread.so.0
#12 0x00000038176e896d in clone () from /lib64/libc.so.6
rndc control socket:
#0 isc__socket_create (manager0=0x7ffff7fa9010, pf=2, type=isc_sockettype_tcp, socketp=0x7fffea8431a8) at socket.c:2580
#1 0x0000000000413a62 in add_listener (cp=0x7ffff7faf038, listenerp=0x7fffec787aa0, control=0x7ffff7fcfb38, config=0x7ffff7fcf550, addr=0x7fffec787990, aclconfctx=0x7ffff7fa3070, socktext=0x7fffec787a40 "0.0.0.0#953", type=isc_sockettype_tcp) at controlconf.c:1145
#2 0x0000000000413fd2 in ns_controls_configure (cp=0x7ffff7faf038, config=0x7ffff7fcf550, aclconfctx=0x7ffff7fa3070) at controlconf.c:1281
#3 0x0000000000438916 in load_configuration (filename=0x7fffffffe850 "/var/named/named.conf", server=0x7ffff7fae010, first_time=isc_boolean_true) at server.c:4862
#4 0x0000000000439fdf in run_server (task=0x7ffff7fba010, event=0x0) at server.c:5268
#5 0x00000000005b3b15 in dispatch (manager=0x7ffff7fa5010) at task.c:1012
#6 0x00000000005b3da1 in run (uap=0x7ffff7fa5010) at task.c:1157
#7 0x0000003817a07a51 in start_thread () from /lib64/libpthread.so.0
#8 0x00000038176e896d in clone () from /lib64/libc.so.6
How a socket is recorded in the socketmgr (the file descriptor is the array index):
sock->manager->fds[sock->fd] = sock;
sock->manager->fdstate[sock->fd] = MANAGED;
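Using the descriptor number as the index lets the watcher map an fd reported by epoll back to its socket object in constant time. Below is a simplified, hypothetical model of that table. mini_manager_t, mini_socket_t and the helper names are not BIND's, and only the CLOSED/MANAGED states are modeled.

```c
#include <stdlib.h>

/* Hypothetical, simplified per-fd states. */
enum { CLOSED = 0, MANAGED = 1 };

typedef struct mini_manager mini_manager_t;

typedef struct {
	mini_manager_t *manager;
	int fd;
} mini_socket_t;

struct mini_manager {
	unsigned int maxsocks;  /* table size, cf. maxsocks in isc__socketmgr_create2 */
	mini_socket_t **fds;    /* fd -> socket object */
	int *fdstate;           /* fd -> CLOSED / MANAGED */
};

static int manager_init(mini_manager_t *m, unsigned int maxsocks)
{
	m->maxsocks = maxsocks;
	m->fds = calloc(maxsocks, sizeof(*m->fds));
	m->fdstate = calloc(maxsocks, sizeof(*m->fdstate));  /* zeroed: all CLOSED */
	if (m->fds == NULL || m->fdstate == NULL) {
		free(m->fds);
		free(m->fdstate);
		return -1;
	}
	return 0;
}

/* Same idea as the two assignments above: the fd is the index. */
static int manager_add(mini_socket_t *sock)
{
	mini_manager_t *m = sock->manager;

	if (sock->fd < 0 || (unsigned int)sock->fd >= m->maxsocks)
		return -1;
	m->fds[sock->fd] = sock;
	m->fdstate[sock->fd] = MANAGED;
	return 0;
}

/* What the watcher would do with an fd returned by epoll_wait(): O(1) lookup. */
static mini_socket_t *manager_lookup(const mini_manager_t *m, int fd)
{
	if (fd < 0 || (unsigned int)fd >= m->maxsocks || m->fdstate[fd] != MANAGED)
		return NULL;
	return m->fds[fd];
}
```

The cost of this design is that the table must be sized for the largest descriptor number the process may receive, not the number of sockets actually open.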
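3. epoll watching the sockets and the pipe:
Key functions:
- watch_fd adds a descriptor to the select/epoll watch set;
- unwatch_fd removes a descriptor from the select/epoll watch set;
- select_poke writes to the pipe, telling the select/epoll watcher thread to add a descriptor to its watch set;
- select_readmsg reads from the pipe; it is called before wakeup_socket (which calls watch_fd);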
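For the epoll backend, watch_fd and unwatch_fd essentially come down to epoll_ctl() calls. The sketch assumes read interest only (EPOLLIN) and uses hypothetical function names. BIND's own versions also handle write interest and the select backend.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical epoll-only watch_fd(): start reporting readability of fd.
 * Returns 0 on success (including "already watched"), -1 on error. */
static int watch_fd_read(int epoll_fd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };

	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &ev) == 0)
		return 0;
	return errno == EEXIST ? 0 : -1;
}

/* Hypothetical epoll-only unwatch_fd(): stop reporting fd.
 * Returns 0 on success (including "was not watched"), -1 on error. */
static int unwatch_fd_read(int epoll_fd, int fd)
{
	if (epoll_ctl(epoll_fd, EPOLL_CTL_DEL, fd, NULL) == 0)
		return 0;
	return errno == ENOENT ? 0 : -1;
}
```

Treating EEXIST and ENOENT as success makes both calls idempotent. That matters when a watch request and an unwatch can arrive for an fd that is already in the requested state.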
epoll can watch both the pipe and the sockets. BIND9 adds the control pipe of the socket watcher to the watch set as soon as the pipe is created. The call stack:
#0 watch_fd (manager=0x7ffff7fa9010, fd=9, msg=-3) at socket.c:814
#1 0x00000000005c5947 in setup_watcher (mctx=0x8742d0, manager=0x7ffff7fa9010) at socket.c:3973
#2 0x00000000005c5f2a in isc__socketmgr_create2 (mctx=0x8742d0, managerp=0x8701f8, maxsocks=4096) at socket.c:4246
#3 0x000000000041919e in create_managers () at ./main.c:604
#4 0x0000000000419727 in setup () at ./main.c:850
#5 0x0000000000419a2b in main (argc=4, argv=0x7fffffffe5c8) at ./main.c:1058
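When a socket needs to be watched, a worker thread writes a message to this pipe. Because of the pipe, only the watcher thread ever changes its own watch set, which avoids the trouble of synchronizing threads around it. The call stack of such a write: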
#0 select_poke (mgr=0x7ffff7fa9010, fd=512, msg=-3) at socket.c:1026
#1 0x00000000005c68dd in socket_recv (sock=0x7ffff7fd6010, dev=0x7fffeaed8148, task=0x7ffff7fba9b0, flags=0) at socket.c:4486
#2 0x00000000005c6f61 in isc__socket_recv2 (sock0=0x7ffff7fd6010, region=0x7ffff0f90d10, minimum=1, task=0x7ffff7fba9b0, event=0x7fffeaed8148, flags=0) at socket.c:4619
#3 0x000000000040cf58 in client_udprecv (client=0x7fffe4004c40) at client.c:2359
#4 0x000000000040877a in client_start (task=0x7ffff7fba9b0, event=0x7fffe40050e8) at client.c:583
#5 0x00000000005b3b15 in dispatch (manager=0x7ffff7fa5010) at task.c:1012
#6 0x00000000005b3da1 in run (uap=0x7ffff7fa5010) at task.c:1157
#7 0x0000003817a07a51 in start_thread () from /lib64/libpthread.so.0
#8 0x00000038176e896d in clone () from /lib64/libc.so.6
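The message sent through the pipe can be modeled as a pair of ints (fd, msg). This is a minimal sketch that assumes that layout. The names poke, readmsg and POKE_* are hypothetical. POKE_READ uses -3 to match msg=-3 in the traces; the value of POKE_SHUTDOWN is an assumption.

```c
#include <errno.h>
#include <unistd.h>

#define POKE_SHUTDOWN (-1)  /* assumed value, illustration only */
#define POKE_READ     (-3)  /* matches msg=-3 in the traces */

/* Worker side: ask the watcher thread to act on fd. Both ints go out in a
 * single write(); POSIX makes pipe writes of at most PIPE_BUF bytes atomic,
 * so messages from different worker threads never interleave. */
static int poke(int pipe_wr, int fd, int msg)
{
	int buf[2] = { fd, msg };
	ssize_t n;

	do {
		n = write(pipe_wr, buf, sizeof(buf));
	} while (n < 0 && errno == EINTR);
	return n == (ssize_t)sizeof(buf) ? 0 : -1;
}

/* Watcher side: read one (fd, msg) message back. Returns 0 or -1. */
static int readmsg(int pipe_rd, int *fd, int *msg)
{
	int buf[2];
	ssize_t n;

	do {
		n = read(pipe_rd, buf, sizeof(buf));
	} while (n < 0 && errno == EINTR);
	if (n != (ssize_t)sizeof(buf))
		return -1;
	*fd = buf[0];
	*msg = buf[1];
	return 0;
}
```

Every message has the same size and is written atomically, so the reader can consume exactly one message per read() without any framing.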
socket_recv contains this code:
	/*
	 * Enqueue the request. If the socket was previously not being
	 * watched, poke the watcher to start paying attention to it.
	 */
	if (ISC_LIST_EMPTY(sock->recv_list) && !sock->pending_recv)
		select_poke(sock->manager, sock->fd, SELECT_POKE_READ);
	ISC_LIST_ENQUEUE(sock->recv_list, dev, ev_link);
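This converts the client's per-read-event scheduling into epoll's per-file-descriptor scheduling, since one socket can have many pending read events. Only the request that makes the list non-empty, while no receive is pending, pokes the watcher.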
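The effect of that condition can be checked with a small model. In it, a counter stands in for recv_list and poke_count records how often select_poke would be called. This is a hypothetical sketch: the mutex is an assumption standing in for a per-socket lock that guards the request list.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the per-socket receive bookkeeping. */
typedef struct {
	pthread_mutex_t lock;  /* assumed per-socket lock guarding the fields below */
	int queued;            /* length of recv_list */
	bool pending_recv;     /* a recv event is already on its way to the task */
	int poke_count;        /* how many times the watcher would be poked */
} recv_state_t;

/* Queue one read request. The watcher is poked only on the empty -> non-empty
 * transition and only when no receive is pending. Otherwise the fd is either
 * already watched or will be re-armed by the re-poke after the receive. */
static void socket_recv_model(recv_state_t *s)
{
	pthread_mutex_lock(&s->lock);
	if (s->queued == 0 && !s->pending_recv)
		s->poke_count++;   /* stands in for select_poke(..., SELECT_POKE_READ) */
	s->queued++;           /* stands in for ISC_LIST_ENQUEUE(sock->recv_list, ...) */
	pthread_mutex_unlock(&s->lock);
}
```

Three requests queued back to back produce a single poke. The watcher sees one fd, not three events.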
internal_recv() (discussed further below) contains this code:
 poke:
	if (!ISC_LIST_EMPTY(sock->recv_list))
		select_poke(sock->manager, sock->fd, SELECT_POKE_READ);
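Together, these two pipe writes put the fd back into the watch set whenever read requests are waiting. Worker threads never modify the watcher's epoll set directly, so that set needs no lock.
After the pipe is written, the watcher thread reads it. The call stack: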
#0 watch_fd (manager=0x7ffff7fa9010, fd=512, msg=-3) at socket.c:795
#1 0x00000000005bf14b in wakeup_socket (manager=0x7ffff7fa9010, fd=512, msg=-3) at socket.c:996
#2 0x00000000005c554e in process_ctlfd (manager=0x7ffff7fa9010) at socket.c:3761
#3 0x00000000005c549f in process_fds (manager=0x7ffff7fa9010, events=0x7ffff7faa010, nevents=1) at socket.c:3665
#4 0x00000000005c5696 in watcher (uap=0x7ffff7fa9010) at socket.c:3872
#5 0x0000003817a07a51 in start_thread () from /lib64/libpthread.so.0
#6 0x00000038176e896d in clone () from /lib64/libc.so.6
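select_readmsg is called before watch_fd to read the message off the pipe.
The select/epoll watcher thread (the watcher function) is kept lean: once it sees that a socket is readable or writable, it does not do the read or write itself. So right after epoll returns, it removes the socket from the epoll watch set by calling unwatch_fd. The socket is added back only after the actual receive function has run, or after the queued read events have been used up and a new one arrives. When the watcher removes the socket, it also sends a read event, and the task scheduler thread then has the real read function do the read.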
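The lean watcher described above can be sketched as one pass of an epoll loop. A control message makes the watcher start watching a socket. A readable socket is unwatched immediately and handed to a callback, which stands in for the read event sent to the task. The function names and the callback interface are hypothetical. This sketch reads one control message per pass and handles only reads, which is a simplification.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

#define POKE_READ (-3)  /* read request, as msg=-3 in the traces */

/* Stands in for "send a read event to the task" (dispatch_recv's job). */
typedef void (*dispatch_fn)(int fd, void *arg);

/* Example callback: just counts hand-offs. */
static void count_dispatch(int fd, void *arg)
{
	(void)fd;
	++*(int *)arg;
}

/* One hypothetical watcher pass. ctl_rd is the pipe read end, already in
 * epoll_fd. Returns the number of sockets handed off, or -1 on error. */
static int watcher_once(int epoll_fd, int ctl_rd, int timeout_ms,
			dispatch_fn dispatch, void *arg)
{
	struct epoll_event evs[16];
	int handed = 0;
	int n = epoll_wait(epoll_fd, evs, 16, timeout_ms);

	if (n < 0)
		return errno == EINTR ? 0 : -1;
	for (int i = 0; i < n; i++) {
		int fd = evs[i].data.fd;

		if (fd == ctl_rd) {
			/* like select_readmsg() followed by wakeup_socket()/watch_fd() */
			int msg[2];

			if (read(ctl_rd, msg, sizeof(msg)) != (ssize_t)sizeof(msg))
				return -1;
			if (msg[1] == POKE_READ) {
				struct epoll_event ev = { .events = EPOLLIN, .data.fd = msg[0] };

				if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, msg[0], &ev) != 0 &&
				    errno != EEXIST)
					return -1;
			}
		} else {
			/* like unwatch_fd(): stop watching before handing off, so the
			 * fd is not reported again until someone pokes it back in */
			epoll_ctl(epoll_fd, EPOLL_CTL_DEL, fd, NULL);
			dispatch(fd, arg);
			handed++;
		}
	}
	return handed;
}
```

The fd leaves the epoll set before the hand-off, so the watcher stays silent about it until a worker re-arms it with a new poke. The re-poke in internal_recv() quoted earlier does exactly that.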
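How socket_recv, dispatch_recv and internal_recv relate:
- socket_recv reads from the socket; if no data is available yet, it adds the read event to the socket's read-event list and, if needed, adds the socket to the epoll watch set;
- dispatch_recv is called by the epoll watcher thread, but it does not do the read itself; instead it sends a read event that makes the task call internal_recv;
- internal_recv, evidently, is the one that does the actual work.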
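internal_recv's share of the work can be modeled as a loop with two steps. First it fills as many queued read requests as the non-blocking socket has data for. Then it tells the caller whether requests remain, in which case the caller would re-poke the watcher (the "poke:" label quoted earlier). This is a hypothetical sketch: recv_req_t, internal_recv_model and the array used as a queue are illustrative. Real BIND posts a completion event for each request and handles errors in more detail.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical read request: a buffer plus the received length. */
typedef struct {
	char buf[512];
	ssize_t len;  /* bytes received, -1 while still pending */
} recv_req_t;

/* reqs[*head .. nreqs-1] are the pending requests (a stand-in for recv_list).
 * Reads datagrams without blocking until the queue is empty or the socket has
 * no more data. Returns true if requests are still queued, i.e. the caller
 * should poke the watcher again. */
static bool internal_recv_model(int fd, recv_req_t *reqs, int nreqs, int *head)
{
	while (*head < nreqs) {
		recv_req_t *r = &reqs[*head];
		ssize_t n = recv(fd, r->buf, sizeof(r->buf), MSG_DONTWAIT);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			/* EAGAIN/EWOULDBLOCK: nothing more to read right now.
			 * Other errors are also treated as "stop" in this sketch. */
			break;
		}
		r->len = n;  /* real code would post a completion event to the task here */
		(*head)++;
	}
	return *head < nreqs;
}
```

Doing the reads in a task thread and then handing the fd back to the watcher is what keeps the watcher lean. The watcher only ever moves descriptors in and out of the epoll set.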