以linux-2.6.21为例.
数据结构介绍:
ip_vs_conn
对于某个连接记录其N元组, (client, vserver, rserver) & (address, port)
Q: ip_vs_conn?
A: 在选择rserver的时候,通过scheduler函数来创建rserver并,创建对应的ip_vs_conn,并保存在ip_vs_conn_tab数组中.详见函数ip_vs_schedule ipv4/ipvs/ip_vs_core.c
ip_vs_in->conn_schedule->tcp_conn_schedule->ip_vs_schedule->ip_vs_conn_new
Q: 此连接何时过期?
1. 在检查到rserver状态不为IP_VS_DEST_F_AVAILABLE,则调用ip_vs_conn_expire_now
ip_vs_service
代表了一个virtual server,由链表ip_vs_svc_table统一维护,即所有的vritual server都会在保存在ip_vs_svc_table中.ip_vs_svc_table的大小也限制了virtual server的大小. 2.6.21版本中为256个.
Q: 何时建立?
A: 在通过用户态命令创建virtual server的时候会创建,详见ip_vs_add_service,相关文件net/ipv4/ipvs/ip_vs_ctl.c
ip_vs_core.c
定义了ip_vs_init函数作为模块初始化的方法
此初始化方法主要做了如下几件事情:
1. ip_vs_control_init 使用nf_register_sockopt注册内核态数据ip_vs_sockopts结构,用来与用户态ipvsadm命令交互
注:和netlink一样,sockopt是内核态与用户态通信的一种方式,详见http://blog.csdn.net/jk110333/article/details/8642261
相关文件:net/ipv4/ipvs/ip_vs_ctl.c
2. ip_vs_protocol_init
此功能主要注册ip_vs_protocol_tcp, ip_vs_protocol_udp, ip_vs_protocol_ah, ip_vs_protocol_esp。注册这些协议的目的是为了使用ip_vs_protocol结构定义,在对支持的协议做lvs相关处理的时候(比如snat,dnat等)时应该调用哪种方法。相关记录在ip_vs_proto_table数组中.
相关文件:include/net/ip_vs.h、net/ipv4/ipvs/ip_vs_proto.c
3. ip_vs_conn_init
分配连接hash表并初始化list_head
相关文件:net/ipv4/ipvs/ip_vs_conn.c
4. 注册hook钩子,以使用netfiler框架调用lvs相关处理方法. 主要有:
/* After packet filtering, forward packet through VS/DR, VS/TUN, or VS/NAT(change destination), so that filtering rules can be applied to IPVS. */ static struct nf_hook_ops ip_vs_in_ops = { .hook = ip_vs_in, .owner = THIS_MODULE, .pf = PF_INET, .hooknum = NF_IP_LOCAL_IN, .priority = 100, }; /* After packet filtering, change source only for VS/NAT */ static struct nf_hook_ops ip_vs_out_ops = { .hook = ip_vs_out, .owner = THIS_MODULE, .pf = PF_INET, .hooknum = NF_IP_FORWARD, .priority = 100, }; /* After packet filtering (but before ip_vs_out_icmp), catch icmp destined for 0.0.0.0/0, which is for incoming IPVS connections */ static struct nf_hook_ops ip_vs_forward_icmp_ops = { .hook = ip_vs_forward_icmp, .owner = THIS_MODULE, .pf = PF_INET, .hooknum = NF_IP_FORWARD, .priority = 99, }; /* Before the netfilter connection tracking, exit from POST_ROUTING */ static struct nf_hook_ops ip_vs_post_routing_ops = { .hook = ip_vs_post_routing, .owner = THIS_MODULE, .pf = PF_INET, .hooknum = NF_IP_POST_ROUTING, .priority = NF_IP_PRI_NAT_SRC-1, };
下面主要分析下四个钩子是如何工作的.
首先报文从out->in方向发到本地的报文进入ip_vs_in处理
/* * Check if it‘s for virtual services, look it up, * and send it on its way... */ static unsigned int ip_vs_in(unsigned int hooknum, struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { struct sk_buff *skb = *pskb; struct iphdr *iph; struct ip_vs_protocol *pp; struct ip_vs_conn *cp; int ret, restart; int ihl; /* * Big tappo: only PACKET_HOST (neither loopback nor mcasts) * ... don‘t know why 1st test DOES NOT include 2nd (?) */ PACKET_HOST代表什么? PACKET_HOST代表本地的报文,即mac地址为本机网卡mac地址 if (unlikely(skb->pkt_type != PACKET_HOST || skb->dev == &loopback_dev || skb->sk)) { IP_VS_DBG(12, "packet type=%d proto=%d daddr=%d.%d.%d.%d ignored\n", skb->pkt_type, skb->nh.iph->protocol, NIPQUAD(skb->nh.iph->daddr)); return NF_ACCEPT; } iph = skb->nh.iph; if (unlikely(iph->protocol == IPPROTO_ICMP)) { int related, verdict = ip_vs_in_icmp(pskb, &related, hooknum); if (related) return verdict; skb = *pskb; iph = skb->nh.iph; } //此处为ip_vs_protol_init时注册的协议 /* Protocol supported? */ pp = ip_vs_proto_get(iph->protocol); if (unlikely(!pp)) return NF_ACCEPT; ihl = iph->ihl << 2; /* * Check if the packet belongs to an existing connection entry */ 根据目的ip,查找此连接是否已经存在,或没有查找到则说明之前此连接并未建立过,需要为这个连接通过conn_schedule选择rserver.并将此连接信息存入ip_vs_conn_tab数组中. cp = pp->conn_in_get(skb, pp, iph, ihl, 0); if (unlikely(!cp)) { int v; if (!pp->conn_schedule(skb, pp, &v, &cp)) return v; } if (unlikely(!cp)) { /* sorry, all this trouble for a no-hit :) */ IP_VS_DBG_PKT(12, pp, skb, 0, "packet continues traversal as normal"); return NF_ACCEPT; } IP_VS_DBG_PKT(11, pp, skb, 0, "Incoming packet"); 如果rserver状态异常则将连接删除(expire?),并将此报文丢弃. /* Check the server status */ if (cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) { /* the destination server is not available */ if (sysctl_ip_vs_expire_nodest_conn) { /* try to expire the connection immediately */ ip_vs_conn_expire_now(cp); } /* don‘t restart its timer, and silently drop the packet. */ __ip_vs_conn_put(cp); return NF_DROP; } ip_vs_in_stats(cp, skb); restart = ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pp); //关键:根据不同的lvs模式(DR NAT等)将报文做不同的发送处理(ip_vs_conn_new->ip_vs_bind_xmit) //例:DR模式调用ip_vs_dr_xmit,其中查询路由完成之后调用IP_VS_XMIT,走NF_IP_LOCAL_OUT进入netfilter框架处理. if (cp->packet_xmit) ret = cp->packet_xmit(skb, cp, pp); /* do not touch skb anymore */ else { IP_VS_DBG_RL("warning: packet_xmit is null"); ret = NF_ACCEPT; } /* increase its packet counter and check if it is needed to be synchronized */ atomic_inc(&cp->in_pkts); if ((ip_vs_sync_state & IP_VS_STATE_MASTER) && (cp->protocol != IPPROTO_TCP || cp->state == IP_VS_TCP_S_ESTABLISHED) && (atomic_read(&cp->in_pkts) % sysctl_ip_vs_sync_threshold[1] == sysctl_ip_vs_sync_threshold[0])) ip_vs_sync_conn(cp); ip_vs_conn_put(cp); return ret; }
to be contined.
引用:
http://blog.csdn.net/majieyue/article/details/8574580