把Haproxy用在生产环境后,经常会遇到Haproxy误报"backend xxx_server has no server available!"的消息,而实际上,xxx_server是正常运行的。
最近误报越来越多,已经严重影响服务器的使用,之前几个月间或google了一些资料,都没有坚决这个问题。于是决定腾出时间从Haproxy的源码找原因。
仔细在Haproxy的源码里找了找,发现是tcp连接返回的错误导致误报。于是在一台测试服务器上重新部署了一个Haproxy准备调试具体的tcp错误信息。
有趣的是,新部署的测试Haproxy居然一切正常,没有误报。
对比了一下两台服务器,网络参数方面主要就是生产环境的内核TCP参数做过优化,而测试环境是系统默认的。
生产环境之前在/etc/sysctl.conf 文件中做了如下优化:
kernel.shmall = 268435456 net.ipv4.tcp_syncookies = 1 net.ipv4.tcp_tw_reuse = 1 net.ipv4.tcp_tw_recycle = 1 net.ipv4.tcp_fin_timeout = 30 net.ipv4.tcp_keepalive_time = 1200 net.ipv4.ip_local_port_range = 1024 65000 net.ipv4.tcp_max_tw_buckets = 5000 net.ipv4.tcp_max_tw_buckets = 5000 net.ipv4.tcp_fin_timeout = 30 net.ipv4.tcp_keepalive_time = 300 net.ipv4.tcp_syncookies = 1 net.ipv4.tcp_tw_reuse = 1 net.ipv4.tcp_tw_recycle = 1 net.ipv4.ip_local_port_range = 5000 65000 net.ipv4.tcp_mem = 786432 1048576 1572864 net.core.wmem_max = 873200 net.core.rmem_max = 873200 net.ipv4.tcp_wmem = 8192 436600 873200 net.ipv4.tcp_rmem = 32768 436600 873200 net.core.somaxconn = 256 net.core.netdev_max_backlog = 1000 net.ipv4.tcp_max_syn_backlog = 2048 net.ipv4.tcp_retries2 = 5 net.ipv4.tcp_keepalive_time = 500 net.ipv4.tcp_keepalive_intvl = 30 net.ipv4.tcp_keepalive_probes = 3 net.ipv4.conf.lo.arp_ignore = 0 net.ipv4.conf.lo.arp_announce = 0 net.ipv4.conf.all.arp_ignore = 0 net.ipv4.conf.all.arp_announce = 0
看来这个是导致Haproxy在做健康检查时误报的原因,于是去掉上面的配置,重启生产环境的服务器,一切正常了。
时间: 2024-09-30 06:50:15