问题说明:
在一台zabbix被监控服务器上(64位centos6.8系统,64G内容)启动zabbix_agent,发现进程无法启动,10050端口没有起来!
启动zabbix_agent进程没有报错,但10050端口没有正常启动起来。
[[email protected] ~]# /usr/local/zabbix/sbin/zabbix_agentd
[[email protected] ~]# ps -ef|grep zabbix_agent
root 27506 27360 0 11:07 pts/5 00:00:00 grep --color zabbix
[[email protected] etc]# lsof -i:10050
查看/usr/local/zabbix/logs/zabbix_agentd.log日志,发现报错如下:
................
27667:20161027:111554.851 cannot allocate shared memory of size 657056: [28] No space left on device
27667:20161027:111554.851 cannot allocate shared memory for collector
..............
原因分析:
这是因为内核对share memory的限制造成的。
处理过程记录:
[[email protected] logs]# ipcs -l
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 1940588
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767
------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536
从上面命令结果可以看到:
max total shared memory设置的是2M,max seg size设置的是8M,这显然不够allocate(分配)zabbix_agent启动所使用的内存。
查看目前的共享内存设置,
[[email protected] logs]# sysctl -a|grep shm
kernel.shmmax = 1987162112
kernel.shmall = 2097152
kernel.shmmni = 4096
kernel.shm_rmid_forced = 0
vm.hugetlb_shm_group = 0
其中kernel.shmall代表总共能分配的共享内存,这里是2G,kernel.shmax代表单个段能allocate的内存(以字节为单位),这里是2M,所以肯定有问题!
然后查看/etc/sysctl.conf
[[email protected] logs]# cat /etc/sysctl.conf
........
kernel.shmall = 2097152
kernel.shmmax = 1987162112
显然在sysctl.conf文件里设置的kernel.shamll和kernel.shmmax参数的值小了。
--------------------------------------------------------------
本机是64位的centos 6.8系统,64G内存,查看其它同系统的被监控服务器发现:
[[email protected] ~]# cat /etc/sysctl.conf
........
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
[[email protected] logs]# ipcs -l
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 67108864
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767
------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536
即64位的centos6系统(64G)的上面两个参数的默认值是64G和4G,设置的都是系统能识别的最大内存。
---------------------------------------------------------------
现在只需要在本机调大这两个参数值即可解决问题!
[[email protected] logs]# cat /etc/sysctl.conf
........
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.msgmnb = 65536
kernel.msgmax = 65536
执行sysctl -p生效
[[email protected] logs]# sysctl -p
再次查看发现已经修改成功了!
[[email protected] logs]# sysctl -a|grep shm
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.shmmni = 4096
kernel.shm_rmid_forced = 0
vm.hugetlb_shm_group = 0
[[email protected] logs]# ipcs -l
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 67108864
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767
------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536
最后重新启动zabbix,发现10050端口顺利启动了:
[[email protected] ~]# /usr/local/zabbix/sbin/zabbix_agentd
[[email protected] logs]# ps -ef|grep zabbix
zabbix 27776 1 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd
zabbix 27777 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: collector [idle 1 sec]
zabbix 27778 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #1 [waiting for connection]
zabbix 27779 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #2 [waiting for connection]
zabbix 27780 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #3 [waiting for connection]
zabbix 27781 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
root 28188 27360 0 11:48 pts/5 00:00:00 grep --color zabbix
[[email protected] logs]# lsof -i:10050
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
zabbix_ag 27776 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27777 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27778 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27779 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27780 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27781 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
[[email protected] logs]#
总结:
其实不止是zabbix程序启动会碰到这个问题,很多程序出现此错误也能使用该方法解决,就是因为内核对资源的限制问题。