The Nagios client requires OMSA (OpenManage Server Administrator) to be installed on the Dell server.

Installing OMSA on the Nagios client (see http://linux.dell.com/repo/hardware/OMSA_7.4.0/)

1. Add the Dell yum repository (browse http://linux.dell.com/repo/hardware for the latest version)
    wget -q -O - http://linux.dell.com/repo/hardware/OMSA_7.4.0/bootstrap.cgi | bash

2. Install srvadmin
    yum install srvadmin-all -y

3. Start srvadmin
    /opt/dell/srvadmin/sbin/srvadmin-services.sh start

Nagios server configuration

1. Download the check_openmanage OMSA monitoring script (hosted at folk.uio.no rather than by Dell). Save it on the Nagios server under /usr/local/nagios/libexec and make it executable by the nagios user.
    wget http://folk.uio.no/trondham/software/check_openmanage-3.7.11/check_openmanage
check_openmanage is a Perl script, so it needs a Perl interpreter, and it also needs perl-Net-SNMP:
    yum install perl-Net-SNMP

2. Run the script by hand to read the hardware status
    # Voltage
    ./check_openmanage -H 192.168.1.100 --only voltage
    VOLTAGE OK - 20 voltage probes checked
    # CPU
    ./check_openmanage -H 192.168.1.100 --only cpu
    PROCESSORS OK - 1 processors checked
    # Fan speed
    ./check_openmanage -H 192.168.1.100 --only fans
    FANS OK - 12 fan probes checked
    # Storage
    ./check_openmanage -H 192.168.1.100 --only storage
    STORAGE OK - 3 physical drives, 1 logical drives
    # Memory
    ./check_openmanage -H 192.168.1.100 --only memory
    MEMORY OK - 2 memory modules, 32768 MB total memory
    # Batteries
    ./check_openmanage -H 192.168.1.100 --only batteries
    BATTERIES OK - 1 batteries checked

For more detailed usage of check_openmanage, see: http://folk.uio.no/trondham/software/check_openmanage.html

If none of the commands above reports an error, the checks can be added to Nagios. Configuration guides are easy to find online, so they are not covered here.

    # Uninstall OpenManage Server Administrator
    yum erase $(rpm -qa | grep srvadmin)
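The manual checks above can also be scripted. The following is a minimal sketch, not part of the original setup: it builds the same `check_openmanage -H <host> --only <component>` command lines and maps the plugin's exit status to a state name using the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and the `COMPONENTS` list are made up for illustration, and the plugin path assumes the download location mentioned above.

```python
import subprocess

# Component names taken from the manual checks above.
COMPONENTS = ["voltage", "cpu", "fans", "storage", "memory", "batteries"]

# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Assumed install location; adjust to your setup.
PLUGIN = "/usr/local/nagios/libexec/check_openmanage"


def build_command(host, component, plugin=PLUGIN):
    """Return the argument list for one component check."""
    return [plugin, "-H", host, "--only", component]


def state_from_exit_code(code):
    """Map a plugin exit status to a Nagios state name."""
    return NAGIOS_STATES.get(code, "UNKNOWN")


def run_check(host, component, plugin=PLUGIN):
    """Run one check and return (state, plugin output)."""
    try:
        proc = subprocess.Popen(build_command(host, component, plugin),
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
    except OSError as exc:
        # The plugin is missing or not executable.
        return "UNKNOWN", str(exc)
    output = proc.communicate()[0].decode("utf-8", "replace").strip()
    return state_from_exit_code(proc.returncode), output
```

Looping over `COMPONENTS` and calling `run_check("192.168.1.100", name)` for each entry would reproduce the manual session in one pass.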
Troubleshooting
1. The system log shows:
    Server Administrator (Shared Library): Data Engine EventID: 0 A semaphore set has to be created but the system limit for the maximum number of semaphore sets has been exceeded
When this message appears, monitoring has stopped working. It means the Data Engine could not start because the system limit on the number of semaphore sets was reached. The kernel settings for semaphore sets have to be raised, as follows.

Check the current limits:
    ipcs -l
    ------ Shared Memory Limits --------
    max number of segments = 4096
    max seg size (kbytes) = 67108864
    max total shared memory (kbytes) = 17179869184
    min seg size (bytes) = 1
    ------ Semaphore Limits --------
    max number of arrays = 128
    max semaphores per array = 250
    max semaphores system wide = 32000
    max ops per semop call = 32
    semaphore max value = 32767
    ------ Messages: Limits --------
    max queues system wide = 16
    max size of message (bytes) = 65536
    default max size of queue (bytes) = 65536

    sysctl -a | grep shm
    vm.hugetlb_shm_group = 0
    kernel.shmmni = 4096
    kernel.shmall = 4294967296
    kernel.shmmax = 68719476736

Fix: raise "max queues system wide" (kernel.msgmni) and "max number of arrays" (the fourth field of kernel.sem).
    sysctl -w kernel.msgmni=16384
    sysctl -w kernel.sem="250 32000 100 1024"
To keep the settings across reboots:
    echo "kernel.msgmni = 16384" >> /etc/sysctl.conf
    echo "kernel.sem = 250 32000 100 1024" >> /etc/sysctl.conf

Check again:
    ipcs -l
    ------ Shared Memory Limits --------
    max number of segments = 4096
    max seg size (kbytes) = 67108864
    max total shared memory (kbytes) = 17179869184
    min seg size (bytes) = 1
    ------ Semaphore Limits --------
    max number of arrays = 1024
    max semaphores per array = 250
    max semaphores system wide = 32000
    max ops per semop call = 100
    semaphore max value = 32767
    ------ Messages: Limits --------
    max queues system wide = 16384
    max size of message (bytes) = 65536
    default max size of queue (bytes) = 65536

Restart the services:
    /opt/dell/srvadmin/sbin/srvadmin-services.sh restart
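The four numbers in kernel.sem are easy to mix up, so here is a small sketch of how they map to the "Semaphore Limits" lines printed by ipcs -l. On Linux the order is SEMMSL (max semaphores per array), SEMMNS (max semaphores system wide), SEMOPM (max ops per semop call) and SEMMNI (max number of arrays). The helper names and the 1024 threshold are illustrative assumptions; 1024 is simply the value used in the fix above.

```python
from collections import namedtuple

# Field order of kernel.sem on Linux: SEMMSL SEMMNS SEMOPM SEMMNI.
SemLimits = namedtuple("SemLimits", ["semmsl", "semmns", "semopm", "semmni"])


def parse_kernel_sem(text):
    """Parse the whitespace-separated kernel.sem value into named fields."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d" % len(fields))
    return SemLimits(*[int(field) for field in fields])


def read_kernel_sem(path="/proc/sys/kernel/sem"):
    """Read the live value from procfs (Linux only)."""
    with open(path) as handle:
        return parse_kernel_sem(handle.read())


def needs_raise(limits, wanted_semmni=1024):
    """True if the semaphore-set limit is below the wanted value."""
    return limits.semmni < wanted_semmni
```

With the values from the first ipcs -l run, `parse_kernel_sem("250 32000 32 128")` gives `semmni` 128, which `needs_raise` flags as too low.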
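2. The log shows:
    refused smux peer: oid SNMPv2-SMI::enterprises.674.10892.1, descr Systems Management SNMP MIB Plug-in Manager
snmpd refused the SMUX connection from the OMSA SNMP plug-in. If your configuration is correct, restarting snmpd is enough:
    /etc/init.d/snmpd restart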
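The restart only helps when the configuration is already correct. One thing worth checking (this is an assumption, not something stated above) is whether /etc/snmp/snmpd.conf contains a net-snmp `smuxpeer` line for the OID named in the log message; `enterprises.674.10892.1` is `.1.3.6.1.4.1.674.10892.1` in numeric form. A minimal sketch with hypothetical helper names:

```python
# Numeric form of SNMPv2-SMI::enterprises.674.10892.1 from the log message.
DELL_SMUX_OID = ".1.3.6.1.4.1.674.10892.1"


def normalize_oid(oid):
    """Drop surrounding whitespace and the optional leading dot."""
    return oid.strip().lstrip(".")


def has_smuxpeer(conf_text, oid=DELL_SMUX_OID):
    """Return True if conf_text has an active smuxpeer line for oid."""
    wanted = normalize_oid(oid)
    for line in conf_text.splitlines():
        # Ignore comments and blank lines.
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 2 and parts[0].lower() == "smuxpeer":
            if normalize_oid(parts[1]) == wanted:
                return True
    return False
```

Feeding it the contents of snmpd.conf, for example `has_smuxpeer(open("/etc/snmp/snmpd.conf").read())`, tells you whether the line is present before you restart snmpd.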
3. The log shows:
    ipmi_si: Could not enable interrupts, failed set, using polled mode.
The IPMI driver could not enable interrupts (the set request failed), so it falls back to polled mode.
Next, we monitor the machine's status with our own script.
Get_Dell_Server_Detail.py collects the Dell hardware information and saves it under /tmp.
cat /data/program/nagios-client/libexec/Get_Dell_Server_Detail.py

    #!/usr/bin/python2.7
    # -*- coding:utf-8 -*-
    """
    The Dell Server Hardware Detail
    author jastme
    """
    import subprocess

    CHECK_CMD = ['/data/program/nagios-client/libexec/check_openmanage', '-s', '-d']
    OUTPUT_FILE = '/tmp/Dell_Hardware_Detail.txt'


    def DellServer():
        # Capture stdout and stderr together. Nagios plugins exit non-zero
        # for non-OK states, so the exit status is not treated as a failure.
        proc = subprocess.Popen(CHECK_CMD, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        detail = proc.communicate()[0]
        # Mode 'wb' creates the file if it is missing and overwrites it otherwise.
        with open(OUTPUT_FILE, 'wb') as ff:
            ff.write(detail)


    if __name__ == '__main__':
        DellServer()
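How the saved file is consumed is not described here. As one possible follow-up, the sketch below checks that /tmp/Dell_Hardware_Detail.txt exists, is not empty, and was refreshed recently, and returns a Nagios-style exit code with a message. The one-hour threshold and the function names are assumptions for illustration only.

```python
import os
import time

OUTPUT_FILE = "/tmp/Dell_Hardware_Detail.txt"


def is_stale(mtime, now, max_age_seconds=3600):
    """True if the file was last modified more than max_age_seconds ago."""
    return (now - mtime) > max_age_seconds


def check_file(path=OUTPUT_FILE, max_age_seconds=3600):
    """Return (exit_code, message) following Nagios plugin conventions."""
    try:
        info = os.stat(path)
    except OSError:
        return 3, "UNKNOWN - %s is missing" % path
    if info.st_size == 0:
        return 2, "CRITICAL - %s is empty" % path
    if is_stale(info.st_mtime, time.time(), max_age_seconds):
        return 1, "WARNING - %s is older than %d seconds" % (path, max_age_seconds)
    return 0, "OK - %s is up to date" % path
```

A wrapper that prints the message and calls `sys.exit` with the code could then be registered as an ordinary Nagios command.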
Time: 2024-10-16 17:41:37