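1. Always confirm with the application team which processes need to be monitored on each node. Do not blindly assume that zk, journal, and so on are the same on every hadoop cluster. Remember this!
2. The monitored nodes only need nagios-plugin and nrpe installed; xinetd must be installed as a dependency.
3. Confirm that nagios has never been installed on the monitored nodes.
4. Confirm mutual trust (passwordless SSH) among the monitored nodes, and between the monitored nodes and the nagios server.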
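Checks 3 and 4 can be scripted before any files are copied. The following is a minimal sketch, not a definitive procedure: the function names are hypothetical, the hostnames in the example are placeholders, and "nagios not installed" is approximated by the absence of /usr/local/nagios, the prefix used in the rest of these notes.

```shell
#!/bin/sh
# Pre-flight checks for monitored nodes (sketch; function names are illustrative).

# Returns 0 if passwordless SSH to the host works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
has_trust() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Returns 0 if the assumed install prefix is absent on the host.
nagios_absent() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" 'test ! -e /usr/local/nagios'
}

# Prints one status line per host; returns non-zero if any host fails.
precheck_hosts() {
    rc=0
    for host in "$@"; do
        if ! has_trust "$host"; then
            echo "$host: no passwordless ssh"
            rc=1
        elif ! nagios_absent "$host"; then
            echo "$host: /usr/local/nagios already exists"
            rc=1
        else
            echo "$host: ok"
        fi
    done
    return "$rc"
}

# Example (hypothetical hostnames):
# precheck_hosts b1 b2 b3
```

Run it from the node that will do the copying, and again from the nagios server, since trust is needed in both directions.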
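5. Procedure
5-1 Pick a node an in cluster a that runs the same operating system; the target cluster is b
ssh an
# cluster{an..b1} is a placeholder for the target host list
for dn in cluster{an..b1}
do
    echo "Configuring nagios on $dn ............................."
    # Create the nagios user with /usr/local/nagios as its home directory
    ssh "$dn" 'useradd -d /usr/local/nagios nagios'
    # Copy the nagios tree and the xinetd service definition for nrpe
    scp -r /usr/local/nagios/ "$dn":/usr/local/
    scp /etc/xinetd.d/nrpe "$dn":/etc/xinetd.d/nrpe
    # Register the nrpe port, fix ownership, and reload xinetd
    ssh "$dn" 'echo "nrpe 5666/tcp #nrpe" >>/etc/services'
    ssh "$dn" 'chown -R nagios:nagios /usr/local/nagios/'
    ssh "$dn" 'service xinetd restart'
    echo "Finished configuring nagios on $dn ........................"
done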
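One thing to watch in the loop above: the nrpe line is appended to /etc/services on every run, so rerunning the loop leaves duplicate entries, and some distributions already ship an nrpe entry. A minimal sketch of a guard follows; `ensure_nrpe_service` is a hypothetical helper, not part of nagios or nrpe.

```shell
# Append the nrpe entry only if no "nrpe ... 5666/tcp" line exists yet.
# The services file defaults to /etc/services; pass another path for testing.
ensure_nrpe_service() {
    services_file=${1:-/etc/services}
    if ! grep -Eq '^nrpe[[:space:]]+5666/tcp' "$services_file"; then
        echo 'nrpe 5666/tcp #nrpe' >> "$services_file"
    fi
}

# Example, run as root on a monitored node:
# ensure_nrpe_service
```

The pattern matches on the service name and port rather than the exact line, so an existing entry with different spacing is left alone.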
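5-2 Batch install across cluster b
ssh b1
# cluster{b2..bn} is a placeholder for the remaining nodes of cluster b
for dn in cluster{b2..bn}
do
    echo "Configuring nagios on $dn ............................."
    # Create the nagios user with /usr/local/nagios as its home directory
    ssh "$dn" 'useradd -d /usr/local/nagios nagios'
    # Copy the nagios tree and the xinetd service definition for nrpe
    scp -r /usr/local/nagios/ "$dn":/usr/local/
    scp /etc/xinetd.d/nrpe "$dn":/etc/xinetd.d/nrpe
    # Register the nrpe port, fix ownership, and reload xinetd
    ssh "$dn" 'echo "nrpe 5666/tcp #nrpe" >>/etc/services'
    ssh "$dn" 'chown -R nagios:nagios /usr/local/nagios/'
    ssh "$dn" 'service xinetd restart'
    echo "Finished configuring nagios on $dn ........................"
done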
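After the batch install it is worth confirming from the nagios server that every node actually answers on the nrpe port. A rough sketch, assuming check_nrpe lives under /usr/local/nagios/libexec (an assumption; adjust `NRPE_CHECK` if the plugin is elsewhere). `find_unreachable` is a hypothetical helper name.

```shell
# Path to check_nrpe is an assumption; override NRPE_CHECK if yours differs.
NRPE_CHECK=${NRPE_CHECK:-/usr/local/nagios/libexec/check_nrpe}

# Prints the hosts whose nrpe daemon did not answer; returns non-zero if any.
find_unreachable() {
    failed=0
    for node in "$@"; do
        if ! "$NRPE_CHECK" -H "$node" >/dev/null 2>&1; then
            echo "$node"
            failed=1
        fi
    done
    return "$failed"
}

# Example (hypothetical hostnames), run on the nagios server:
# find_unreachable b1 b2 b3
```

If a node is listed, likely things to check are whether xinetd restarted cleanly and whether the `only_from` line in /etc/xinetd.d/nrpe includes the server's address.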
5-3 If logcheck is configured for /var/log/messages, be sure to confirm that the permissions on /var/log/messages are 705
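The permission check can be scripted so it is not forgotten on any node. A small sketch, assuming GNU stat (Linux); `file_mode` and `check_mode` are hypothetical helper names.

```shell
# Prints the octal mode of a file (GNU stat; assumes a Linux host).
file_mode() {
    stat -c '%a' "$1"
}

# Returns 0 when the file has the expected mode, otherwise reports the mismatch.
check_mode() {
    actual=$(file_mode "$1") || return 2
    if [ "$actual" != "$2" ]; then
        echo "$1: mode is $actual, expected $2"
        return 1
    fi
}

# Example, run on a monitored node:
# check_mode /var/log/messages 705
```

The helper only reports; it deliberately does not change the mode.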
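5-4 Add the configuration files for the new cluster on the server side
mkdir -p /usr/local/nagios/etc/servers/b
cd /usr/local/nagios/etc/servers/a
# an/bn stand for the hostnames, an_ip/bn_ip for the IP addresses
sed -e 's/an/bn/g' -e 's/an_ip/bn_ip/g' an.cfg > /usr/local/nagios/etc/servers/b/bn.cfg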
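For a cluster with many nodes, the same substitution can be wrapped in a function and called once per host. This is a sketch under assumptions: `gen_host_cfg` is a hypothetical helper, the hostnames and addresses in the example are made up, and the hostname is replaced as a plain substring, so a very short name could match unrelated text in the template.

```shell
# Usage: gen_host_cfg TEMPLATE OLD_NAME OLD_IP NEW_NAME NEW_IP
# Writes the rewritten config to stdout.
gen_host_cfg() {
    template=$1 old_name=$2 old_ip=$3 new_name=$4 new_ip=$5
    # Escape the dots so the old IP is matched literally, not as a regex.
    old_ip_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')
    # Replace the IP first, then the hostname.
    sed -e "s/$old_ip_re/$new_ip/g" -e "s/$old_name/$new_name/g" "$template"
}

# Example (hypothetical names and addresses):
# gen_host_cfg a01.cfg a01 10.0.0.1 b01 10.0.1.7 > ../b/b01.cfg
```

Review at least one generated file by hand before reloading nagios, since a template copied from another cluster may reference services the new cluster does not run.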
5-5 Add the group configuration for the new cluster on the server side
vi /usr/local/nagios/etc/servers/group.cfg
define hostgroup{
    hostgroup_name b
    alias b
    members b1,....bn
}
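Typing a long members list by hand is error prone, so the hostgroup block can be generated from the host list instead. A minimal sketch; `make_hostgroup` is a hypothetical helper and the hostnames in the example are placeholders.

```shell
# Prints a hostgroup definition.
# First argument: group name; remaining arguments: member hosts.
make_hostgroup() {
    group=$1
    shift
    # Join the remaining arguments with commas.
    members=$(IFS=,; echo "$*")
    printf 'define hostgroup{\n'
    printf '    hostgroup_name %s\n' "$group"
    printf '    alias %s\n' "$group"
    printf '    members %s\n' "$members"
    printf '}\n'
}

# Example (hypothetical hostnames):
# make_hostgroup b b1 b2 b3 >> /usr/local/nagios/etc/servers/group.cfg
```

Before restarting nagios, the whole configuration can be verified with `nagios -v` against the main config file, for example `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg` (both paths assume the /usr/local/nagios prefix used in these notes).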
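6. Taking a cluster offline
This time the cluster is only taken offline in the nagios front end. The offline cluster keeps its nagios software and will be monitored again after the upgrade.
All that is needed is to delete (or move away) all of that cluster's configuration on the server side.
Note: do not try to change permissions instead. Changing permissions only results in nagios failing to start.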
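Moving the configuration rather than deleting it makes it easy to bring the cluster back after the upgrade. A rough sketch, assuming the per-cluster layout under /usr/local/nagios/etc/servers shown in step 5-4; `retire_cluster` and the backup directory in the example are hypothetical.

```shell
# Usage: retire_cluster SERVERS_DIR CLUSTER BACKUP_DIR
# Moves SERVERS_DIR/CLUSTER into BACKUP_DIR with a timestamp suffix.
retire_cluster() {
    servers_dir=$1 cluster=$2 backup_dir=$3
    if [ ! -d "$servers_dir/$cluster" ]; then
        echo "no config directory for cluster $cluster"
        return 1
    fi
    mkdir -p "$backup_dir" || return 1
    mv "$servers_dir/$cluster" "$backup_dir/$cluster.$(date +%Y%m%d%H%M%S)"
}

# Example (hypothetical backup location):
# retire_cluster /usr/local/nagios/etc/servers b /root/nagios-offline
```

The backup directory should sit outside any directory that nagios loads config files from, otherwise the moved files may still be read. The cluster's hostgroup block in group.cfg also has to be removed by hand, since it would reference hosts that no longer exist.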