Nagios Monitoring / 憋错料

What a monitoring system needs to watch:

1. Local resources: load (uptime), CPU (top, sar), disk (df), memory (free), I/O (iostat), RAID, temperature, changes to the passwd file, fingerprinting (checksums) of all local files

2. Network services: ports, URLs, packet loss, process counts, network traffic

3. Other devices: routers, switch port traffic, printers, Windows hosts

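4. Business data: failed user logins, user logins to the website, failed CAPTCHA attempts, concurrent traffic on a given API, e-commerce orders, number of payment transactions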

5. The monitoring software itself is only a platform: in principle, any data you can obtain from the server's command line can be monitored by it.
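As an illustration of point 5, here is a minimal sketch of a custom check written in Python. It assumes the standard plugin conventions (exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, one line of output, optional performance data after a `|`). The function names and the 80%/90% thresholds are hypothetical; in a real setup the check_disk plugin from nagios-plugins already covers this case.

```python
import shutil

# Standard plugin exit codes understood by Nagios
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_disk(used_pct, warn, crit):
    """Map a used-space percentage to (exit code, output line)."""
    if used_pct >= crit:
        code = CRITICAL
    elif used_pct >= warn:
        code = WARNING
    else:
        code = OK
    # Text before '|' is shown in the web UI; text after it is performance data
    perfdata = f"used_pct={used_pct:.1f}%;{warn:g};{crit:g};0;100"
    return code, f"{STATE_NAMES[code]} - disk used {used_pct:.1f}%|{perfdata}"


def main(path="/", warn=80.0, crit=90.0):
    """Check one filesystem; the 80/90 thresholds are illustrative assumptions."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"UNKNOWN - cannot stat {path}: {exc}")
        return UNKNOWN
    if usage.total == 0:
        print(f"UNKNOWN - {path} reports zero size")
        return UNKNOWN
    code, line = evaluate_disk(usage.used / usage.total * 100, warn, crit)
    print(line)
    return code
```

Run as a real plugin, the script would end with `sys.exit(main())` so that Nagios sees the exit code; the state is carried by the exit code, not by the text.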

A Nagios setup generally consists of the main program nagios, the plugin package nagios-plugins, and some optional add-ons (NRPE, NSClient++, NSCA and NDOUtils).

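Nagios itself is only a monitoring platform; the actual checks are performed by plugins (nagios-plugins). The nagios main program and nagios-plugins must therefore both be installed on the Nagios server, and nagios-plugins is usually installed on the monitored hosts as well. The add-ons are described below: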

1. NRPE: semi-passive mode

Where it runs: on the monitored host, which runs Linux or Unix

Role: executes plugin scripts on remote Linux/Unix hosts and returns the results to the server, so that these hosts' resources can be monitored

Form: a daemon (agent) listening on port 5666

2. NSClient++: semi-passive mode

Where it runs: on monitored servers running Windows

Role: the Windows counterpart of NRPE on Linux; it is the component installed on a Windows host in order to monitor it

3. NSCA: purely passive monitoring

Where it runs: NSCA must be installed on both the server and the client

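Role: lets remote Linux/Unix hosts actively send their monitoring data to the Nagios server (needed in distributed monitoring clusters; with fewer than about 300 servers it can be skipped)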

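The NSCA add-on for distributed monitoring: NSCA was developed so that remote hosts can actively submit passive check results. It has two parts. The first is the client program (send_nsca), which runs on the remote host and sends passive check results to a designated Nagios server. The second is the NSCA daemon (nsca), which can either run as a standalone daemon or be registered with inetd as an inetd service to listen for connections. After receiving service check results from a client, the daemon hands them to Nagios on the central server by writing a PROCESS_SVC_CHECK_RESULT command, followed by the check result, into the external command file. The next time the Nagios server processes external commands, it finds this passive result sent by the distributed server and processes it.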
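The external command that nsca writes can be sketched as follows. This assumes the documented line format `[timestamp] PROCESS_SVC_CHECK_RESULT;host;service;return_code;output`; the helper name `format_svc_result` is hypothetical and this is not the nsca source code.

```python
import time


def format_svc_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SVC_CHECK_RESULT line for the Nagios external command file."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    # host and service are ';'-delimited fields, and no field may span lines
    for name, value in (("host", host), ("service", service)):
        if ";" in value or "\n" in value:
            raise ValueError(f"{name} must not contain ';' or a newline")
    if "\n" in output:
        raise ValueError("output must be a single line")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"[{ts}] PROCESS_SVC_CHECK_RESULT;{host};{service};{return_code};{output}\n"
```

The resulting line would be written to the external command file (for a source install usually /usr/local/nagios/var/rw/nagios.cmd, as set by command_file in nagios.cfg). That file is a named pipe, so opening it for writing blocks if Nagios is not running.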

First, install the Nagios server:

1. Install the required packages

yum install httpd php php-cli gcc glibc glibc-common gd gd-devel net-snmp -y

2. Edit httpd.conf

vim /etc/httpd/conf/httpd.conf

Set: ServerName localhost

service httpd start

3. Add the user


useradd nagios

passwd nagios    (set the password to redhat)

4. Add the nagcmd group


groupadd nagcmd

usermod -a -G nagcmd nagios

usermod -a -G nagcmd apache

5. Compile and install Nagios

mkdir /home/huang/tools -p

cd /home/huang/tools/

wget https://sourceforge.net/projects/nagios/files/nagios-4.x/nagios-4.1.1/nagios-4.1.1.tar.gz

tar xzf nagios-4.1.1.tar.gz

cd nagios-4.1.1

./configure --with-command-group=nagcmd

make all

make install

make install-init

make install-config

make install-commandmode

make install-webconf

The Nagios web interface requires authentication, so create a login for nagiosadmin:

htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin    (set the password to redhat)

service httpd restart

6. Compile and install nagios-plugins

cd /home/huang/tools/

wget http://www.nagios-plugins.org/download/nagios-plugins-2.1.1.tar.gz

tar xfz nagios-plugins-2.1.1.tar.gz

cd nagios-plugins-2.1.1

./configure --with-nagios-user=nagios --with-nagios-group=nagios

make

make install

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg    (verify the configuration for errors)

service nagios start

chkconfig --add nagios

chkconfig nagios on

7. Install the NRPE add-on:

wget https://github.com/NagiosEnterprises/nrpe/archive/3.0.tar.gz

tar xf 3.0.tar.gz

cd nrpe-3.0/

./configure

make all

make install

make install-plugin

make install-daemon

make install-config

The available make targets are listed below:

###

# make    (run inside nrpe-3.0/)

Please enter make [option] where [option] is one of:

all builds nrpe and check_nrpe

nrpe builds nrpe only

check_nrpe builds check_nrpe only

install-groups-users add the users and groups if they do not exist

install install nrpe and check_nrpe

install-plugin install the check_nrpe plugin

install-daemon install the nrpe daemon

install-config install the nrpe configuration file

install-inetd install the startup files for inetd, launchd, etc.

install-init install the startup files for init, systemd, etc.

Start the Nagios server:

service nagios start

Open in a browser:

http://192.168.1.155/nagios

Install the client software on the hosts to be monitored:

# For Linux/Unix hosts, install the client

Add the nagios user

useradd -m nagios -s /sbin/nologin

# Install nagios-plugins

cd /home/huang/tools/

wget http://www.nagios-plugins.org/download/nagios-plugins-2.1.1.tar.gz

tar xfz nagios-plugins-2.1.1.tar.gz

cd nagios-plugins-2.1.1

./configure --with-nagios-user=nagios --with-nagios-group=nagios --prefix=/usr/local/nagios

make

make install

chown -R nagios:nagios /usr/local/nagios/

# Install NRPE

wget https://github.com/NagiosEnterprises/nrpe/archive/3.0.tar.gz

tar xf 3.0.tar.gz

cd nrpe-3.0/

./configure

make all

make install

make install-plugin

make install-daemon

make install-config

####### Testing the monitoring

As a test, use Nagios to monitor its own host:

# pwd

/usr/local/nagios/etc/objects

# tree

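├── commands.cfg --> Nagios command definitions (a commands directory can also be used). These are not system commands: this file links the commands defined in Nagios to the plugins on the Linux system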

├── contacts.cfg --> alert contacts

├── localhost.cfg

├── printer.cfg

├── switch.cfg

├── templates.cfg --> templates, which make host and service definitions easier to write

├── timeperiods.cfg --> alert time periods and related settings

└── windows.cfg

services.cfg --> service definitions for the monitored hosts (does not exist by default)

hosts.cfg --> host definitions for the monitored hosts (does not exist by default)

Rename localhost.cfg to hosts.cfg.
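Renaming the file is not enough on its own: nagios.cfg loads object files through cfg_file= (and cfg_dir=) lines, so the line pointing at objects/localhost.cfg has to be changed to objects/hosts.cfg, and a cfg_file line for objects/services.cfg added, otherwise `nagios -v` reports that it cannot open the file. A small hypothetical helper that lists these entries and flags missing files might look like this:

```python
from pathlib import Path


def object_config_entries(nagios_cfg_text):
    """Return (directive, path) pairs for cfg_file= and cfg_dir= lines, skipping comments."""
    entries = []
    for raw in nagios_cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() in ("cfg_file", "cfg_dir"):
            entries.append((key.strip(), value.strip()))
    return entries


def missing_cfg_files(nagios_cfg_text):
    """List cfg_file paths that do not exist on disk (e.g. left over after a rename)."""
    return [path for kind, path in object_config_entries(nagios_cfg_text)
            if kind == "cfg_file" and not Path(path).is_file()]
```

Typical use would be `missing_cfg_files(Path("/usr/local/nagios/etc/nagios.cfg").read_text())` before running `nagios -v`.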

Active mode: unrelated to NRPE; the server uses its own local plugins to fetch the information directly.

Services typically monitored this way: httpd, sshd, mysqld, etc.

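Passive mode: the data is collected through NRPE on the client.

The main program uses the check_nrpe plugin to talk to the nrpe daemon on the client, which runs the client's local plugins to collect the data. (In Nagios's own terminology these are still active checks, because the server initiates them; passive checks are results pushed to the server, for example via NSCA.)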
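To make the mechanism concrete, the sketch below runs a plugin the way an active check does: spawn it, read the first output line, and map the exit code to a state. The function `run_check` is hypothetical and simplified; in particular, reporting a timeout or an out-of-range exit code as UNKNOWN is this sketch's choice, not necessarily what Nagios does.

```python
import subprocess

# Exit code -> state name, per the plugin conventions
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv, timeout=10.0):
    """Run a plugin like an active check and return (state, first line of output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Sketch choice: treat a hung plugin as UNKNOWN
        return "UNKNOWN", f"plugin timed out after {timeout:g}s"
    except OSError as exc:
        # e.g. wrong plugin path or missing execute permission
        return "UNKNOWN", f"cannot execute {argv[0]}: {exc}"
    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else "(no output)"
    # Exit codes outside 0-3 are reported as UNKNOWN in this sketch
    return STATES.get(proc.returncode, "UNKNOWN"), first_line
```

For the NRPE path, argv would simply be the check_nrpe command line, for example `["/usr/local/nagios/libexec/check_nrpe", "-H", "192.168.1.11", "-c", "check_disk"]`.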

# Configure the check_nrpe command

Add check_nrpe to commands.cfg so that the plugin can be called:

define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

Example:

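/usr/local/nagios/libexec/check_nrpe -H 192.168.1.11 -c check_disk    (192.168.1.11 is the client IP; check_disk must be defined as a command[check_disk] entry in the client's nrpe.cfg)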
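How Nagios turns `check_nrpe!check_load` into a real command line can be sketched as macro substitution: the text after each `!` becomes $ARG1$, $ARG2$, and so on, while $USER1$ (from resource.cfg) and $HOSTADDRESS$ (from the host definition) are filled in from configuration. The function below is a hypothetical illustration, not Nagios's own parser; it also shows that a macro written without its closing `$` (such as `$HOSTADDRESS`) is left unexpanded in this sketch.

```python
import re


def expand_check_command(check_command, command_line, macros):
    """Substitute $NAME$ macros the way a command_line is filled in.

    check_command uses the '!' syntax: "command_name!arg1!arg2".
    """
    _name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg  # text after the i-th '!' becomes $ARGi$

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"undefined macro ${key}$")
        return values[key]

    # Only well-formed $NAME$ tokens are replaced
    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, command_line)
```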

# Configure hosts.cfg

After renaming localhost.cfg to hosts.cfg, change it to the following:

# cat objects/hosts.cfg

###############################################################################

# LOCALHOST.CFG - SAMPLE OBJECT CONFIG FILE FOR MONITORING THIS MACHINE

# NOTE: This config file is intended to serve as an *extremely* simple

# example of how you can create configuration entries to monitor

# the local (Linux) machine.

# HOST DEFINITION

# Define a host for the local machine

#define a host for nagios server

define host{
    use             linux-server    ; Name of host template to use (the default template is used here)
                                    ; This host definition will inherit all variables that are defined
                                    ; in (or inherited by) the linux-server host template definition.
    host_name       nagios_server   ; any name you like
    alias           nagios_server
    address         192.168.1.155   ; the Nagios server itself is the monitored host here
}

# HOST GROUP DEFINITION

# Define an optional hostgroup for Linux machines

define hostgroup{
    hostgroup_name  linux-servers   ; The name of the hostgroup
    alias           Linux Servers   ; Long name of the group
    members         nagios_server   ; Comma separated list of hosts that belong to this group (add every host to be monitored here)
}

# Configure services.cfg

cd /usr/local/nagios/etc/objects

touch services.cfg

chown nagios:nagios services.cfg

Edit the file and add the services to be monitored:

define service{
    use                     generic-service
    host_name               nagios_server       ; the Nagios server is used as the example; because my nrpe daemon was not running yet, no service data showed up at first
    service_description     CPU Load
    check_command           check_nrpe!check_load
}
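When many hosts run the same NRPE checks, service blocks like the one above can be generated rather than typed by hand. A minimal sketch, assuming the generic-service template and that every command name used (here check_load and check_users) is defined as a command[...] entry in each client's nrpe.cfg; the function names are hypothetical:

```python
def define_service(host_name, description, check_command, use="generic-service"):
    """Render one 'define service' block in the same layout as services.cfg."""
    fields = [
        ("use", use),
        ("host_name", host_name),
        ("service_description", description),
        ("check_command", check_command),
    ]
    body = "\n".join(f"    {key:<24}{value}" for key, value in fields)
    return "define service{\n" + body + "\n}\n"


def nrpe_services(host_name, checks):
    """checks maps service_description -> NRPE command name on the client."""
    return "\n".join(
        define_service(host_name, desc, f"check_nrpe!{cmd}") for desc, cmd in checks.items()
    )
```

The output of `nrpe_services("nagios_server", {"CPU Load": "check_load"})` could be appended to services.cfg, followed by `nagios -v` before restarting.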

First collect the data manually from the command line:

# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.155 -c check_load

connect to address 192.168.1.155 port 5666: Connection refused

connect to host 192.168.1.155 port 5666: Connection refused

Data collection failed because the nrpe daemon had not been started. So:

# once started, nrpe listens on port 5666

Edit the configuration file nrpe.cfg (/usr/local/nagios/etc/nrpe.cfg):

allowed_hosts=127.0.0.1,192.168.1.155
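allowed_hosts is a comma-separated list of clients that may connect. 192.168.1.155 is needed here because check_nrpe runs on that same machine and connects to its LAN address, so the connection does not come from 127.0.0.1. The sketch below approximates the matching with Python's ipaddress module, assuming entries are plain IP addresses or CIDR networks; NRPE also accepts hostnames, which this hypothetical `is_allowed` simply skips.

```python
import ipaddress


def is_allowed(client_ip, allowed_hosts):
    """Approximate an allowed_hosts check for IP and CIDR entries only."""
    addr = ipaddress.ip_address(client_ip)
    for entry in allowed_hosts.split(","):
        entry = entry.strip()
        if not entry:
            continue
        try:
            network = ipaddress.ip_network(entry, strict=False)  # a bare IP becomes a /32 or /128
        except ValueError:
            continue  # hostname entries are not resolved in this sketch
        if addr in network:  # always False when IPv4 and IPv6 are mixed
            return True
    return False
```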

# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

# netstat -tunlp

Active Internet connections (only servers)

Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name

tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 2597/sshd

tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1216/master

tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 55737/nrpe

tcp 0 0 :::80 :::* LISTEN 33699/httpd

tcp 0 0 :::22 :::* LISTEN 2597/sshd

tcp 0 0 ::1:25 :::* LISTEN 1216/master

tcp 0 0 :::5666 :::* LISTEN 55737/nrpe
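The netstat output confirms that nrpe is listening on 5666. The same thing can be checked from the Nagios server without logging into the client: the sketch below (hypothetical name `tcp_port_open`) just attempts a TCP connection, which fails with the same "Connection refused" seen earlier when nothing is listening. In a real setup the check_tcp plugin from nagios-plugins does this job.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and timeouts
        return False
```

For example, `tcp_port_open("192.168.1.155", 5666)` returns False before nrpe is started and True afterwards, provided no firewall blocks the port.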

Test again manually whether data can be collected:

# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.155 -c check_load

OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0; load5=0.000;10.000;25.000;0; load15=0.000;5.000;20.000;0;
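The text after `|` is performance data in the form label=value[UOM];warn;crit;min;max. The warning and critical values here (15,10,5 and 30,25,20) appear to come from the check_load line in the sample nrpe.cfg. A minimal parser sketch, assuming unquoted labels and plain decimal values; the name `parse_plugin_output` is hypothetical:

```python
import re

# value followed by an optional unit of measure, e.g. "0.000", "95.0%", "12MB"
_VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)([a-zA-Z%]*)$")


def parse_plugin_output(line):
    """Split plugin output into (status text, {label: metric dict})."""
    text, _, perf = line.partition("|")
    metrics = {}
    for item in perf.split():
        label, sep, data = item.partition("=")
        if not sep:
            continue
        fields = data.split(";")
        match = _VALUE_RE.match(fields[0])
        if not match:
            continue
        # Pad warn/crit/min/max so missing trailing fields become None
        extras = fields[1:5] + [""] * (4 - len(fields[1:5]))
        metric = {"value": float(match.group(1)), "uom": match.group(2)}
        for key, raw in zip(("warn", "crit", "min", "max"), extras):
            metric[key] = raw or None  # kept as strings: thresholds may be ranges like 10:20
        metrics[label] = metric
    return text.strip(), metrics
```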

The data is now being collected; check the web monitoring page.

That completes a simple monitoring setup.

Reference: http://tecadmin.net/install-nagios-core-service-on-centos-rhel/ and the attached PDF

Date: 2024-11-11 10:55:51
