Learning and Using icinga2 (Part 2)

Configuration Files Explained

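This post explains the icinga2 configuration files. The underlying logic is the same as in Nagios: you define hosts (and host groups), services (and service groups), check commands, templates, check intervals and so on. The actual syntax is different, though, because icinga2 defines its own set of keywords. The details follow below. There are a few points I haven't fully figured out myself, so I hope readers will join the discussion.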

Everything below assumes icinga2 was installed with yum.

1 The two configuration directories on the master server:

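/etc/icinga2/: most of the configuration lives under ./conf.d, and this is where your own custom configuration goes. File names can be anything, as long as you can tell what each file is for; you don't have to split files strictly into users, services and so on.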

/usr/share/icinga2/include/: this mainly contains predefined commands that you can use directly.

2 The individual configuration files

commands.conf and command-plugins.conf: command definitions

object CheckCommand "ssh" {
  import "plugin-check-command"

  command = PluginDir + "/check_ssh"  # PluginDir is defined in constants.conf

  arguments = {
    "-p" = "$ssh_port$"
    "host" = {
      value = "$ssh_address$"
      skip_key = true
      order = -1
    }
  }

  vars.ssh_address = "$address$"
}

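object CheckCommand is the fixed keyword for defining a command.

import pulls in a template defined in command.conf.

command is the command to run; PluginDir is defined in constants.conf.

arguments are the plugin's parameters. If you are using your own script, you don't necessarily have to define its command here.

In "-p" = "$ssh_port$", -p is the plugin's own option, while ssh_port is a name of your choosing; macros use the $...$ format.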
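To make the arguments block more concrete, here is a small Python model of how an arguments dictionary like the "ssh" one could be turned into a plugin command line. It is a simplification written for this post, not Icinga 2's actual code. build_command_line and resolve are hypothetical helpers, and the address 192.0.2.10 is made up. The rule that an argument is dropped when its macro is unset reflects my reading of the Icinga 2 docs. The model shows what skip_key does (emit only the value, not the key) and what order does (lower values come first, default 0).

```python
import re
from typing import Dict, List, Optional, Union

# Matches runtime macros such as $ssh_port$.
MACRO = re.compile(r"\$(\w+)\$")


def resolve(template: str, macros: Dict[str, str]) -> Optional[str]:
    """Substitute $name$ macros; return None if any of them is undefined."""
    names = MACRO.findall(template)
    if any(name not in macros for name in names):
        return None
    return MACRO.sub(lambda m: macros[m.group(1)], template)


def build_command_line(command: str,
                       arguments: Dict[str, Union[str, dict]],
                       macros: Dict[str, str]) -> List[str]:
    """Hypothetical, simplified model of Icinga 2's `arguments` handling."""
    entries = []
    for key, spec in arguments.items():
        if isinstance(spec, str):          # "-p" = "$ssh_port$" shorthand
            spec = {"value": spec}
        value = resolve(spec["value"], macros)
        if value is None:                  # simplified: unset macro -> argument dropped
            continue
        parts = [value] if spec.get("skip_key") else [key, value]
        entries.append((spec.get("order", 0), parts))
    # Lower order first; sort is stable, so equal orders keep definition order.
    entries.sort(key=lambda entry: entry[0])
    cmd = [command]
    for _, parts in entries:
        cmd.extend(parts)
    return cmd


# The arguments of the "ssh" CheckCommand from this post, as Python data.
SSH_ARGUMENTS = {
    "-p": "$ssh_port$",
    "host": {"value": "$ssh_address$", "skip_key": True, "order": -1},
}

print(build_command_line("check_ssh", SSH_ARGUMENTS,
                         {"ssh_port": "22221", "ssh_address": "192.0.2.10"}))
# ['check_ssh', '192.0.2.10', '-p', '22221']
```

If ssh_port is not set anywhere, this model simply leaves -p out, and the result becomes ['check_ssh', '192.0.2.10'].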

templates.conf: template definitions

Check template for hosts:

template Host "generic-host" {
  max_check_attempts = 3
  check_interval = 5m
  retry_interval = 30s

  check_command = "hostalive"
}

Check template for services:

template Service "generic-service" {
  max_check_attempts = 2
  check_interval = 5m
  retry_interval = 20s
}

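template Host and template Service are fixed syntax; the name in quotes after them is up to you.

max_check_attempts: the maximum number of check attempts once a problem is detected.

check_interval: how often the check runs.

retry_interval: how often the check is re-run once a problem has been detected.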
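One way to read these three settings together is to work out how long a problem takes to reach the HARD state, which is the point at which notifications go out (see the tip further down). The Python sketch below does that arithmetic for the two templates above. It assumes every check runs exactly on schedule and ignores check execution time and any scheduling jitter, so treat the numbers as rough estimates. Both function names are hypothetical.

```python
def seconds_until_hard(max_check_attempts: int, retry_interval_s: int) -> int:
    """Time from the first failed check until the check that makes the state HARD.

    Attempt 1 is the failing regular check; attempts 2..max_check_attempts are
    retries spaced retry_interval apart. Check execution time is ignored.
    """
    return (max_check_attempts - 1) * retry_interval_s


def worst_case_detection_s(check_interval_s: int, max_check_attempts: int,
                           retry_interval_s: int) -> int:
    """Problem starts right after a successful check, so a full check_interval
    passes before the first failure is even seen."""
    return check_interval_s + seconds_until_hard(max_check_attempts, retry_interval_s)


# generic-host: max_check_attempts = 3, check_interval = 5m, retry_interval = 30s
print(seconds_until_hard(3, 30))            # 60
print(worst_case_detection_s(300, 3, 30))   # 360
# generic-service: max_check_attempts = 2, check_interval = 5m, retry_interval = 20s
print(seconds_until_hard(2, 20))            # 20
print(worst_case_detection_s(300, 2, 20))   # 320
```

So with generic-host, a host that goes down right after a successful check would be reported as HARD DOWN roughly six minutes later. Lowering check_interval shortens that far more than lowering retry_interval does.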

Notification template:

template Notification "30mins-notification" {
  interval = 30m
  command = "mail-service-notification"

  states = [ Critical ]
  types = [ Problem, Recovery ]

  period = "24x7"
}

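command: defined in commands.conf.

states: the states that trigger an alert email. I only set Critical, to cut down the volume of email.

types: the notification types to send, such as Problem and Recovery; there are quite a few of them.

period: the time period during which alerts are sent.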
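The states and types filters work together, and a notification has to pass both. The tiny Python model below illustrates this for the 30mins-notification template and shows why a Warning never produces an email with states = [ Critical ]. It is deliberately simplified: passes_filters is a hypothetical helper. Real Icinga 2 also applies rules that are not modeled here, including the time period, the filters on the user object, and some special handling around Recovery notifications.

```python
def passes_filters(notification_type: str, state: str,
                   types: set, states: set) -> bool:
    """Simplified: a notification is sent only if BOTH its type and the
    object's current state are listed in the filters."""
    return notification_type in types and state in states


# Filters from the "30mins-notification" template.
TYPES = {"Problem", "Recovery"}
STATES = {"Critical"}

print(passes_filters("Problem", "Critical", TYPES, STATES))          # True
print(passes_filters("Problem", "Warning", TYPES, STATES))           # False
print(passes_filters("Acknowledgement", "Critical", TYPES, STATES))  # False
```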

If you want to delay the first notification, you can do it like this:

apply Notification "mail" to Service {
  import "generic-notification"

  command = "mail-notification"
  users = [ "icingaadmin" ]

  times.begin = 15m  // delay first notification

  assign where service.name == "ping4"
}
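The effect of times.begin is easiest to see on a timeline. The Python sketch below lists when notifications would be sent for a problem of a given length. It assumes that times.begin simply delays the first notification and that interval controls re-notification, with interval = 0 meaning a single notification. notification_times is a hypothetical helper. The interval of generic-notification is not shown in this post, so the 30 minutes used here is an assumed value.

```python
from typing import List


def notification_times(problem_minutes: int, begin: int, interval: int) -> List[int]:
    """Minutes after the HARD problem starts at which a notification would go out.

    begin: like times.begin, nothing is sent before this delay.
    interval: like the Notification interval; 0 means notify only once.
    problem_minutes: how long the problem lasts before it recovers.
    """
    sent = []
    t = begin
    while t < problem_minutes:
        sent.append(t)
        if interval <= 0:
            break
        t += interval
    return sent


# times.begin = 15m; the 30-minute interval is an assumed value.
print(notification_times(10, begin=15, interval=30))  # []  (recovered before 15m)
print(notification_times(80, begin=15, interval=30))  # [15, 45, 75]
print(notification_times(80, begin=15, interval=0))   # [15]
```

The useful consequence is that a problem which recovers within 15 minutes triggers no notification at all.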

Tips:

When detecting a problem with a host/service, Icinga re-checks the object a number of times (based on the max_check_attempts and retry_interval settings) before sending notifications. This ensures that no unnecessary notifications are sent for transient failures. During this time the object is in a SOFT state. After all re-checks have been executed and the object is still in a non-OK state, the host/service switches to a HARD state and notifications are sent.
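The SOFT/HARD logic in this tip can be modeled as a counter of consecutive failed checks. The Python toy below is my own simplification, and classify and hard_transitions are hypothetical names. It ignores flapping, acknowledgements, downtimes and the differences between the individual non-OK states, and it does not label OK results as SOFT or HARD. With max_check_attempts = 3, the single CRITICAL at position 1 stays SOFT and never notifies. The run of failures starting at position 3 becomes HARD on its third consecutive failure.

```python
from typing import List


def classify(results: List[str], max_check_attempts: int) -> List[str]:
    """Label each check result 'OK', 'SOFT' or 'HARD' (toy model)."""
    labels = []
    failures = 0                      # consecutive non-OK results
    for result in results:
        if result == "OK":
            failures = 0
            labels.append("OK")
            continue
        failures += 1
        labels.append("HARD" if failures >= max_check_attempts else "SOFT")
    return labels


def hard_transitions(labels: List[str]) -> List[int]:
    """Indexes where a HARD problem begins, i.e. where notifications would start."""
    return [i for i, label in enumerate(labels)
            if label == "HARD" and (i == 0 or labels[i - 1] != "HARD")]


checks = ["OK", "CRITICAL", "OK", "CRITICAL", "CRITICAL", "CRITICAL", "CRITICAL"]
labels = classify(checks, max_check_attempts=3)   # generic-host uses 3
print(labels)                    # ['OK', 'SOFT', 'OK', 'SOFT', 'SOFT', 'HARD', 'HARD']
print(hard_transitions(labels))  # [5]
```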

users.conf: defines the users who receive alerts, and the hosts

object User "icingaadmin" {
  import "generic-user"

  display_name = "Icinga 2 Admin"
  groups = [ "icingaadmins" ]

  email = "[email protected]"
}

object Host "xx" {
  display_name = "xx"
  address = "xx"
  groups = [ "cs" ]
  check_command = "hostalive"
}

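object User and object Host are fixed syntax; what follows them is up to you.

Notes on Host:

import: imports a template from templates.conf.

display_name: any name you like.

groups: custom. Separate multiple groups with commas, e.g. [ "cs", "es" ] (whether every one of them actually works is still to be confirmed).

address: can be a domain name or an IP address.

check_command: the command used to check the host. hostalive here is simply a ping check.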

services.conf: service definitions (you can also give a particular service its own xxx.conf)

object Service "ssh" {
  import "generic-service"

  check_command = "ssh"
  host_name = "hk"
  vars.ssh_port = "22221"
}

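A service that belongs to a single host can be defined with object Service.

vars.ssh_port shows how custom variables are used. vars. is the fixed prefix, followed by the variable name, which is the one used in command-plugins.conf (here $ssh_port$). The value after the equals sign is your custom port.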
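Where does the value of $ssh_port$ actually come from? As I understand the Icinga 2 documentation on runtime macros, a service check searches the service's vars first, then the host, then the command's own vars. Notification users and the global Vars constant also take part, but they are left out here. The Python sketch below models that lookup with the "ssh" example: vars.ssh_port on the service supplies 22221, and ssh_address falls back to the command's default "$address$", which is then expanded from the host. lookup and expand are hypothetical helpers, and the host address is made up. Treating address as a plain entry in the host scope is also a simplification, since in Icinga it is the host's address attribute, not a custom variable.

```python
import re
from typing import Dict, List, Optional

MACRO = re.compile(r"\$(\w+)\$")


def lookup(name: str, scopes: List[Dict[str, str]]) -> Optional[str]:
    """Return the value from the first scope that defines `name`, else None."""
    for scope in scopes:
        if name in scope:
            return scope[name]
    return None


def expand(text: str, scopes: List[Dict[str, str]], depth: int = 0) -> str:
    """Expand $name$ macros, following macros nested inside values."""
    if depth > 10:
        raise ValueError("macro nesting too deep")

    def substitute(match) -> str:
        value = lookup(match.group(1), scopes)
        # Simplified: an undefined macro expands to an empty string here.
        return "" if value is None else expand(value, scopes, depth + 1)

    return MACRO.sub(substitute, text)


# Scopes in lookup order for a service check: service, then host, then command.
service_vars = {"ssh_port": "22221"}          # vars.ssh_port in object Service "ssh"
host_attrs = {"address": "192.0.2.10"}        # hypothetical address of host "hk"
command_vars = {"ssh_address": "$address$"}   # vars.ssh_address in CheckCommand "ssh"
scopes = [service_vars, host_attrs, command_vars]

print(expand("$ssh_port$", scopes))     # 22221
print(expand("$ssh_address$", scopes))  # 192.0.2.10
```

Because the service scope is searched first, setting vars.ssh_address on the service itself would override the command's "$address$" default.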

When one service applies to many hosts, define it with apply Service, as follows:

apply Service "total_procs" {
  import "generic-service"

  check_command = "nrpe"                   # use the nrpe command to check
  vars.nrpe_command = "check_total_procs"  # command on the client server

  assign where "es" in host.groups
  ignore where host.address == ""
}

apply Service "http 80" {
  import "generic-service"

  check_command = "http"  # command on the monitoring server, which takes the "-H" argument

  assign where "vu" in host.groups
  ignore where host.address == ""
}

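An apply rule needs assign where; ignore where is optional. You can use several ignore where lines (putting them all on one line didn't work for me).

Both service definitions work the same way. Each uses a plugin, check_nrpe or check_http, and the commands nrpe and http referenced here are already defined in command-plugins.conf.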
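The assign/ignore semantics can be summed up like this: a host gets the service if at least one assign where condition matches and no ignore where condition matches. The Python sketch below models that rule, with plain functions standing in for the conditions. It is an illustration, not how Icinga 2 evaluates its DSL, and the hosts are invented. Treating several assign where lines as alternatives (a logical OR) is also how I read the notifications.conf example further down, which lists several assign where lines for one notification.

```python
from typing import Callable, Dict, List

Host = Dict[str, object]
Condition = Callable[[Host], bool]


def apply_targets(hosts: List[Host], assign: List[Condition],
                  ignore: List[Condition]) -> List[str]:
    """Hosts that get the service: some assign matches and no ignore matches."""
    return [h["name"] for h in hosts
            if any(cond(h) for cond in assign)
            and not any(cond(h) for cond in ignore)]


def no_address(h: Host) -> bool:
    """ignore where host.address == \"\" """
    return h["address"] == ""


# Hypothetical hosts for illustration.
hosts = [
    {"name": "es-01", "address": "192.0.2.11", "groups": ["es"]},
    {"name": "es-02", "address": "", "groups": ["es"]},
    {"name": "web-01", "address": "192.0.2.21", "groups": ["vu"]},
]

# apply Service "total_procs": assign where "es" in host.groups
print(apply_targets(hosts, [lambda h: "es" in h["groups"]], [no_address]))  # ['es-01']
# apply Service "http 80": assign where "vu" in host.groups
print(apply_targets(hosts, [lambda h: "vu" in h["groups"]], [no_address]))  # ['web-01']
```

es-02 is in the "es" group but has an empty address, so the ignore condition keeps total_procs off it.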

groups.conf: service group and host group definitions

object ServiceGroup "load" {
  display_name = "Load Checks"

  assign where service.vars.nrpe_command == "check_load"
}

object ServiceGroup "ssh" {
  display_name = "Ssh Checks"

  assign where service.check_command == "ssh"
}

object HostGroup "es" {
  display_name = "es server"
}

notifications.conf: applying notifications (the templates were defined earlier; this is where they are applied)

apply Notification "mail-icingaadmin" to Host {
  import "mail-host-notification"

  user_groups = [ "icingaadmins" ]

  assign where host.vars.sla == "24x7"
}

apply Notification "mail-icingaadmin-5" to Service {
  import "5mins-notification"

  user_groups = [ "icingaadmins" ]

  # one assign line per service name (the duplicate "ssh" line was dropped)
  assign where service.name == "ssh"
  assign where service.name == "check_system_5"
  assign where service.name == "zombie_procs"
  assign where service.name == "http 80"
}

Validate and reload the configuration

icinga2 -c /etc/icinga2/icinga2.conf -C

/etc/init.d/icinga2 reload --config /etc/icinga2/icinga2.conf

Time: 2025-01-03 22:04:13
