Configuration files explained
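This post describes the icinga2 configuration files. The underlying logic is the same as in nagios: you define hosts (and host groups), services (and service groups), check commands, templates, check intervals and so on. The actual syntax is different, though, with a new set of keywords. Details follow below. There are a few places I have not fully figured out myself, so I would be glad to discuss them with readers.
This assumes icinga2 was installed with yum.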
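1 The two configuration directories on the master server:
/etc/icinga2/: most of the configuration lives under ./conf.d, which is mainly for your own custom configuration. File names are up to you as long as you can tell what each file is for; there is no need to split things strictly into user, service and so on.
/usr/share/icinga2/include/: mostly predefined commands that can be used directly.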
2 What each configuration file does
commands.conf and command-plugins.conf: command definitions
    object CheckCommand "ssh" {
      import "plugin-check-command"
      command = PluginDir + "/check_ssh" # PluginDir is defined in constants.conf
      arguments = {
        "-p" = "$ssh_port$"
        "host" = {
          value = "$ssh_address$"
          skip_key = true
          order = -1
        }
      }
      vars.ssh_address = "$address$"
    }
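object CheckCommand is the fixed keyword pair for defining a command.
import pulls in a template defined in command.conf.
command is the executable to run; PluginDir is defined in constants.conf.
arguments lists the command-line arguments; for a custom script this block may not be needed.
"-p" = "$ssh_port$": -p is the plugin's own option, and ssh_port is a custom variable name written in the form $...$.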
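As a rough illustration of the points above, a command for a custom script might look like the sketch below. The command name "my-script", the script file check_my_script, its -w option and the variable my_script_warn are all hypothetical; only the overall shape follows the ssh example.

    object CheckCommand "my-script" {
      import "plugin-check-command"
      // hypothetical script, assumed to sit in the plugin directory
      command = PluginDir + "/check_my_script"
      arguments = {
        // -w is an assumed option of the script; my_script_warn is a made-up variable name
        "-w" = "$my_script_warn$"
      }
      // default value, used when a host or service does not set vars.my_script_warn
      vars.my_script_warn = "80"
    }

A service would then pick the command with check_command = "my-script" and could override the default through vars.my_script_warn.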
templates.conf: template definitions
Host check template:
    template Host "generic-host" {
      max_check_attempts = 3
      check_interval = 5m
      retry_interval = 30s
      check_command = "hostalive"
    }
Service check template:
    template Service "generic-service" {
      max_check_attempts = 2
      check_interval = 5m
      retry_interval = 20s
    }
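template Host and template Service are fixed keywords; the name in quotes that follows is up to you.
max_check_attempts: the maximum number of check attempts once a problem is detected
check_interval: how often the check runs
retry_interval: how often the check is re-run once a problem is detected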
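To show how such a template is consumed, here is a minimal sketch of a host that imports "generic-host" and overrides one value. The host name "example-host" and its address are placeholders, not part of the original setup.

    object Host "example-host" {
      // inherits max_check_attempts, check_interval, retry_interval and check_command
      import "generic-host"
      // placeholder address from the documentation range
      address = "192.0.2.10"
      // overrides the 30s inherited from the template
      retry_interval = 1m
    }

Attributes set in the object take precedence over the imported ones, so only the values that differ from the template need to be repeated.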
Notification template:
    template Notification "30mins-notification" {
      interval = 30m
      command = "mail-service-notification"
      states = [ Critical ]
      types = [ Problem, Recovery ]
      period = "24x7"
    }
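command is defined in commands.conf
states lists the states that trigger a notification mail; I only set Critical to cut down on mail volume
types lists the notification types to send; there are many of them
period is the time period during which notifications are sent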
If you want to delay the first notification, you can do it like this:
    apply Notification "mail" to Service {
      import "generic-notification"
      command = "mail-notification"
      users = [ "icingaadmin" ]
      times.begin = 15m // delay first notification
      assign where service.name == "ping4"
    }
Tips:
When detecting a problem with a host/service Icinga re-checks the object a number of times (based on the max_check_attempts and retry_interval settings) before sending notifications. This ensures that no unnecessary notifications are sent for transient failures. During this time the object is in a SOFT state. After all re-checks have been executed and the object is still in a non-OK state the host/service switches to a HARD state and notifications are sent.
users.conf: defines the users who receive notifications, and the hosts
    object User "icingaadmin" {
      import "generic-user"
      display_name = "Icinga 2 Admin"
      groups = [ "icingaadmins" ]
      email = "[email protected]"
    }

    object Host "xx" {
      display_name = "xx"
      address = "xx"
      groups = [ "cs" ]
      check_command = "hostalive"
    }
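object User and object Host are fixed keywords; what follows them is up to you.
Notes on Host:
import pulls in a template from templates.conf
display_name is free-form
groups is free-form; separate multiple groups with commas (whether every listed group actually takes effect still needs to be confirmed)
address can be a domain name or an IP
check_command is the command used to check the host; hostalive, used here, is a ping check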
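On the open question about multiple groups, a sketch of what this could look like is given below. It assumes that HostGroup objects named "cs" and "es" both exist; the host name and address are placeholders. As far as I understand, the host then becomes a member of every group in the list, but treat that as something to verify on your own setup.

    object Host "example-host" {
      display_name = "example host"
      // placeholder address from the documentation range
      address = "192.0.2.10"
      // assumed: HostGroup "cs" and HostGroup "es" are both defined elsewhere
      groups = [ "cs", "es" ]
      check_command = "hostalive"
    }

A rule such as assign where "es" in host.groups should then match this host as well.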
services.conf: service definitions (a special service can also get its own xxx.conf)
    object Service "ssh" {
      import "generic-service"
      check_command = "ssh"
      host_name = "hk"
      vars.ssh_port = "22221"
    }
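A service for a single host can be defined with object Service.
vars.ssh_port shows how a custom parameter is used. The vars. prefix is fixed and is followed by the parameter name, which is defined in command-plugins.conf; the value after the equals sign is the custom port.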
When one service applies to many hosts, define it with apply Service as follows:
    apply Service "total_procs" {
      import "generic-service"
      check_command = "nrpe" # use nrpe command to check
      vars.nrpe_command = "check_total_procs" # command on client server
      assign where "es" in host.groups
      ignore where host.address == ""
    }

    apply Service "http 80" {
      import "generic-service"
      check_command = "http" # command on monitor server which has argument "-H"
      assign where "vu" in host.groups
      ignore where host.address == ""
    }
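An apply rule needs an assign where clause; ignore where is optional and can be left out. Several ignore lines are allowed (I could not get them to work when written on a single line).
Both service definitions work the same way: each uses a plugin, check_nrpe or check_http, and the command names used here, http and nrpe, are already defined in command-plugins.conf.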
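For the single-line case, one option that may be worth trying is joining the conditions with the || (logical or) operator inside one ignore where expression. The sketch below is a variant of the "http 80" rule; the custom variable skip_http is hypothetical and only there to provide a second condition.

    apply Service "http 80" {
      import "generic-service"
      check_command = "http"
      assign where "vu" in host.groups
      // one expression, two conditions; skip_http is a made-up custom variable
      ignore where host.address == "" || host.vars.skip_http == true
    }

Two separate ignore where lines and a single line joined with || should exclude the same hosts, since a host is skipped as soon as any ignore condition matches.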
groups.conf: service group and host group definitions
    object ServiceGroup "load" {
      display_name = "Load Checks"
      assign where service.vars.nrpe_command == "check_load"
    }

    object ServiceGroup "ssh" {
      display_name = "Ssh Checks"
      assign where service.check_command == "ssh"
    }

    object HostGroup "es" {
      display_name = "es server"
    }
notifications.conf: applying notifications (the templates were defined earlier; here they are put to use)
    apply Notification "mail-icingaadmin" to Host {
      import "mail-host-notification"
      user_groups = [ "icingaadmins" ]
      assign where host.vars.sla == "24x7"
    }

    apply Notification "mail-icingaadmin-5" to Service {
      import "5mins-notification"
      user_groups = [ "icingaadmins" ]
      assign where service.name == "ssh"
      assign where service.name == "check_system_5"
      assign where service.name == "zombie_procs"
      assign where service.name == "http 80"
    }
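Multiple assign where lines are combined with "or", so the list of service names can also be folded into a single expression. The sketch below is one possible rewrite that uses the in operator with an array literal; it is meant as an alternative form of the same rule, not as an addition to it.

    apply Notification "mail-icingaadmin-5" to Service {
      import "5mins-notification"
      user_groups = [ "icingaadmins" ]
      // matches when the service name is any element of the array
      assign where service.name in [ "ssh", "check_system_5", "zombie_procs", "http 80" ]
    }

Which form is easier to maintain is a matter of taste; the one-line version keeps all the names in one place.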
Validate and load the configuration
    icinga2 -c /etc/icinga2/icinga2.conf -C
    /etc/init.d/icinga2 reload --config /etc/icinga2/icinga2.conf