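1. Deployment preparation
Note: all pods run in the monitoring namespace.
This article is based on https://github.com/coreos/kube-prometheus
The officially maintained version ran into problems in the existing deployment environment, so some modifications were made below; they do not affect the overall result.
The Alertmanager component uses the official YAML without any modification.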
2. Preparing the YAML for the Alertmanager services
2.1. Download the official YAML
mkdir kube-prometheus
cd kube-prometheus
git clone https://github.com/coreos/kube-prometheus
cd kube-prometheus/manifests
mkdir prometheus-alertmanager
mv alertmanager* prometheus-alertmanager
2.2. Create the Alertmanager service
cd prometheus-alertmanager
kubectl apply -f .
2.3. Check the Alertmanager status
kubectl get pod -n monitoring -o wide | grep alertmanager
alertmanager-main-0 2/2 Running 0 36d 10.65.1.136 node02 <none> <none>
alertmanager-main-1 2/2 Running 0 26d 10.65.4.246 node03 <none> <none>
alertmanager-main-2 2/2 Running 0 36d 10.65.0.53 node01 <none> <none>
http://10.65.1.136:9093/#/alerts
http://10.65.4.246:9093/#/alerts
http://10.65.0.53:9093/#/alerts
Each of the addresses above opens the Alertmanager web page of one pod.
kubectl get service -n monitoring -o wide | grep alertmanager
alertmanager-main ClusterIP 10.64.215.237 <none> 9093/TCP 43d alertmanager=main,app=alertmanager
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 36d app=alertmanager
http://10.64.215.237:9093/#/alerts
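The addresses above can also be probed from the command line. The following is a minimal sketch, assuming curl is available and the machine can reach the pod and ClusterIP networks; the RUN_CHECKS switch and the variable names are illustrative and not part of the original procedure. It relies on Alertmanager's /-/healthy endpoint, and only prints the URLs unless RUN_CHECKS=1 is set.

```shell
# Addresses copied from the kubectl output above; adjust them for your cluster.
endpoints="10.65.1.136 10.65.4.246 10.65.0.53 10.64.215.237"
checked=0

for ip in $endpoints; do
  url="http://$ip:9093/-/healthy"
  if [ "${RUN_CHECKS:-0}" = "1" ]; then
    # -f makes curl return non-zero on HTTP errors; --max-time bounds the wait.
    if curl -fsS --max-time 5 "$url" >/dev/null; then
      echo "healthy:   $url"
    else
      echo "unhealthy: $url"
    fi
  else
    # Dry mode (the default in this sketch): show what would be checked.
    echo "would check: $url"
  fi
  checked=$((checked + 1))
done
```

Run it with RUN_CHECKS=1 from a node inside the cluster network; from outside the cluster the pod and ClusterIP addresses are normally not reachable.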
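3. Example: configuring the Alertmanager webhook address
Prometheus Alertmanager supports automatic discovery and reloading of its configuration,
so we only need to regenerate the configuration. First, delete the existing configuration secret: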
kubectl delete secret alertmanager-main -n monitoring
Write a webhook configuration file named alertmanager.yaml
For the alerting project, see https://github.com/qist/msg-sender
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://msg-sender.monitoring:4000/sender/wechat'
Note: the url here must match the service address exposed by msg-sender.
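The configuration above can be written to disk and sanity-checked before it is loaded into the cluster. The following is a minimal sketch, assuming a POSIX shell with awk, grep and tr; the scratch directory (work_dir) and the check itself are illustrative additions, not part of the original procedure. It confirms that the receiver named under route is actually defined under receivers.

```shell
# work_dir is a hypothetical scratch directory used only for this sketch.
work_dir="$(mktemp -d)"
cfg_file="$work_dir/alertmanager.yaml"

# Write the webhook configuration shown above.
cat > "$cfg_file" <<'EOF'
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://msg-sender.monitoring:4000/sender/wechat'
EOF

# Take the value of the first "receiver:" key (the route's receiver)
# and strip the surrounding single quotes.
route_receiver="$(awk '$1 == "receiver:" { print $2; exit }' "$cfg_file" | tr -d "'")"

# The same name must appear as a "- name:" entry under receivers.
if grep -q "^- name: '$route_receiver'" "$cfg_file"; then
  echo "receiver '$route_receiver' is defined"
else
  echo "receiver '$route_receiver' is missing" >&2
fi
```

This is only a text-level check. If the amtool binary that ships with Alertmanager is installed, amtool check-config alertmanager.yaml performs a fuller validation.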
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
Confirm that the Alertmanager configuration has been updated correctly
Config
global:
  resolve_timeout: 5m
  http_config: {}
  smtp_hello: localhost
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  hipchat_api_url: https://api.hipchat.com/
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: webhook
  group_by:
  - alertname
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
- name: webhook
  webhook_configs:
  - send_resolved: true
    http_config: {}
    url: http://msg-sender.monitoring:4000/sender/wechat
templates: []
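The stored configuration can also be read back from the secret itself, since Kubernetes keeps each file of a secret as a base64-encoded field. The following is a minimal sketch, assuming base64 accepts -d (GNU coreutils; other implementations may spell the flag differently) and that kubectl, if present, is configured for the cluster. The helper name decode_field and the sample string are illustrative.

```shell
# Decode one base64-encoded Secret field read from stdin.
decode_field() {
  base64 -d
}

# Offline demonstration of the encode/decode round trip.
sample='route: {receiver: webhook}'
encoded="$(printf '%s' "$sample" | base64)"
decoded="$(printf '%s' "$encoded" | decode_field)"
echo "decoded sample: $decoded"

# Against a live cluster: print the alertmanager.yaml stored in the secret.
# The dot in the key name is escaped so jsonpath treats it as one field.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get secret alertmanager-main -n monitoring \
    --request-timeout=10s \
    -o jsonpath='{.data.alertmanager\.yaml}' | decode_field
fi
```

The output of the kubectl step should match the file passed to kubectl create secret; the page shown above additionally includes the default values that Alertmanager fills in.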
Then check the msg-sender container log: it shows that the webhook alert from Alertmanager has been received,
and that the WeChat send action has been simulated.
tail -n 10 msg-sender2019-06-19.log
INFO: 2019/06/19 09:29:02 http.go:238: {"errcode":0,"errmsg":"ok","invaliduser":""}
INFO: 2019/06/19 09:29:02 http.go:231: #sendWechat# client:1.8.17.209:41088, to:huangdaquan, requestType:application/x-www-form-urlencoded, content:2019-06-19 09:29:01 platform bulletin is not available!
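If no real alert is firing, the webhook path can be exercised by posting a test alert to Alertmanager. The following is a minimal sketch, assuming curl is available and an Alertmanager version that serves the v2 HTTP API; ALERTMANAGER_URL, the alert name WebhookTest and the label values are illustrative and do not come from the original article. Without ALERTMANAGER_URL it only prints the payload.

```shell
# Hypothetical test alert: a JSON array with one alert, as /api/v2/alerts expects.
payload='[{"labels":{"alertname":"WebhookTest","severity":"warning"},"annotations":{"summary":"test alert for the webhook receiver"}}]'

# ALERTMANAGER_URL is an assumption, for example http://10.64.215.237:9093
if [ -n "${ALERTMANAGER_URL:-}" ]; then
  # -sS hides the progress bar but still prints errors.
  curl -sS -X POST -H 'Content-Type: application/json' \
    -d "$payload" "$ALERTMANAGER_URL/api/v2/alerts"
else
  echo "ALERTMANAGER_URL not set; payload that would be sent:"
  echo "$payload"
fi
```

With the route shown earlier (group_wait: 30s), the notification should reach msg-sender roughly half a minute after the alert is accepted, and a new line should then appear in the msg-sender log.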
Original article: https://blog.51cto.com/juestnow/2410802