Prometheus + Grafana

Setting up the Prometheus server (central collection)

# Deployed on the monitoring host; it pulls metrics from the node servers

Download: https://prometheus.io/download/

Extract the archive

Run: nohup ./prometheus --config.file=./prometheus.yml &>> ./prometheus.log &

Browse to http://192.168.1.24:9090
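
A minimal shell sketch of the steps above (the 2.5.0 version number is taken from the rule_files path used later in this article; substitute whatever release you actually downloaded, and check the download page for the exact URL):

wget https://github.com/prometheus/prometheus/releases/download/v2.5.0/prometheus-2.5.0.linux-amd64.tar.gz
tar xf prometheus-2.5.0.linux-amd64.tar.gz
cd prometheus-2.5.0.linux-amd64
nohup ./prometheus --config.file=./prometheus.yml &>> ./prometheus.log &
curl -s http://localhost:9090/-/healthy   # expects HTTP 200 once the server is up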

Setting up node_exporter (per-node metrics collection)

# Deployed on every server whose host metrics should be collected

Download: https://prometheus.io/download/

Extract the archive

Run: nohup ./node_exporter &>> ./node_exporter.log &

Reload the configuration: kill -1 PID (kill -1 sends SIGHUP)

Browse to http://192.168.1.24:9100
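
To confirm the exporter is serving data before wiring it into Prometheus, you can hit its /metrics endpoint from any host that can reach it; for example:

curl -s http://192.168.1.24:9100/metrics | head            # first few exported metric lines
curl -s http://192.168.1.24:9100/metrics | grep -c '^node_' # rough count of node_* metrics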

Add the nodes to the Prometheus scrape configuration:

vim prometheus.yml

Append:

  - job_name: '21'
    static_configs:
    - targets: ['192.168.1.21:9100']

  - job_name: '24'
    static_configs:
    - targets: ['192.168.1.24:9100']

  - job_name: '20'
    static_configs:
    - targets: ['192.168.1.20:9100']

# targets lists the addresses of the metric endpoints; separate multiple addresses with commas
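
Before reloading, it is worth validating the edited file with promtool (shipped in the Prometheus tarball) and then signalling the running process; a sketch, assuming you are in the Prometheus install directory:

./promtool check config ./prometheus.yml           # syntax check of prometheus.yml and the rule files it references
kill -HUP $(pgrep -f 'prometheus --config.file')   # same as the "kill -1 PID" step above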

Setting up Alertmanager (alert notification)

Can run on any server; it receives alerts from Prometheus and sends them on (for example by email) to the operations team

Download: https://prometheus.io/download/

Extract the archive

Run: nohup ./alertmanager --config.file=./alertmanager.yml &>> ./alertmanager.log &

Browse to: http://192.168.1.24:9093
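
The Alertmanager tarball also ships amtool, which can validate the configuration before you start (or restart) the service; for example:

./amtool check-config ./alertmanager.yml   # reports parse errors and the receivers/templates it found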

Setting up Grafana (dashboards)

Provides a friendly web UI for visualizing server performance

Download: https://grafana.com/get

Extract the archive

Run: nohup ./grafana-server &>> ./grafana-server.log &

Browse to: http://192.168.1.24:3000

Add the monitored hosts (the Prometheus data source) to Grafana:

Click Save
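
If you prefer to script this step instead of clicking through the UI, Grafana also exposes an HTTP API for data sources; a sketch assuming the default admin/admin credentials have not been changed yet:

curl -s -u admin:admin -H 'Content-Type: application/json' \
     -X POST http://192.168.1.24:3000/api/datasources \
     -d '{"name":"Prometheus","type":"prometheus","url":"http://192.168.1.24:9090","access":"proxy"}'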

Import a monitoring dashboard template (for example a Kubernetes dashboard) into Grafana

Download: https://grafana.com/dashboards

Select the downloaded template

Select the monitored host (data source)

Add it and open the dashboard

The panels only fill in after data has been collected for a while, so be patient

Basic Grafana usage

Email alerting

alertmanager.yml holds the mail server details; see the configuration file walkthrough below

prometheus.yml points at the Alertmanager address and at the rule_files paths

vim first_rules.yml to define the alerting rules
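
Rule files can be validated separately with promtool; note that Prometheus only picks up rule changes after a reload (SIGHUP or restart). A sketch:

./promtool check rules ./first_rules.yml            # prints the rule groups and rules found, or the parse error
kill -HUP $(pgrep -f 'prometheus --config.file')    # reload so the new rules take effect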

Configuration file walkthrough

prometheus.yml

# my global config

global:
  scrape_interval:     15s

How often Prometheus scrapes its targets; here metrics are pulled every 15 seconds

  evaluation_interval: 15s

How often the configured rule files are evaluated; here the rules are recomputed every 15 seconds

  external_labels:
    monitor: 'codelab-monitor'

Adds an extra label to every metric; useful for telling Prometheus servers apart, for example when several Prometheus instances feed a single Alertmanager

# Alertmanager configuration

alerting:
  alertmanagers:
  - static_configs:

Sets the Alertmanager address (Alertmanager installation is covered in the section above)

    - targets: ["192.168.1.24:9093"]
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.

rule_files:

Lists the rule files to load; each file can contain any number of rules

  - "/work/prometheus-2.5.0.linux-amd64/first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:

# Here it's Prometheus itself.

scrape_configs:

job_name names each scrape job; it is added as a label to every scraped metric, identifying which job the metric belongs to

  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

  - job_name: '21'
    static_configs:
    - targets: ['192.168.1.21:9100']

  - job_name: '24'
    static_configs:
    - targets: ['192.168.1.24:9100']

  - job_name: '20'
    static_configs:
    - targets: ['192.168.1.20:9100']

targets lists the addresses of the metric endpoints; separate multiple addresses with commas
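
Once the configuration is loaded you can confirm that all jobs are being scraped, either on the Status -> Targets page of the web UI or via the HTTP API; for example:

curl -s http://192.168.1.24:9090/api/v1/targets | grep -o '"health":"[a-z]*"'   # one entry per target, ideally all "up"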

alertmanager.yml

global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'Hxq7996026'
  smtp_require_tls: false

# Mail account used for sending the notifications

templates:
# Templates that control how the alert notifications are rendered
  - '/work/alertmanager-0.15.3.linux-amd64/template/123.tmpl'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'mail'

receivers:
#- name: 'web.hook'
#  webhook_configs:
#  - url: 'http://127.0.0.1:5001/'
- name: 'mail'
  email_configs:
  - to: '[email protected]'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
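
To verify the mail path end to end without waiting for a real alert, you can push a hand-made alert into Alertmanager over its v1 API (present in the 0.15.x release used here); a sketch:

curl -s -X POST http://192.168.1.24:9093/api/v1/alerts \
     -H 'Content-Type: application/json' \
     -d '[{"labels":{"alertname":"ManualTest","severity":"warning"},"annotations":{"summary":"manual test alert"}}]'
# The alert should show up in the Alertmanager UI and, after group_wait, arrive in the mailbox configured above.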

first_rules.yml

groups:
- name: test-rule
  rules:
  - alert: clients
    expr: node_load1 > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: Too many clients detected"
      description: "{{$labels.instance}}: Client num is above 80% (current value is: {{ $value }})"

set [email protected]  #作为发送邮件的账号

set smtp=smtp.163.com    #发送邮件的服务器

set [email protected]   #你的邮箱帐号

set smtp-auth-password=Hxq7996026 #授权码

set smtp-auth=login

cat /dev/urandom | md5sum

Memory rules

groups:
- name: test-rule
  rules:
  - alert: "Memory alert"
    expr: 100 - ((node_memory_MemAvailable_bytes * 100) / node_memory_MemTotal_bytes) > 10
    for: 1s
    labels:
      severity: warning
    annotations:
      summary: "Service: {{$labels.alertname}}"
      description: "Business 500 alert: {{ $value }}"
      value: "{{ $value }}"

- name: test-rule2
  rules:
  - alert: "Memory alert"
    expr: 100 - ((node_memory_MemAvailable_bytes * 100) / node_memory_MemTotal_bytes) > 40
    for: 1s
    labels:
      severity: test
    annotations:
      summary: "Service: {{$labels.alertname}}"
      description: "Business 500 alert: {{ $value }}"
      value: "{{ $value }}"

An alternative memory-usage expression based on used memory (total minus free, buffers and cache):

((node_memory_MemTotal_bytes -(node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes) )/node_memory_MemTotal_bytes ) * 100 > ${value}
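
Before wiring an expression into a rule, it can be run interactively against the Prometheus query API (or in the Graph tab of the web UI) to see what it currently returns; for example, for the memory-usage expression above:

curl -sG http://192.168.1.24:9090/api/v1/query \
     --data-urlencode 'query=100 - ((node_memory_MemAvailable_bytes * 100) / node_memory_MemTotal_bytes)'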

CPU rule

100 - ((avg by (instance,job,env)(irate(node_cpu_seconds_total{mode="idle"}[30s]))) *100) > ${value}

Disk rule (this expression is the free-space percentage; the alert template below wraps it in 100 - (...) to alert on used space instead)

(node_filesystem_avail_bytes{fstype !~ "nfs|rpc_pipefs|rootfs|tmpfs",device!~"/etc/auto.misc|/dev/mapper/centos-home",mountpoint !~ "/boot|/net|/selinux"} /node_filesystem_size_bytes{fstype !~ "nfs|rpc_pipefs|rootfs|tmpfs",device!~"/etc/auto.misc|/dev/mapper/centos-home",mountpoint !~ "/boot|/net|/selinux"} ) * 100 > ${value}

Network traffic rule:

(irate(node_network_transmit_bytes_total{device!~"lo"}[1m]) / 1000) > ${value}

Application CPU usage

process_cpu_usage{job="${app}"} * 100 > ${value}

Alert rule template

groups:
- name: down
  rules:
  - alert: "Down alert"
    expr: up == 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Down alert"
      description: "Alert time:"
      value: "Used: {{ $value }}"

- name: memory
  rules:
  - alert: "Memory alert"
    expr: ((node_memory_MemTotal_bytes -(node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes) )/node_memory_MemTotal_bytes ) * 100 > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Memory alert"
      description: "Alert time:"
      value: "Used: {{ $value }}%"

- name: cpu
  rules:
  - alert: "CPU alert"
    expr: 100 - ((avg by (instance,job,env)(irate(node_cpu_seconds_total{mode="idle"}[30s]))) *100) > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "CPU alert"
      description: "Alert time:"
      value: "Used: {{ $value }}%"

- name: disk
  rules:
  - alert: "Disk alert"
    expr: 100 - (node_filesystem_avail_bytes{fstype !~ "nfs|rpc_pipefs|rootfs|tmpfs",device!~"/etc/auto.misc|/dev/mapper/centos-home",mountpoint !~ "/boot|/net|/selinux"} /node_filesystem_size_bytes{fstype !~ "nfs|rpc_pipefs|rootfs|tmpfs",device!~"/etc/auto.misc|/dev/mapper/centos-home",mountpoint !~ "/boot|/net|/selinux"} ) * 100  > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Disk alert"
      description: "Alert time:"
      value: "Used: {{ $value }}%"

- name: net
  rules:
  - alert: "Net alert"
    expr: (irate(node_network_transmit_bytes_total{device!~"lo"}[1m]) / 1000) > 80000
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Net alert"
      description: "Alert time:"
      value: "Used: {{ $value }}KB"

Original article: https://www.cnblogs.com/hxqxiaoqi/p/10647256.html
