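Collecting Monitoring Metrics for Kubernetes Components

A Kubernetes cluster is deployed in production, and the running state of its components needs to be monitored in Zabbix. The component metrics fall into two groups: fixed metrics and dynamic metrics. Fixed metrics can be read from each component's metrics endpoint on the command line and are picked up in Zabbix through "auto-discovery". Dynamic metrics are collected by a Python script that returns a JSON string; in Zabbix, add a template or configure the host's auto-discovery rule to consume it.

I. Fixed metrics collection (Zabbix auto-discovery; recommended collection interval: 5 min)

1. Master metrics [Scope: the 3 nodes of the master cluster; 192.168.10.93/94/95 in the test environment]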

1) Metric ID: kube_apiserver_process_cpu_seconds_total
Example command: curl -s --cacert kubernetes-ca/ca.pem --cert kubernetes-ca/admin.pem --key kubernetes-ca/admin-key.pem https://192.168.10.93:6443/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

2) Metric ID: kube_apiserver_process_open_fds
Example command: curl -s --cacert kubernetes-ca/ca.pem --cert kubernetes-ca/admin.pem --key kubernetes-ca/admin-key.pem https://192.168.10.93:6443/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

3) Metric ID: kube_apiserver_process_virtual_memory_bytes
Example command: curl -s --cacert kubernetes-ca/ca.pem --cert kubernetes-ca/admin.pem --key kubernetes-ca/admin-key.pem https://192.168.10.93:6443/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'

4) Metric ID: kube_apiserver_rest_client_requests_total_200_put
Example command: curl -s --cacert kubernetes-ca/ca.pem --cert kubernetes-ca/admin.pem --key kubernetes-ca/admin-key.pem https://192.168.10.93:6443/metrics | grep rest_client_requests_total | grep -v '#' | grep PUT | grep 200 | awk '{print $2}'

5) Metric ID: kube_apiserver_rest_client_requests_total_200_get
Example command: curl -s --cacert kubernetes-ca/ca.pem --cert kubernetes-ca/admin.pem --key kubernetes-ca/admin-key.pem https://192.168.10.93:6443/metrics | grep rest_client_requests_total | grep -v '#' | grep GET | grep 200 | awk '{print $2}'

6) Metric ID: etcd_debugging_mvcc_db_total_size_in_bytes
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep etcd_debugging_mvcc_db_total_size_in_bytes | grep -v '#' | awk '{print $2}'

7) Metric ID: etcd_server_has_leader
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep etcd_server_has_leader | grep -v '#' | awk '{print $2}'

8) Metric ID: etcd_server_leader_changes_seen_total
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep etcd_server_leader_changes_seen_total | grep -v '#' | awk '{print $2}'

9) Metric ID: etcd_server_proposals_failed_total
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep etcd_server_proposals_failed_total | grep -v '#' | awk '{print $2}'

10) Metric ID: etcd_process_cpu_seconds_total
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

11) Metric ID: etcd_process_open_fds
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

12) Metric ID: etcd_process_virtual_memory_bytes
Example command: curl -s --cacert etcd/ca.pem --cert etcd/healthcheck-client.pem --key etcd/healthcheck-client-key.pem https://192.168.10.93:2379/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'

13) Metric ID: kube_controller_manager_process_cpu_seconds_total
Example command: curl -s 192.168.10.93:10252/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

14) Metric ID: kube_controller_manager_process_open_fds
Example command: curl -s 192.168.10.93:10252/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

15) Metric ID: kube_controller_manager_process_virtual_memory_bytes
Example command: curl -s 192.168.10.93:10252/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'

16) Metric ID: kube_controller_manager_rest_client_requests_total_200_put
Example command: curl -s 192.168.10.93:10252/metrics | grep rest_client_requests_total | grep -v '#' | grep PUT | grep 200 | awk '{print $2}'

17) Metric ID: kube_controller_manager_rest_client_requests_total_200_get
Example command: curl -s 192.168.10.93:10252/metrics | grep rest_client_requests_total | grep -v '#' | grep GET | grep 200 | awk '{print $2}'

18) Metric ID: kube_scheduler_process_cpu_seconds_total
Example command: curl -s 192.168.10.93:10251/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

19) Metric ID: kube_scheduler_process_open_fds
Example command: curl -s 192.168.10.93:10251/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

20) Metric ID: kube_scheduler_process_virtual_memory_bytes
Example command: curl -s 192.168.10.93:10251/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'

21) Metric ID: kube_scheduler_rest_client_requests_total_200_put
Example command: curl -s 192.168.10.93:10251/metrics | grep rest_client_requests_total | grep -v '#' | grep PUT | grep 200 | awk '{print $2}'

22) Metric ID: kube_scheduler_rest_client_requests_total_200_get
Example command: curl -s 192.168.10.93:10251/metrics | grep rest_client_requests_total | grep -v '#' | grep GET | grep 200 | awk '{print $2}'
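The curl | grep | awk pipelines above select samples by substring, so a filter such as "grep 200" can also match a port number or a value that happens to contain 200. As a rough sketch of the same extraction with exact name and label matching, the Python below parses the metrics text directly. The function name find_samples and the sample text are illustrative assumptions and do not come from the original commands; the label names code and method are the ones rest_client_requests_total exposes.

```python
import re

# One sample line of the Prometheus text format looks like:
#   name{label="value",...} 12.5
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def find_samples(metrics_text, name, labels=None):
    """Return the values of every sample of `name` whose labels include `labels`."""
    wanted = labels or {}
    found = []
    for line in metrics_text.splitlines():
        if not line or line.startswith('#'):
            continue  # skip blank lines and HELP/TYPE comment lines
        match = SAMPLE_RE.match(line)
        if match is None or match.group(1) != name:
            continue  # the metric name must match exactly, not as a substring
        present = dict(LABEL_RE.findall(match.group(2) or ''))
        if all(present.get(k) == v for k, v in wanted.items()):
            found.append(float(match.group(3)))
    return found


# Illustrative input only; real text would come from a /metrics endpoint.
SAMPLE = '''# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 42
rest_client_requests_total{code="200",host="192.168.10.93:6443",method="GET"} 1500
rest_client_requests_total{code="200",host="192.168.10.93:6443",method="PUT"} 320
rest_client_requests_total{code="404",host="192.168.10.93:6443",method="GET"} 7
'''

print(find_samples(SAMPLE, 'process_open_fds'))  # [42.0]
print(find_samples(SAMPLE, 'rest_client_requests_total',
                   {'code': '200', 'method': 'GET'}))  # [1500.0]
```

Returning a list rather than a single number keeps the case visible where several samples share the requested labels (for example, one per host); the caller can then decide whether to sum them or pick one.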

2. Node metrics [Scope: the 5 worker nodes; 192.168.10.230/231/232/233/234 in the test environment]

1) Metric ID: kubelet_docker_operations_errors_inspect_container
Example command: curl -s 192.168.10.230:10255/metrics | grep kubelet_docker_operations_errors | grep -v '#' | grep inspect_container | awk '{print $2}'

2) Metric ID: kubelet_docker_operations_errors_inspect_image
Example command: curl -s 192.168.10.230:10255/metrics | grep kubelet_docker_operations_errors | grep -v '#' | grep inspect_image | awk '{print $2}'

3) Metric ID: kubelet_docker_operations_errors_start_container
Example command: curl -s 192.168.10.230:10255/metrics | grep kubelet_docker_operations_errors | grep -v '#' | grep start_container | awk '{print $2}'

4) Metric ID: kubelet_docker_operations_errors_stop_container
Example command: curl -s 192.168.10.230:10255/metrics | grep kubelet_docker_operations_errors | grep -v '#' | grep stop_container | awk '{print $2}'

5) Metric ID: kubelet_node_config_error
Example command: curl -s 192.168.10.230:10255/metrics | grep kubelet_node_config_error | grep -v '#' | awk '{print $2}'

6) Metric ID: kubelet_process_cpu_seconds_total
Example command: curl -s 192.168.10.230:10255/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

7) Metric ID: kubelet_process_open_fds
Example command: curl -s 192.168.10.230:10255/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

8) Metric ID: kubelet_process_virtual_memory_bytes
Example command: curl -s 192.168.10.230:10255/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'

9) Metric ID: kubelet_rest_client_requests_total_200_put
Example command: curl -s 192.168.10.230:10255/metrics | grep rest_client_requests_total | grep -v '#' | grep PUT | grep 200 | awk '{print $2}'

10) Metric ID: kubelet_rest_client_requests_total_200_get
Example command: curl -s 192.168.10.230:10255/metrics | grep rest_client_requests_total | grep -v '#' | grep GET | grep 200 | awk '{print $2}'

11) Metric ID: kube_proxy_process_cpu_seconds_total
Example command: curl -s 192.168.10.230:10249/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

12) Metric ID: kube_proxy_process_open_fds
Example command: curl -s 192.168.10.230:10249/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

13) Metric ID: kube_proxy_process_virtual_memory_bytes
Example command: curl -s 192.168.10.230:10249/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'

14) Metric ID: kube_proxy_rest_client_requests_total_200_put
Example command: curl -s 192.168.10.230:10249/metrics | grep rest_client_requests_total | grep -v '#' | grep PUT | grep 200 | awk '{print $2}'

15) Metric ID: kube_proxy_rest_client_requests_total_200_get
Example command: curl -s 192.168.10.230:10249/metrics | grep rest_client_requests_total | grep -v '#' | grep GET | grep 200 | awk '{print $2}'

3. Cluster-wide metrics [Collecting from any one worker node is sufficient; in the test environment, 192.168.10.230 can be used. If the node being scraped goes down, collection of these metrics fails. To avoid this, consider changing the target IP to the Nginx-Ingress IP or an internal Service IP.]

1) Metric ID: coredns_process_cpu_seconds_total
Example command: curl -s 192.168.10.230:9153/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

2) Metric ID: coredns_process_open_fds
Example command: curl -s 192.168.10.230:9153/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

3) Metric ID: coredns_process_virtual_memory_bytes
Example command: curl -s 192.168.10.230:9153/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'

4) Metric ID: kube_state_metrics_metrics_process_cpu_seconds_total
Example command: curl -s 192.168.10.230:8081/metrics | grep process_cpu_seconds_total | grep -v '#' | awk '{print $2}'

5) Metric ID: kube_state_metrics_metrics_process_open_fds
Example command: curl -s 192.168.10.230:8081/metrics | grep process_open_fds | grep -v '#' | awk '{print $2}'

6) Metric ID: kube_state_metrics_metrics_process_virtual_memory_bytes
Example command: curl -s 192.168.10.230:8081/metrics | grep process_virtual_memory_bytes | grep -v '#' | awk '{print $2}'
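The cluster-wide metrics are scraped through a single node IP, so collection stops when that node is down. Until a stable Nginx-Ingress or Service IP is in place, one possible stopgap is to try several node addresses in order and use the first that answers. The sketch below is only an illustration of that idea: the helper names http_get and fetch_first_available are hypothetical, and fake_fetch stands in for the network so the example runs without a cluster.

```python
import urllib.request


def http_get(url, timeout=3):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode('utf-8')


def fetch_first_available(urls, fetch=http_get):
    """Try each URL in order; return (url, body) from the first that answers."""
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            errors.append('%s: %s' % (url, exc))
    raise RuntimeError('no endpoint reachable: ' + '; '.join(errors))


def fake_fetch(url):
    # Hypothetical stand-in for the network: only the .231 node "answers".
    if '192.168.10.231' not in url:
        raise OSError('connection refused')
    return 'process_open_fds 12\n'


candidates = [
    'http://192.168.10.230:9153/metrics',
    'http://192.168.10.231:9153/metrics',
]
url, body = fetch_first_available(candidates, fetch=fake_fetch)
print(url)  # http://192.168.10.231:9153/metrics
```

Passing the fetch function as a parameter keeps the fallback logic testable offline; in real use the default http_get would be left in place.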

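II. Dynamic metrics collection

Python script for collecting the dynamic metrics (the individual collection scripts for each dynamic metric have been merged into this single script)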

[[email protected] ~]# cat zabbix-metrics-find.py
#!/usr/bin/env python
# coding:utf-8

import json
import os
import re
import sys

# kube-state-metrics auto-discovery for Zabbix
# Pass value/values (case-insensitive) as the argument to print the monitored values;
# any other argument, or no argument, prints the monitoring keys
# Scope: any worker node; 192.168.10.230 for testing. Later this IP should preferably be
# changed to the Nginx-Ingress load-balancer IP or an internal Service IP
# Recommended collection interval: 5 min
# Author: GaoKan
# Created: 2019-5-22
# Updated:
def main():
    ip = '192.168.10.230'
    flag = 'key'
    if len(sys.argv) > 1:
        if sys.argv[1].lower() in ('value', 'values'):
            flag = 'value'
    keys = []
    values = []
    # Metric name -> short key prefix and the labels appended to the key
    metrics_dict = {
        # DaemonSet metrics
        'kube_daemonset_status_number_misscheduled': {
            'forshort': 'ds_misscheduled',
            'tags': ['namespace', 'daemonset'],
        },
        'kube_daemonset_status_number_unavailable': {
            'forshort': 'ds_unavailable',
            'tags': ['namespace', 'daemonset'],
        },
        # Deployment metrics
        'kube_deployment_status_replicas_unavailable': {
            'forshort': 'deploy_unavailable',
            'tags': ['namespace', 'deployment'],
        },
        # Pod metrics
        'kube_pod_container_status_waiting_reason': {
            'forshort': 'po_cntr_waiting_reason',
            'tags': ['namespace', 'pod', 'container', 'reason'],
        },
        'kube_pod_container_status_terminated_reason': {
            'forshort': 'po_cntr_terminated_reason',
            'tags': ['namespace', 'pod', 'container', 'reason'],
        },
        'kube_pod_container_status_restarts_total': {
            'forshort': 'po_cntr_restarts_total',
            'tags': ['namespace', 'pod', 'container'],
        },
        # ReplicaSet metrics
        'kube_replicaset_status_ready_replicas': {
            'forshort': 'rs_ready_replicas',
            'tags': ['namespace', 'replicaset'],
        },
        'kube_replicaset_status_replicas': {
            'forshort': 'rs_replicas',
            'tags': ['namespace', 'replicaset'],
        },
        # Endpoint metrics
        'kube_endpoint_address_not_ready': {
            'forshort': 'ep_not_ready',
            'tags': ['namespace', 'endpoint'],
        },
    }
    metrics = os.popen('curl -s ' + ip + ':8080/metrics')
    for row in metrics:
        if row.startswith('#'):
            continue  # skip HELP/TYPE comment lines
        pos1 = row.find('{')
        pos2 = row.find('}')
        name = row[:pos1]
        if name in metrics_dict:
            # Build the key as <short name>_<label value>_<label value>...
            key = metrics_dict[name]['forshort']
            for tag in metrics_dict[name]['tags']:
                key += '_' + re.search(tag + r'="(.*?)"', row[pos1 + 1:pos2]).group(1)
            keys.append({"{#METRICSNAME}": key})
            # The sample value follows the closing brace
            values.append({"{#METRICSVALUE}": row[pos2 + 1:].strip()})
    if flag == 'value':
        print(json.dumps({"data": values}, indent=4))
    else:
        print(json.dumps({"data": keys}, indent=4))


if __name__ == "__main__":
    main()
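The script prints keys and values as two separate lists, so a monitoring item that needs the value for one specific key has to match them up itself. As a hedged sketch of an alternative, the code below resolves a single key to its value in one pass and skips rows that lack an expected label instead of raising. METRICS is a two-entry subset of the script's metrics_dict repeated here for self-containment; build_key, lookup_value and the sample rows are illustrative assumptions, not part of the original script.

```python
import re

# Subset of the script's metrics_dict: metric name -> (short name, labels used in the key).
METRICS = {
    'kube_deployment_status_replicas_unavailable':
        ('deploy_unavailable', ['namespace', 'deployment']),
    'kube_pod_container_status_restarts_total':
        ('po_cntr_restarts_total', ['namespace', 'pod', 'container']),
}


def build_key(row):
    """Return (key, value) for a tracked sample line, or None if it is not tracked."""
    pos1, pos2 = row.find('{'), row.rfind('}')
    if pos1 < 0 or pos2 < pos1 or row[:pos1] not in METRICS:
        return None
    short_name, tags = METRICS[row[:pos1]]
    labels = dict(re.findall(r'(\w+)="(.*?)"', row[pos1 + 1:pos2]))
    if any(tag not in labels for tag in tags):
        return None  # a required label is missing: skip the row instead of raising
    key = '_'.join([short_name] + [labels[tag] for tag in tags])
    return key, row[pos2 + 1:].strip()


def lookup_value(metrics_text, wanted_key):
    """Return the value string for `wanted_key`, or None if no row produces that key."""
    for row in metrics_text.splitlines():
        pair = build_key(row)
        if pair and pair[0] == wanted_key:
            return pair[1]
    return None


# Illustrative rows only; real text would come from kube-state-metrics on port 8080.
SAMPLE = '''# TYPE kube_pod_container_status_restarts_total counter
kube_pod_container_status_restarts_total{namespace="kube-system",pod="coredns-5cbf6655f-6wxqz",container="coredns"} 27
kube_deployment_status_replicas_unavailable{namespace="test",deployment="nginx"} 0
'''

print(lookup_value(SAMPLE, 'po_cntr_restarts_total_kube-system_coredns-5cbf6655f-6wxqz_coredns'))  # 27
```

Looking the value up by key avoids relying on list position, which matters when pods are created or deleted between two runs of the script.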

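Run the script; it returns a JSON string. (The output covers the running state of all Kubernetes object resources, such as pods, deployments and services; depending on the workload, there may be hundreds or thousands of entries.)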

[[email protected] ~]# python zabbix-metrics-find.py |head -30
{
    "data": [
        {
            "{#METRICSNAME}": "ds_misscheduled_test-rg_test-rg-005"
        },
        {
            "{#METRICSNAME}": "ds_misscheduled_cattle-system_cattle-node-agent"
        },
        {
            "{#METRICSNAME}": "ds_misscheduled_test-rg_test-rg-001"
        },
        {
            "{#METRICSNAME}": "ds_misscheduled_test-rg_test-rg-002"
        },
        {
            "{#METRICSNAME}": "ds_misscheduled_test-rg_test-rg-003"
        },
        {
            "{#METRICSNAME}": "ds_misscheduled_test-rg_test-rg-004"
        },
        {
            "{#METRICSNAME}": "ds_unavailable_test-rg_test-rg-003"
        },
        {
            "{#METRICSNAME}": "ds_unavailable_test-rg_test-rg-004"
        },
        {
            "{#METRICSNAME}": "ds_unavailable_test-rg_test-rg-005"
        },
...................
...................
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_test-rg_test-rg-005-jvkm6_test-rg-005"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_cattle-system_cattle-node-agent-mdl9x_agent"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_test-rg_test-rg-005-wpsbq_test-rg-005"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_test-rg_test-rg-004-9s57x_test-rg-004"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_test-rg_test-rg-005-wxk54_test-rg-005"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_cattle-system_cattle-node-agent-r46bz_agent"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_default_mysql-ceph-test-76697d98d6-4gj9v_mysql-ceph-test"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_kube-system_coredns-5cbf6655f-6wxqz_coredns"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_kube-system_kube-state-metrics-576fbb446d-ctl4p_addon-resizer"
        },
        {
            "{#METRICSNAME}": "po_cntr_restarts_total_kube-system_kube-state-metrics-576fbb446d-ctl4p_kube-state-metrics"
        },

...................
...................
        {
            "{#METRICSNAME}": "rs_ready_replicas_test_nginx-5c689d88bb"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_two-test_aicase-docker-5784b5749b"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_cattle-system_cattle-cluster-agent-d59dbdb55"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_test_nginx-589dcbcbd6"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_test_nginx-5b677cdf4f"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_default_mysql-ceph-test-76697d98d6"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_kube-system_kube-state-metrics-75bbc44548"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_kube-system_traefik-ingress-controller-6db4877748"
        },
        {
            "{#METRICSNAME}": "rs_ready_replicas_two-test_aicase-docker-57d445cbf"
        }
    ]
}

Querying the values

[[email protected] ~]# python zabbix-metrics-find.py values
{
    "data": [
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
.................
.................
        {
            "{#METRICSVALUE}": "1"
        },
        {
            "{#METRICSVALUE}": "27"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "3"
        },
        {
            "{#METRICSVALUE}": "0"
        },
.................
.................
        {
            "{#METRICSVALUE}": "1"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "2"
        },
        {
            "{#METRICSVALUE}": "1"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "0"
        },
        {
            "{#METRICSVALUE}": "2"
        },
        {
            "{#METRICSVALUE}": "0"
        }
    ]
}
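The two outputs shown here are related only by position: the n-th value belongs to the n-th key. A minimal sketch of joining them into one mapping follows. The function name pair_discovery_output and the two short JSON strings are illustrative assumptions; the pairing is only valid when both outputs come from the same cluster state, because objects that appear or disappear between the two runs shift the positions.

```python
import json


def pair_discovery_output(keys_json, values_json):
    """Join the key output and the value output of the script by position."""
    keys = [item['{#METRICSNAME}'] for item in json.loads(keys_json)['data']]
    values = [item['{#METRICSVALUE}'] for item in json.loads(values_json)['data']]
    if len(keys) != len(values):
        # A length mismatch means the two runs saw different sets of objects.
        raise ValueError('key/value counts differ: %d vs %d' % (len(keys), len(values)))
    return dict(zip(keys, values))


# Shortened, illustrative stand-ins for the two script outputs.
keys_json = '''{"data": [
    {"{#METRICSNAME}": "ds_misscheduled_test-rg_test-rg-005"},
    {"{#METRICSNAME}": "rs_ready_replicas_test_nginx-5c689d88bb"}
]}'''
values_json = '''{"data": [
    {"{#METRICSVALUE}": "0"},
    {"{#METRICSVALUE}": "1"}
]}'''

paired = pair_discovery_output(keys_json, values_json)
print(paired['rs_ready_replicas_test_nginx-5c689d88bb'])  # 1
```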

Original source: https://www.cnblogs.com/kevingrace/p/10917273.html

Time: 2024-08-29 02:14:11
