Operations Monitoring Platforms: Ganglia

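1. Introduction to Ganglia

Ganglia is a scalable, distributed monitoring system designed for HPC (high-performance computing) clusters. It monitors and displays status information for the nodes in a cluster. A gmond daemon running on each node collects data such as CPU, memory, and disk utilization, I/O load, and network traffic. The data is then aggregated by the gmetad daemon and stored with rrdtool, and the historical data is finally rendered as graphs through PHP pages.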

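Ganglia has the following characteristics:

Good scalability: the layered architecture suits large server clusters

Low overhead and support for high concurrency

Broad support for operating systems (UNIX and others) and CPU architectures, as well as virtualized environments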

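2. Ganglia components

A Ganglia monitoring system consists of three parts: gmond, gmetad, and webfrontend. Their roles are as follows.

gmond: the ganglia monitoring daemon. It runs on every node to be monitored, collects that node's information and sends it to other nodes, and also receives data sent by other nodes. Its default listening port is 8649.

gmetad: the ganglia meta daemon. It runs on a data aggregation node, periodically polls the gmond process on the monitored nodes to fetch data, and stores the metrics in the local RRD storage engine.

webfrontend: a web-based graphical monitoring interface. It must be installed on the same node as gmetad. It fetches data from gmetad, reads the RRD databases, and generates graphs with rrdtool for display in the front end. The interface is attractive, feature-rich, and powerful.

[Figure: Ganglia architecture diagram]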
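To make the gmond/gmetad relationship more concrete: a gmond that accepts TCP connections on port 8649 writes an XML dump of the cluster state to the client and then closes the connection, which is what gmetad polls. The following is a minimal sketch of reading that dump by hand, assuming Python 3 and the usual HOST/METRIC element layout. The function names are hypothetical helpers, not part of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Connect to gmond's TCP port and read the XML dump until gmond closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    # Return bytes so the XML parser can honour the encoding declared in the dump.
    return b"".join(chunks)


def parse_gmond_xml(xml_data):
    """Return {host name: {metric name: value string}} from a gmond XML dump."""
    root = ET.fromstring(xml_data)
    hosts = {}
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        hosts[host.get("NAME")] = metrics
    return hosts


# Example (needs a reachable gmond, so it is left commented out):
# print(parse_gmond_xml(fetch_gmond_xml("172.16.80.117")))
```

Values are kept as strings here because the dump carries a separate TYPE attribute for each metric; converting them is left to the caller.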

Environment plan (CentOS 6.7)

Server: 172.16.80.117

Clients: 172.16.80.117, 172.16.80.116

3. Installing Ganglia

[[email protected] tools]# wget wget 
[[email protected] tools]# rpm -ivh epel-release-6-8.noarch.rpm  
[[email protected] tools]# yum install ganglia-gmetad.x86_64  ganglia-gmond.x86_64 ganglia-gmond-python.x86_64  -y

Edit the server-side configuration file
[[email protected] tools]# vim /etc/ganglia/gmetad.conf 
data_source "my cluster"  172.16.80.117 172.16.80.116
gridname "MyGrid"
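As an illustration of how a data_source line is structured (a quoted cluster name, an optional polling interval in seconds, then one or more host[:port] entries), here is a small sketch that splits such a line into its parts. It assumes the documented defaults of a 15-second interval and port 8649, handles only hostnames and IPv4 addresses, and parse_data_source is a hypothetical helper rather than anything gmetad provides.

```python
import shlex


def parse_data_source(line, default_port=8649, default_interval=15):
    """Split a gmetad.conf data_source line into name, polling interval and hosts."""
    tokens = shlex.split(line, comments=True)
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: %r" % line)
    name = tokens[1]
    rest = tokens[2:]
    interval = default_interval
    # A bare integer right after the name is the optional polling interval.
    if rest[0].isdigit():
        interval = int(rest[0])
        rest = rest[1:]
    if not rest:
        raise ValueError("data_source needs at least one host")
    hosts = []
    for item in rest:
        host, sep, port = item.partition(":")
        hosts.append((host, int(port) if sep else default_port))
    return {"name": name, "interval": interval, "hosts": hosts}


print(parse_data_source('data_source "my cluster"  172.16.80.117 172.16.80.116'))
```

When several hosts are listed for one data source, gmetad treats them as alternative sources for the same cluster and moves on to the next one if the first does not answer.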

Install ganglia web (on an LNMP stack)
[[email protected] tools]# tar xf ganglia-web-3.7.2.tar.gz 
[[email protected] tools]# mv ganglia-web-3.7.2 /application/nginx/html/ganglia

Edit the PHP configuration file of ganglia web
[[email protected] tools]# vim /application/nginx/html/ganglia/conf_default.php
$conf['gweb_confdir'] = "/application/nginx/html/ganglia";

nginx configuration
[[email protected] ganglia]# cat /application/nginx/conf/nginx.conf
worker_processes  2;
events {
    worker_connections  1024;
}
http {

log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    include       mime.types;
    default_type  application/octet-stream;
    sendfile        on;
    keepalive_timeout  65;

    server {
        listen       80;
        server_name  www.martin.com martin.com;

        location / {
            root   html/zabbix;
            index  index.php index.html index.htm;
        }
        
         
        location ~ .*\.(php|php5)?$ {
            root  html/zabbix;
            fastcgi_pass 127.0.0.1:9000;
            fastcgi_index index.php;
            include fastcgi.conf;
               }

         access_log  logs/access_zabbix.log  main;        
   }

    server {
        listen       80;
        server_name  ganglia.martin.com;

        location / {
            root   html/ganglia;
            index  index.php index.html index.htm;
        }
      
             
        location ~ .*\.(php|php5)?$ {
            root   html/ganglia;
            fastcgi_pass 127.0.0.1:9000;
            fastcgi_index index.php;
            include fastcgi.conf;
               }

         access_log  logs/access_bbs.log  main;       

    }

###status
   server{
      listen 80;
      server_name status.martin.org;
      location / {
      stub_status on;
      access_log off;
        }
   }

}

Testing access in a browser produces the following error
Fatal error:Errors were detected in your configuration.
DWOO compiled templates directory '/application/nginx/html/ganglia/dwoo/compiled' is not writeable.
Please adjust $conf['dwoo_compiled_dir'].
DWOO cache directory '/application/nginx/html/ganglia/dwoo/cache' is not writeable.
Please adjust $conf['dwoo_cache_dir'].
in /application/nginx-1.6.3/html/ganglia/eval_conf.php on line 126

Fix:
[[email protected] tools]# mkdir /application/nginx/html/ganglia/dwoo/compiled
[[email protected] tools]# mkdir /application/nginx/html/ganglia/dwoo/cache

[[email protected] tools]# chmod 777 /application/nginx/html/ganglia/dwoo/compiled
[[email protected] tools]# chmod 777 /application/nginx/html/ganglia/dwoo/cache
[[email protected] html]# chmod -R 777 /var/lib/ganglia/rrds
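Before reloading the page, it can help to confirm that the directories really are writable. The sketch below is one way to check, assuming Python 3. unwritable_dirs is a hypothetical helper, and because os.access tests the permissions of the user running the script, the result is only meaningful when run as the account PHP executes under.

```python
import os


def unwritable_dirs(paths):
    """Return the paths that are missing, not directories, or not writable by the current user."""
    return [
        p for p in paths
        if not (os.path.isdir(p) and os.access(p, os.W_OK | os.X_OK))
    ]


# Directories taken from the error message and the commands above.
GANGLIA_DIRS = [
    "/application/nginx/html/ganglia/dwoo/compiled",
    "/application/nginx/html/ganglia/dwoo/cache",
    "/var/lib/ganglia/rrds",
]

print(unwritable_dirs(GANGLIA_DIRS))
```

Note that mode 777 grants write access to every local user. Handing ownership of these directories to the PHP account would be a narrower alternative, though which account that is depends on the local setup.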

Edit the client configuration file (required on every client)
[[email protected] tools]# vim /etc/ganglia/gmond.conf 
cluster {
  name = "my cluster"    # must match the name that follows data_source in the server configuration
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname.  Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
#  mcast_join = 239.2.11.71
  host = 172.16.80.117      # unicast is used here; the default is multicast
  port = 8649
#  ttl = 1
}

udp_recv_channel {
#  mcast_join = 239.2.11.71
  port = 8649
#  bind = 239.2.11.71
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  # buffer = 10485760
}
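Once gmond has been restarted on the clients and gmetad has polled them, RRD files should start to appear on the server. The following sketch lists which hosts already have data. It assumes Python 3 and the usual on-disk layout of <rrd root>/<cluster>/<host>/<metric>.rrd, with summary directories whose names start with a double underscore. hosts_with_rrds is a hypothetical helper.

```python
import os


def hosts_with_rrds(rrd_root, cluster):
    """Return {host directory name: number of .rrd files} for one cluster."""
    cluster_dir = os.path.join(rrd_root, cluster)
    result = {}
    if not os.path.isdir(cluster_dir):
        return result
    for entry in sorted(os.listdir(cluster_dir)):
        host_dir = os.path.join(cluster_dir, entry)
        # Skip summary directories such as __SummaryInfo__ and any stray files.
        if entry.startswith("__") or not os.path.isdir(host_dir):
            continue
        result[entry] = sum(1 for f in os.listdir(host_dir) if f.endswith(".rrd"))
    return result


# Cluster name as configured in gmond.conf and gmetad.conf above.
print(hosts_with_rrds("/var/lib/ganglia/rrds", "my cluster"))
```

A client that is missing from the result has probably not reached the collector yet. The cluster name and the unicast host setting in its gmond.conf are the first things to check.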

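4. Testing access again

This is an aggregate view of the whole cluster rather than the graphs of a single server. Next, open the graphs for a single server.

Now look at the view that shows the same metric for all servers side by side.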

5. Ways to extend Ganglia monitoring

A default Ganglia installation provides only basic system monitoring. Ganglia plugins offer two ways to extend its monitoring capabilities.

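1) Add in-band plugins, mainly by means of the gmetric command.

This is the commonly used method: a crontab job calls Ganglia's gmetric command to feed data into gmond, which brings the data into the same monitoring system. The method is simple and works for a small number of metrics, but with large-scale custom monitoring the data becomes hard to manage in a unified way.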

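2) Add out-of-band plugins from other sources, mainly through the C or Python interface.

Since Ganglia 3.1.x, a C and Python interface has been available. It allows custom data collection modules to be written and plugged directly into gmond to monitor user-defined applications.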
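The first method above can be sketched as a small script run from crontab. This is an illustrative sketch, assuming Python 3.5 or later and a gmetric binary on the PATH. The metric name logged_in_users and the helper names are hypothetical, and only gmetric's --name, --value, --type and --units options are used.

```python
import subprocess

# Value types accepted by gmetric's --type option.
VALID_TYPES = {
    "string", "int8", "uint8", "int16", "uint16",
    "int32", "uint32", "float", "double",
}


def build_gmetric_command(name, value, value_type="uint32", units=""):
    """Build the argument list for one gmetric call."""
    if value_type not in VALID_TYPES:
        raise ValueError("unsupported gmetric type: %s" % value_type)
    cmd = ["gmetric", "--name", name, "--value", str(value), "--type", value_type]
    if units:
        cmd += ["--units", units]
    return cmd


def send_metric(name, value, value_type="uint32", units=""):
    """Run gmetric; raises CalledProcessError if gmetric exits with an error."""
    subprocess.run(build_gmetric_command(name, value, value_type, units), check=True)


# Example call (hypothetical metric), e.g. from a script scheduled every minute:
# send_metric("logged_in_users", 3, "uint32", "users")
print(build_gmetric_command("logged_in_users", 3, "uint32", "users"))
```

Each metric needs its own script or crontab entry in this style, which is why the approach becomes hard to manage as the number of custom metrics grows.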

As an example, the out-of-band approach is used here to monitor the running state of nginx.

Configure the ganglia client to collect nginx_status data
[[email protected] nginx_status]# pwd
/tools/gmond_python_modules-master/nginx_status
[[email protected] nginx_status]# cp conf.d/nginx_status.pyconf /etc/ganglia/conf.d/
[[email protected] nginx_status]# cp python_modules/nginx_status.py  /usr/lib64/ganglia/python_modules/
[[email protected] nginx_status]# cp graph.d/nginx_* /application/nginx/html/ganglia/graph.d/

[[email protected] mysql]# cat /etc/ganglia/conf.d/nginx_status.pyconf 
#

modules {
  module {
    name = 'nginx_status'
    language = 'python'

    param status_url {
      value = 'http://status.martin.org/'
    }
    param nginx_bin {
      value = '/application/nginx/sbin/nginx'
    }
    param refresh_rate {
      value = '15'
    }
  }
}

collection_group {
  collect_once = yes
  time_threshold = 20

  metric {
    name = 'nginx_server_version'
    title = "Nginx Version"
  }
}

collection_group {
  collect_every = 10
  time_threshold = 20

  metric {
    name = "nginx_active_connections"
    title = "Total Active Connections"
    value_threshold = 1.0
  }

  metric {
    name = "nginx_accepts"
    title = "Total Connections Accepted"
    value_threshold = 1.0
  }

  metric {
    name = "nginx_handled"
    title = "Total Connections Handled"
    value_threshold = 1.0
  }

  metric {
    name = "nginx_requests"
    title = "Total Requests"
    value_threshold = 1.0
  }

  metric {
    name = "nginx_reading"
    title = "Connections Reading"
    value_threshold = 1.0
  }

  metric {
    name = "nginx_writing"
    title = "Connections Writing"
    value_threshold = 1.0
  }

  metric {
    name = "nginx_waiting"
    title = "Connections Waiting"
    value_threshold = 1.0
  }
}

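Once all of the steps above are complete, restart the gmond service on the Ganglia client. Running "gmond -m" on the client lists the metrics that gmond supports. The running state of Nginx can then be viewed in the Ganglia web interface.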
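For orientation, a gmond Python module exposes a metric_init(params) function that returns a list of metric descriptors, a callback that gmond invokes with the metric name, and a metric_cleanup() function. The sketch below shows that shape for the stub_status page. It is a simplified illustration and not the nginx_status.py from gmond_python_modules: it omits nginx_server_version, reports the raw counters as-is, and the names parse_stub_status and get_value are hypothetical.

```python
import re
import time

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2, which older gmond builds embed

_settings = {"status_url": "http://status.martin.org/", "refresh_rate": 15}
_cache = {"time": 0.0, "data": {}}

METRIC_NAMES = [
    "nginx_active_connections", "nginx_accepts", "nginx_handled",
    "nginx_requests", "nginx_reading", "nginx_writing", "nginx_waiting",
]


def parse_stub_status(text):
    """Turn stub_status output into a dict keyed by the metric names used in the pyconf."""
    data = {}
    m = re.search(r"Active connections:\s*(\d+)", text)
    if m:
        data["nginx_active_connections"] = int(m.group(1))
    # The line holding three bare integers: accepts, handled, requests.
    m = re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE)
    if m:
        for key, val in zip(METRIC_NAMES[1:4], m.groups()):
            data[key] = int(val)
    m = re.search(r"Reading:\s*(\d+)\s*Writing:\s*(\d+)\s*Waiting:\s*(\d+)", text)
    if m:
        for key, val in zip(METRIC_NAMES[4:7], m.groups()):
            data[key] = int(val)
    return data


def get_value(name):
    """Callback for gmond: return the current value of one metric, 0 on any error."""
    try:
        now = time.time()
        if now - _cache["time"] >= _settings["refresh_rate"]:
            body = urlopen(_settings["status_url"], timeout=5).read()
            _cache["data"] = parse_stub_status(body.decode("ascii", "replace"))
            _cache["time"] = now
        return _cache["data"].get(name, 0)
    except Exception:
        return 0


def metric_init(params):
    """Called once by gmond with the param values from the .pyconf file."""
    _settings["status_url"] = params.get("status_url", _settings["status_url"])
    _settings["refresh_rate"] = int(params.get("refresh_rate", _settings["refresh_rate"]))
    return [{
        "name": name,
        "call_back": get_value,
        "time_max": 90,
        "value_type": "uint",
        "units": "count",
        "slope": "both",
        "format": "%u",
        "description": name,
        "groups": "nginx",
    } for name in METRIC_NAMES]


def metric_cleanup():
    """Called by gmond on shutdown; nothing to release in this sketch."""
    pass
```

The page is cached for refresh_rate seconds so that the seven callbacks gmond makes in one collection cycle result in a single HTTP request.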

Time: 2024-10-11 23:53:52
