pg_statsinfo 2.5.0 usage

Original source: http://blog.163.com/digoal@126/blog/static/16387704020142585616183/

The pg_statsinfo architecture is as follows:

pg_statsinfo runs as a worker process of the monitored database and starts together with it. The monitored database's status information is uploaded to the repo database periodically (automatically) or manually. The repo database can live on the same host as the monitored instance or on a separate one; for environments with many databases, a dedicated repo database is recommended.

pg_stats_reporter is a reporting tool used together with pg_statsinfo; it will be covered later.

pg_statsinfo works as follows:

An example process list of a database instance running with the pg_statsinfo extension enabled:

pg93     17471     1  0 07:34 pts/1    00:00:05 /home/pg93/pgsql9.3.3/bin/postgres

pg93     17472 17471  0 07:34 pts/1    00:00:00 postgres: pg_statsinfo launcher process

pg93     17473 17471  0 07:34 ?        00:00:00 postgres: logger process

pg93     17475 17471  0 07:34 ?        00:00:25 postgres: checkpointer process

pg93     17476 17471  0 07:34 ?        00:00:24 postgres: writer process

pg93     17477 17471  3 07:34 ?        00:03:16 postgres: wal writer process

pg93     17478 17471  0 07:34 ?        00:00:04 postgres: autovacuum launcher process

pg93     17479 17471  0 07:34 ?        00:00:03 postgres: archiver process   last was 00000001000001D000000076

pg93     17480 17471  0 07:34 ?        00:00:16 postgres: stats collector process

pg93     17482 17472  0 07:34 pts/1    00:00:08 /home/pg93/pgsql9.3.3/bin/pg_statsinfod 17471

The pg_statsinfo launcher process is forked by the postmaster, while the /home/pg93/pgsql9.3.3/bin/pg_statsinfod 17471 process is forked by the pg_statsinfo launcher.

pg_statsinfo takes snapshots from the database and writes them into the repo database. It also filters csvlog entries (for example, error messages) into a separate file or into syslog; this filtered file or syslog is typically consumed by other monitoring software to watch for problems surfaced in the CSVLOG.

The pg_statsinfo process also accepts signals to start or stop the agent.

[root@db-172-16-3-150 ~]# /home/pg93/pgsql9.3.3/bin/pg_statsinfo --help

pg_statsinfo reports a PostgreSQL database.

Usage:

pg_statsinfo -r REPORTID [-i INSTANCEID] [-b SNAPID] [-e SNAPID] [-B DATE] [-E DATE]

[-o FILENAME] [connection-options]

pg_statsinfo -l          [-i INSTANCEID] [connection-options]

pg_statsinfo -s          [connection-options]

pg_statsinfo -S COMMENT  [connection-options]

pg_statsinfo -D SNAPID   [connection-options]

pg_statsinfo --start     [connection-options]

pg_statsinfo --stop      [connection-options]

General options:

-r, --report=REPORTID  generate a report that specified by REPORTID

---------------------------

* Summary

* DatabaseStatistics

* InstanceActivity

* OSResourceUsage

* DiskUsage

* LongTransactions

* NotableTables

* CheckpointActivity

* AutovacuumActivity

* QueryActivity

* LockConflicts

* ReplicationActivity

* SettingParameters

* SchemaInformation

* Profiles

* All

---------------------------

(can prefix match. For example, "su" means 'Summary')

-i, --instid           limit to instances of specified instance ID

-b, --beginid          begin point of report scope (specify by snapshot ID)

-B, --begindate        begin point of report scope (specify by timestamp)

-e, --endid            end point of report scope (specify by snapshot ID)

-E, --enddate          end point of report scope (specify by timestamp)

-l, --list             show the snapshot list

-s, --size             show the snapshot size

-S, --snapshot=COMMENT get a snapshot

-D, --delete=SNAPID    delete a snapshot

--start                start the pg_statsinfo agent

--stop                 stop the pg_statsinfo agent

Output options:

-o, --output=FILENAME  destination file path for report

Connection options:

-d, --dbname=DBNAME       database to connect

-h, --host=HOSTNAME       database server host or socket directory

-p, --port=PORT           database server port

-U, --username=USERNAME   user name to connect as

-w, --no-password         never prompt for password

-W, --password            force password prompt

Generic options:

--echo                    echo queries

--elevel=LEVEL            set output message level

--help                    show this help, then exit

--version                 output version information, then exit

Read the website for details. <http://pgstatsinfo.projects.postgresql.org/>

Report bugs to <[email protected]>.
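For example, a report can be generated straight from the repo database with the options above. The following is only a sketch: the snapshot IDs and the repo connection settings (172.16.3.39:5432, database and user statsrepo, as configured later in this article) are assumptions for illustration.

/home/pg93/pgsql9.3.3/bin/pg_statsinfo -l -h 172.16.3.39 -p 5432 -d statsrepo -U statsrepo

/home/pg93/pgsql9.3.3/bin/pg_statsinfo -r Summary -b 1 -e 10 -o /tmp/summary.txt -h 172.16.3.39 -p 5432 -d statsrepo -U statsrepo

The first command lists the snapshots stored in the repo; the second writes a Summary report covering snapshots 1 through 10 to /tmp/summary.txt.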

For example, stopping the agent shuts pg_statsinfod down:

[root@db-172-16-3-150 ~]# /home/pg93/pgsql9.3.3/bin/pg_statsinfo --stop -h 127.0.0.1 -p 1921 -U postgres

LOG:  waiting for pg_statsinfod to shut down

LOG:  pg_statsinfod stopped

[root@db-172-16-3-150 ~]# ps -ewf|grep "pg93 "

pg93     17471     1  0 07:34 pts/1    00:00:05 /home/pg93/pgsql9.3.3/bin/postgres

pg93     17472 17471  0 07:34 pts/1    00:00:00 postgres: pg_statsinfo launcher process

pg93     17473 17471  0 07:34 ?        00:00:00 postgres: logger process

pg93     17475 17471  0 07:34 ?        00:00:26 postgres: checkpointer process

pg93     17476 17471  0 07:34 ?        00:00:25 postgres: writer process

pg93     17477 17471  3 07:34 ?        00:03:27 postgres: wal writer process

pg93     17478 17471  0 07:34 ?        00:00:04 postgres: autovacuum launcher process

pg93     17479 17471  0 07:34 ?        00:00:03 postgres: archiver process   last was 00000001000001D100000023

pg93     17480 17471  0 07:34 ?        00:00:17 postgres: stats collector process

Starting the agent brings the pg_statsinfod background process back up:

[root@db-172-16-3-150 ~]# /home/pg93/pgsql9.3.3/bin/pg_statsinfo --start -h 127.0.0.1 -p 1921 -U postgres

LOG:  waiting for pg_statsinfod to start

LOG:  pg_statsinfod started

[root@db-172-16-3-150 ~]# ps -ewf|grep "pg93 "

pg93      3031 17472  0 09:16 pts/1    00:00:00 /home/pg93/pgsql9.3.3/bin/pg_statsinfod 17471

The following environment is used as an example to show how to deploy pg_statsinfo:

Monitored database version: PostgreSQL 9.3

Repo database version: PostgreSQL 9.3

Monitored database server OS: CentOS 6.4 x64

Monitored server and repo server time zone: UTC+8

Monitored database time zone parameters: timezone='PRC', log_timezone='UTC'

Repo database time zone parameter: timezone='PRC'

On both the monitored server and the repo server, keep the OS time zone and the database timezone setting consistent; otherwise the collection timestamps in the various statistics tables may not line up.
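A quick way to check this before going further (a sketch; adjust host and port to your environment) is to compare the OS clock and the database settings on each server:

date

psql -h 127.0.0.1 -p 1921 -U postgres -c "show timezone"

psql -h 127.0.0.1 -p 1921 -U postgres -c "show log_timezone"

Run the same commands against the repo instance (port 5432 here) and confirm the values match the settings listed above.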

Steps:

1. Download and install pg_statsinfo on the monitored host.

# wget http://pgfoundry.org/frs/download.php/3540/pg_statsinfo-2.5.0.tar.gz

# tar -zxvf pg_statsinfo-2.5.0.tar.gz

# cd pg_statsinfo-2.5.0

# export PATH=/home/pg93/pgsql/bin:$PATH

# which pg_config

/home/pg93/pgsql/bin/pg_config

# make USE_PGXS=1

# make USE_PGXS=1 install
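A quick sanity check that the installation landed in the expected places (a sketch; the paths below assume the layout used in this article):

ls -l /home/pg93/pgsql/bin/pg_statsinfo /home/pg93/pgsql/bin/pg_statsinfod

ls -l /home/pg93/pgsql/lib/pg_statsinfo.so

ls -l /home/pg93/pgsql/share/contrib/pg_statsinfo.sql /home/pg93/pgsql/share/contrib/pg_statsrepo*.sql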

2. Configure the monitored database's postgresql.conf.

su - pg93

cd $PGDATA

vi postgresql.conf

Key configuration items:

To collect SQL-level statistics, the pg_stat_statements extension must also be loaded.

shared_preload_libraries = 'pg_stat_statements,pg_statsinfo'            # (change requires restart)

timezone = 'PRC'

log_timezone = 'UTC'

log_destination = 'csvlog'              # Valid values are combinations of

logging_collector = on          # Enable capturing of stderr and csvlog

log_checkpoints = on

log_connections = on

log_disconnections = on

log_error_verbosity = verbose           # terse, default, or verbose messages

log_statement = 'ddl'                   # none, ddl, mod, all

track_activities = on

track_counts = on

track_functions = all                   # none, pl, all

log_min_messages = log          # values in order of decreasing detail:

log_truncate_on_rotation = on           # If on, an existing log file with the

pg_stat_statements.max = 1000

pg_stat_statements.track = all

pg_statsinfo.snapshot_interval = 10min          # set snapshot interval

pg_statsinfo.enable_maintenance = 'on'          # enable maintenance mode ('on' or 'off')

pg_statsinfo.maintenance_time = '00:02:00'      # delete old snapshots at this time every day

pg_statsinfo.repository_keepday = 7             # keep old snapshots for this period in maintenance

pg_statsinfo.syslog_min_messages = 'disable'

pg_statsinfo.textlog_line_prefix = '%t %p %c-%l %x %q(%u, %d, %r, %a) '  # this format is the same as the syslog format

pg_statsinfo.syslog_line_prefix = '%t %p %c-%l %x %q(%u, %d, %r, %a) '   # this format is the same as the syslog format

pg_statsinfo.long_lock_threashold = 30        # threshold parameter for getting LOCK information

pg_statsinfo.repository_server = 'host=172.16.3.39 port=5432 user=statsrepo dbname=statsrepo'

If you do not put the repo connection password in the connection string and instead authenticate via .pgpass, pg_statsinfo.repository_server must use host, not hostaddr.
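Alternatively (a sketch, not the approach used in this article), the password can be embedded directly in the connection string, since pg_statsinfo.repository_server is a libpq-style connection string; .pgpass is then unnecessary:

pg_statsinfo.repository_server = 'host=172.16.3.39 port=5432 user=statsrepo dbname=statsrepo password=statsrepo'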

Other configuration items:

listen_addresses = '0.0.0.0'            # what IP address(es) to listen on;

port = 1921                             # (change requires restart)

max_connections = 100                   # (change requires restart)

superuser_reserved_connections = 3      # (change requires restart)

unix_socket_directories = '.'   # comma-separated list of directories

unix_socket_permissions = 0700          # begin with 0 to use octal notation

tcp_keepalives_idle = 60                # TCP_KEEPIDLE, in seconds;

tcp_keepalives_interval = 10            # TCP_KEEPINTVL, in seconds;

tcp_keepalives_count = 10               # TCP_KEEPCNT;

shared_buffers = 2048MB                 # min 128kB

maintenance_work_mem = 512MB            # min 1MB

vacuum_cost_delay = 10                  # 0-100 milliseconds

vacuum_cost_limit = 10000               # 1-10000 credits

bgwriter_delay = 10ms                   # 10-10000ms between rounds

wal_level = hot_standby                 # minimal, archive, or hot_standby

synchronous_commit = off                # synchronization level;

wal_buffers = 16MB                      # min 32kB, -1 sets based on shared_buffers

wal_writer_delay = 10ms         # 1-10000 milliseconds

checkpoint_segments = 128               # in logfile segments, min 1, 16MB each

archive_mode = on               # allows archiving to be done

archive_command = '/bin/date'           # command to use to archive a logfile segment

max_wal_senders = 32            # max number of walsender processes

wal_keep_segments = 128         # in logfile segments, 16MB each; 0 disables

hot_standby = on                        # "on" allows queries during recovery

wal_receiver_status_interval = 1s       # send replies at least this often

hot_standby_feedback = on               # send info from standby to prevent

random_page_cost = 2.0                  # same scale as above

effective_cache_size = 96GB

track_activity_query_size = 1024        # (change requires restart)

autovacuum = on                 # Enable autovacuum subprocess?  'on'

log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and

autovacuum_naptime = 3s         # time between autovacuum runs

autovacuum_vacuum_scale_factor = 0.0002 # fraction of table size before vacuum

autovacuum_analyze_scale_factor = 0.0001        # fraction of table size before analyze

datestyle = 'iso, mdy'

lc_messages = 'C'                       # locale for system error message

lc_monetary = 'C'                       # locale for monetary formatting

lc_numeric = 'C'                        # locale for number formatting

lc_time = 'C'                           # locale for time formatting

default_text_search_config = 'pg_catalog.english'
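Once the instance is restarted (step 5 below; shared_preload_libraries only takes effect after a restart), the pg_statsinfo settings can be verified from psql, for example:

psql -c "show shared_preload_libraries"

psql -c "show pg_statsinfo.snapshot_interval"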

3. Configure the environment variables of the OS user that starts the database, and make sure a superuser connection works by default without a password.

For example:

pg93@db-172-16-3-150-> psql

psql (9.3.3)

Type "help" for help.

digoal=# \q

Environment variables:

pg93@db-172-16-3-150-> cat .bash_profile

# .bash_profile

# Get the aliases and functions

if [ -f ~/.bashrc ]; then

. ~/.bashrc

fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH

export PGPORT=1921

export PGDATA=/ssd2/pg93/pg_root

export LANG=en_US.utf8

export PGHOME=/home/pg93/pgsql

export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH

export DATE=`date +"%Y%m%d%H%M"`

export PATH=$ORACLE_HOME/bin:$PGHOME/bin:$PATH:.

export MANPATH=$PGHOME/share/man:$MANPATH

export PGUSER=postgres

export PGHOST=$PGDATA

export PGDATABASE=digoal
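In this setup PGHOST points at the Unix socket directory (unix_socket_directories = '.' puts the socket in the data directory), so the password-less superuser login relies on the local entries in pg_hba.conf. A minimal sketch of such an entry (your authentication policy may differ):

local   all   postgres   trust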

4. Configure the repo database. If the repo database is a standalone instance, simply create the database and the role:

pg93@db-172-16-3-39-> psql

psql (9.3.2)

Type "help" for help.

postgres=# create role statsrepo nosuperuser login encrypted password 'statsrepo';

postgres=# create database statsrepo with template template0 encoding 'UTF8';

postgres=# grant all on database statsrepo to statsrepo;

Configure the repo database's pg_hba.conf to allow the monitored server to access the statsrepo database:

host statsrepo statsrepo 0.0.0.0/0 md5
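Reload the repo instance so the new pg_hba.conf entry takes effect, for example:

pg_ctl reload -D $PGDATA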

In the repo database's postgresql.conf, set timezone='PRC', consistent with the monitored database's timezone.

Set the repo server's OS time zone (/etc/sysconfig/clock) so that it matches the monitored server.

In fact, when the monitored database with the pg_statsinfo extension is started, the statsinfo process automatically executes pg_statsrepo_partition.sql and pg_statsrepo_alert.sql to create the repo tables and functions, so the statsrepo schema is initialized automatically and does not need to be set up in advance.

pg93@db-172-16-3-150-> cd /home/pg93/pgsql/share/contrib/

pg93@db-172-16-3-150-> ll

total 180K

-rw-r--r-- 1 root root 5.4K Mar  4 17:21 pg_statsinfo.sql

-rw-r--r-- 1 root root  73K Mar  4 17:21 pg_statsrepo83.sql

-rw-r--r-- 1 root root  16K Mar  4 17:21 pg_statsrepo_alert.sql

-rw-r--r-- 1 root root  71K Mar  4 17:21 pg_statsrepo_partition.sql

-rw-r--r-- 1 root root  152 Mar  4 17:21 uninstall_pg_statsinfo.sql

-rw-r--r-- 1 root root  152 Mar  4 17:21 uninstall_pg_statsrepo.sql
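If you would rather initialize the repo schema by hand instead of letting the agent do it at startup (a sketch under that assumption; the paths are the ones listed above), the two scripts can be run against the repo database directly:

psql -h 172.16.3.39 -p 5432 -U statsrepo -d statsrepo -f /home/pg93/pgsql/share/contrib/pg_statsrepo_partition.sql

psql -h 172.16.3.39 -p 5432 -U statsrepo -d statsrepo -f /home/pg93/pgsql/share/contrib/pg_statsrepo_alert.sql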

At the same time, the pg_statsinfo.sql script is automatically executed in the local postgres database of the monitored instance, creating the statsinfo schema along with a few tables and functions.

For example, snapshot is a function for taking a snapshot manually, and there are also functions for maintaining snapshots.

postgres=# \df statsinfo.*
                               List of functions
  Schema   |        Name        | Result data type | Argument data types | Type
-----------+--------------------+------------------+---------------------+--------
 statsinfo | activity           | record           | OUT idle double precision, OUT idle_in_xact double precision, OUT waiting double precision, OUT running double precision, OUT backends integer, OUT client text, OUT pid integer, OUT start timestamp with time zone, OUT duration double precision, OUT query text | normal
 statsinfo | array_agg          | anyarray         | anyelement | agg
 statsinfo | cpustats           | SETOF record     | OUT cpu_id text, OUT cpu_user bigint, OUT cpu_system bigint, OUT cpu_idle bigint, OUT cpu_iowait bigint, OUT overflow_user smallint, OUT overflow_system smallint, OUT overflow_idle smallint, OUT overflow_iowait smallint | normal
 statsinfo | cpustats           | SETOF record     | prev_cpustats statsinfo.cpustats_type, OUT cpu_id text, OUT cpu_user bigint, OUT cpu_system bigint, OUT cpu_idle bigint, OUT cpu_iowait bigint, OUT overflow_user smallint, OUT overflow_system smallint, OUT overflow_idle smallint, OUT overflow_iowait smallint | normal
 statsinfo | devicestats        | SETOF record     | OUT device_major text, OUT device_minor text, OUT device_name text, OUT device_readsector bigint, OUT device_readtime bigint, OUT device_writesector bigint, OUT device_writetime bigint, OUT device_ioqueue bigint, OUT device_iototaltime bigint, OUT overflow_drs smallint, OUT overflow_drt smallint, OUT overflow_dws smallint, OUT overflow_dwt smallint, OUT overflow_dit smallint, OUT device_tblspaces name[] | normal
 statsinfo | devicestats        | SETOF record     | prev_devicestats statsinfo.devicestats_type[], OUT device_major text, OUT device_minor text, OUT device_name text, OUT device_readsector bigint, OUT device_readtime bigint, OUT device_writesector bigint, OUT device_writetime bigint, OUT device_ioqueue bigint, OUT device_iototaltime bigint, OUT overflow_drs smallint, OUT overflow_drt smallint, OUT overflow_dws smallint, OUT overflow_dwt smallint, OUT overflow_dit smallint, OUT device_tblspaces name[] | normal
 statsinfo | last_xact_activity | SETOF record     | OUT pid integer, OUT xid xid, OUT in_xact boolean, OUT queries text | normal
 statsinfo | loadavg            | SETOF record     | OUT loadavg1 real, OUT loadavg5 real, OUT loadavg15 real | normal
 statsinfo | maintenance        | void             | repository_keep_period timestamp with time zone | normal
 statsinfo | memory             | SETOF record     | OUT memfree bigint, OUT buffers bigint, OUT cached bigint, OUT swap bigint, OUT dirty bigint | normal
 statsinfo | profile            | SETOF record     | OUT processing text, OUT execute bigint, OUT total_exec_time double precision | normal
 statsinfo | sample             | void             |  | normal
 statsinfo | snapshot           | void             |  | normal
 statsinfo | snapshot           | void             | comment text | normal
 statsinfo | start              | void             | timeout integer | normal
 statsinfo | stop               | void             | timeout integer | normal
 statsinfo | tablespaces        | SETOF record     | OUT oid oid, OUT name text, OUT location text, OUT device text, OUT avail bigint, OUT total bigint, OUT spcoptions text[] | normal
(17 rows)
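The start and stop functions listed above are presumably the SQL counterparts of the pg_statsinfo --start/--stop commands shown earlier and can be called directly; a sketch (assuming the integer argument is a wait timeout in seconds):

postgres=# select statsinfo.stop(60);

postgres=# select statsinfo.start(60);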

5. Configure .pgpass on the monitored host, in the HOME directory of the OS user that runs the PostgreSQL database.

su - pg93

vi ~/.pgpass

172.16.3.39:5432:statsrepo:statsrepo:statsrepo

chmod 400 ~/.pgpass
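Before restarting, it is worth confirming that the .pgpass entry works; the following should connect to the repo without prompting for a password:

psql -h 172.16.3.39 -p 5432 -U statsrepo -d statsrepo -c "select 1"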

Restart the database. If log_timezone was changed beforehand, clear the CSVLOG files before restarting.

cd $PGDATA/pg_log

rm -rf *

pg_ctl start

You can then observe the statsrepo schema being initialized automatically.
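A simple way to confirm that the agent came back up after the restart:

ps -ewf | grep pg_statsinfod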

Usage example:

Take a snapshot manually on the monitored database:

postgres=# select * from statsinfo.snapshot('test');

snapshot

----------

(1 row)

Check the information collected by the snapshot in the repo database:

\c statsrepo statsrepo

select * from snapshot;

select * from checkpoint;

select * from autovacuum;

...
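The unqualified table names work here because the repo tables are created in a schema named statsrepo, which matches the statsrepo role and is therefore picked up by the default search_path ("$user", public). From any other role, qualify the names; for example, a sketch listing the most recent snapshots:

select * from statsrepo.snapshot order by 1 desc limit 10;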

[References]

1. http://pgstatsinfo.projects.pgfoundry.org/pg_statsinfo.html

2. http://pgstatsinfo.projects.pgfoundry.org/pg_stats_reporter.html

3. http://pgfoundry.org/frs/?group_id=1000422

4. http://blog.163.com/digoal@126/blog/static/16387704020142411454515/

5. http://blog.163.com/digoal@126/blog/static/1638770402014257448826/

6. http://blog.163.com/digoal@126/blog/static/16387704020142583013467/
