1、What pt-heartbeat does
pt-heartbeat measures replication lag on a MySQL or PostgreSQL server. You can use it to update a master or monitor a replica. If possible, MySQL connection options are read from your .my.cnf file. For more details, please use the --help option, or try 'perldoc /usr/bin/pt-heartbeat' for complete documentation.
pt-heartbeat is a two-part MySQL and PostgreSQL replication delay monitoring system that measures delay by looking at actual replicated data. This avoids reliance on the replication mechanism itself, which is unreliable (for example, SHOW SLAVE STATUS on MySQL).
2、How pt-heartbeat works
The first part is an --update instance of pt-heartbeat that connects to a master and updates a timestamp (“heartbeat record”) every --interval seconds. Since the heartbeat table may contain records from multiple masters (see “MULTI-SLAVE HIERARCHY”), the server’s ID (@@server_id) is used to identify records.
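The master holds a table named heartbeat that is used to check lag; it can be created manually or automatically.
pt-heartbeat connects to the master with --update and keeps writing a timestamp into the heartbeat table, at the frequency set by --interval.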
The second part is a --monitor or --check instance of pt-heartbeat that connects to a slave, examines the replicated heartbeat record from its immediate master or the specified --master-server-id, and computes the difference from the current system time. If replication between the slave and the master is delayed or broken, the computed difference will be greater than zero and potentially increase if --monitor is specified.
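pt-heartbeat connects to the replica with --monitor or --check, reads the timestamp replicated from the master, and compares it with the current system time to produce a difference.
That difference is the measured lag. (Note: this assumes the clocks on the master and the replica are kept in sync.)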
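The comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not pt-heartbeat's own code: the function name `heartbeat_lag_seconds` is hypothetical, the timestamp format is taken from the `ts` column shown later in this post, and clamping negative values to zero is an assumption made here to absorb small clock skew.

```python
from datetime import datetime


def heartbeat_lag_seconds(heartbeat_ts: str, now: datetime) -> float:
    """Return the seconds between a replicated heartbeat timestamp and `now`.

    `heartbeat_ts` is the value written on the master; `now` is the
    replica's current system time. Both clocks are assumed to be in sync.
    """
    written_at = datetime.strptime(heartbeat_ts, "%Y-%m-%dT%H:%M:%S.%f")
    lag = (now - written_at).total_seconds()
    # Assumption of this sketch: treat a slightly negative result as zero lag.
    return max(lag, 0.0)


ts = "2014-12-01T09:48:14.003020"
replica_now = datetime(2014, 12, 1, 9, 48, 15, 3020)
print(f"{heartbeat_lag_seconds(ts, replica_now):.2f}s")  # prints 1.00s
```

If the two clocks drift apart, the drift is added to (or subtracted from) every measurement, which is why time synchronization is a precondition.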
You must either manually create the heartbeat table on the master or use --create-table. See --create-table for the proper heartbeat table structure. The MEMORY storage engine is suggested, but not required of course, for MySQL.
The heartbeat table must contain a heartbeat row. By default, a heartbeat row is inserted if it doesn’t exist. This feature can be disabled with the --[no]insert-heartbeat-row option in case the database user does not have INSERT privileges.
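To see why the row has to exist, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. That substitution is an assumption made purely for illustration (pt-heartbeat does not use SQLite), and the table is trimmed to two columns. An UPDATE against an empty table changes nothing, so no timestamp would ever reach the replica.

```python
import sqlite3

# In-memory SQLite database standing in for the master (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE heartbeat ("
    " ts TEXT NOT NULL,"
    " server_id INTEGER NOT NULL PRIMARY KEY)"
)


def update_heartbeat(conn, server_id, ts):
    """Run the UPDATE and return how many rows it changed."""
    cur = conn.execute(
        "UPDATE heartbeat SET ts = ? WHERE server_id = ?", (ts, server_id)
    )
    return cur.rowcount


# No heartbeat row yet: the UPDATE matches nothing.
rows_before = update_heartbeat(conn, 11, "2014-12-01T09:48:14.003020")

# Insert the heartbeat row once, then the same UPDATE takes effect.
conn.execute(
    "INSERT INTO heartbeat (ts, server_id) VALUES (?, ?)",
    ("2014-12-01T09:48:14.003020", 11),
)
rows_after = update_heartbeat(conn, 11, "2014-12-01T09:48:15.003020")

print(rows_before, rows_after)  # prints 0 1
```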
pt-heartbeat depends only on the heartbeat record being replicated to the slave, so it works regardless of the replication mechanism (built-in replication, a system such as Continuent Tungsten, etc). It works at any depth in the replication hierarchy; for example, it will reliably report how far a slave lags its master’s master’s master. And if replication is stopped, it will continue to work and report (accurately!) that the slave is falling further and further behind the master.
pt-heartbeat has a maximum resolution of 0.01 second. The clocks on the master and slave servers must be closely synchronized via NTP. By default, --update checks happen on the edge of the second (e.g. 00:01) and --monitor checks happen halfway between seconds (e.g. 00:01.5). As long as the servers’ clocks are closely synchronized and replication events are propagating in less than half a second, pt-heartbeat will report zero seconds of delay.
pt-heartbeat will try to reconnect if the connection has an error, but will not retry if it can’t get a connection when it first starts.
The --dbi-driver option lets you use pt-heartbeat to monitor PostgreSQL as well. It is reported to work well with Slony-1 replication.
3、Getting help for pt-heartbeat
a、Getting help information
[[email protected] ~]# pt-heartbeat # Running pt-heartbeat with no options prints a brief usage summary; use pt-heartbeat --help for the full help text
Usage: pt-heartbeat [OPTIONS] [DSN] --update|--monitor|--check|--stop
Errors in command-line arguments:
* Specify at least one of --stop, --update, --monitor or --check
* --database must be specified
4、pt-heartbeat usage demo
a、First, create the table
[[email protected] ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --create-table --update
MASTER> select * from heartbeat;
+----------------------------+-----------+------------------+-----------+-----------------------+---------------------+
| ts                         | server_id | file             | position  | relay_master_log_file | exec_master_log_pos |
+----------------------------+-----------+------------------+-----------+-----------------------+---------------------+
| 2014-12-01T09:48:14.003020 |        11 | mysql-bin.000390 | 677136957 | mysql-bin.000179      |                 120 |
+----------------------------+-----------+------------------+-----------+-----------------------+---------------------+
b、Update the heartbeat on the master
[[email protected] ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --update &
[1] 31249
c、Monitor lag on the replica
[[email protected] ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --monitor --print-master-server-id
1.00s [ 0.02s, 0.00s, 0.00s ] 11 # current lag, then the 1-minute, 5-minute and 15-minute moving averages, then the master's server_id
1.00s [ 0.03s, 0.01s, 0.00s ] 11
1.00s [ 0.05s, 0.01s, 0.00s ] 11
1.00s [ 0.07s, 0.01s, 0.00s ] 11
1.00s [ 0.08s, 0.02s, 0.01s ] 11
1.00s [ 0.10s, 0.02s, 0.01s ] 11
1.00s [ 0.12s, 0.02s, 0.01s ] 11
1.00s [ 0.13s, 0.03s, 0.01s ] 11
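If you want to feed this output into your own alerting, each line can be split into its parts. The following is a minimal sketch that assumes the line layout shown above (current lag, bracketed averages, optional server_id); the regular expression and the function name `parse_monitor_line` are hypothetical and not part of pt-heartbeat.

```python
import re

# Matches lines such as "1.00s [ 0.02s, 0.00s, 0.00s ] 11".
# The trailing server_id only appears with --print-master-server-id.
MONITOR_LINE = re.compile(
    r"^\s*(?P<current>\d+(?:\.\d+)?)s\s*"
    r"\[\s*(?P<frames>[^\]]*)\]\s*"
    r"(?P<server_id>\d+)?\s*$"
)


def parse_monitor_line(line):
    """Return (current_lag, [frame averages], server_id or None)."""
    match = MONITOR_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized monitor line: {line!r}")
    averages = [
        float(part.strip().rstrip("s"))
        for part in match.group("frames").split(",")
        if part.strip()
    ]
    server_id = match.group("server_id")
    return (
        float(match.group("current")),
        averages,
        int(server_id) if server_id else None,
    )


print(parse_monitor_line("1.00s [ 0.02s, 0.00s, 0.00s ] 11"))
# prints (1.0, [0.02, 0.0, 0.0], 11)
```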
d、Other examples
# Run the update on the master as a daemon
[[email protected] ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --update --daemonize
# Change the update interval on the master to 2s
[[email protected] ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --update --daemonize --interval=2
# Stop the pt-heartbeat daemon on the master
[[email protected] ~]# pt-heartbeat --stop
Successfully created file /tmp/pt-heartbeat-sentinel
[[email protected] ~]# rm -rf /tmp/pt-heartbeat-sentinel
# Check the lag on the replica once
[[email protected] ~]$ pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --check
1.00
# Monitor the replica as a daemon and write the output to a log file
[[email protected] ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test --master-server-id=11 --monitor --print-master-server-id --daemonize --log=/tmp/slave-lag.log
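Because --check prints a single number and exits, it is easy to wrap in a script. The sketch below is one possible wrapper, not an official integration: the helper names and the 5-second threshold are hypothetical, and `run_check` assumes pt-heartbeat is on the PATH and that you pass the same connection options used in the examples above.

```python
import subprocess


def parse_check_output(output: str) -> float:
    """Convert the single number printed by --check (e.g. '1.00') to a float."""
    return float(output.strip())


def lag_exceeds(output: str, threshold_seconds: float) -> bool:
    """Return True when the reported lag is above the chosen threshold."""
    return parse_check_output(output) > threshold_seconds


def run_check(connection_args):
    """Run `pt-heartbeat ... --check` once and return the lag in seconds.

    `connection_args` is a list such as ["-D", "test", "--master-server-id=11"].
    Raises CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(
        ["pt-heartbeat", *connection_args, "--check"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_check_output(result.stdout)


# Using the output shown above, with a hypothetical 5-second threshold:
print(lag_exceeds("1.00\n", threshold_seconds=5.0))  # prints False
```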
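5、Common options:
Note: at least one of --stop, --update, --monitor or --check must be specified. --update, --monitor and --check are mutually exclusive, and so are --daemonize and --check.
--ask-pass
Prompt for the MySQL password rather than passing it on the command line.
--charset
Character set for the connection.
--check
Check the replica's lag once and exit. If --recurse is specified, all replicas are checked recursively.
--check-read-only
If the server has read-only mode enabled, the tool skips any inserts.
--create-table
Create the heartbeat table on the master if it does not exist. You can also create it yourself; the MEMORY storage engine is suggested. Replication lag is measured through the updates made to this table.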
CREATE TABLE heartbeat (
ts varchar(26) NOT NULL,
server_id int unsigned NOT NULL PRIMARY KEY,
file varchar(255) DEFAULT NULL, -- SHOW MASTER STATUS
position bigint unsigned DEFAULT NULL, -- SHOW MASTER STATUS
relay_master_log_file varchar(255) DEFAULT NULL, -- SHOW SLAVE STATUS
exec_master_log_pos bigint unsigned DEFAULT NULL -- SHOW SLAVE STATUS
);
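The ts and position columns of the heartbeat table change continuously, and ts is the key to measuring replication lag.
--daemonize
Run in the background as a daemon.
--user
-u, the user for the database connection.
--database
-D, the database to connect to.
--host
-h, the host to connect to.
--password
-p, the password for the database connection.
--port
-P, the port for the database connection.
--socket
-S, the socket file for the database connection.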
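--file [--file=output.txt]
Write the latest --monitor line to the specified file, which avoids filling the screen with output.
--frames [--frames=1m,2m,3m]
The time frames for the averages shown inside [] in --monitor output. The default is 1m,5m,15m. You can give a single frame, e.g. --frames=1s, or several separated by commas. Available units are seconds (s), minutes (m), hours (h) and days (d).
--interval
How often to check or update. The default is 1s. The smallest unit is 0.01s and the maximum precision is two decimal places, so 0.015 is adjusted to 0.02.
--log
When running daemonized, all output is written to the specified file.
--monitor
Continuously monitor the replica's lag. The lag is printed at the interval set by --interval; with --file, the output is written to the specified file instead.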
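--master-server-id
The master's server_id. If it is not given, the tool connects to the master to look up its server_id.
--print-master-server-id
In --monitor and --check mode, also print the master's server_id.
--recurse
The depth to check in a multi-level replication hierarchy. In an M-S-S... topology, every replica except the last one must have log_slave_updates enabled, otherwise the check cannot see the heartbeat.
--recursion-method
The method used to discover replicas. The default is processlist,hosts.
--update
Update the heartbeat table on the master.
--replace
Use REPLACE instead of UPDATE to write the timestamp in the heartbeat table, so it does not matter whether the row already exists.
--stop
Stop running instances of the tool (--daemonize) by creating a "pt-heartbeat-sentinel" file under /tmp/. To start the tool again later (--daemonize), this file must be deleted first.
--table
The name of the heartbeat table. The default is heartbeat.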
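The value formats accepted by --frames and --interval can be made concrete with a short sketch. It only mirrors the descriptions above and is not pt-heartbeat's own parsing code: the helper names are hypothetical, the frame parser assumes every item carries a unit suffix, and round-half-up is an assumption chosen to reproduce the 0.015 to 0.02 example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Seconds per unit suffix, as listed under --frames.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_frames(frames: str) -> list:
    """Convert a --frames value such as '1m,5m,15m' into seconds."""
    seconds = []
    for item in frames.split(","):
        item = item.strip()
        if len(item) < 2 or item[-1] not in UNIT_SECONDS or not item[:-1].isdigit():
            raise ValueError(f"invalid frame: {item!r}")
        seconds.append(int(item[:-1]) * UNIT_SECONDS[item[-1]])
    return seconds


def round_interval(interval: str) -> Decimal:
    """Round an --interval value to hundredths, with a floor of 0.01.

    Decimal is used because binary floats cannot represent 0.015 exactly,
    so the built-in round(0.015, 2) would give 0.01.
    """
    rounded = Decimal(interval).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return max(rounded, Decimal("0.01"))


print(parse_frames("1m,5m,15m"))  # prints [60, 300, 900]
print(round_interval("0.015"))    # prints 0.02
```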