The people using PostgreSQL and the Streaming Replication feature seem to ask many of the same questions:
1. How best to monitor Streaming Replication?
2. What is the best way to do that?
3. Are there alternatives, when monitoring on Standby, to using the pg_stat_replication view on Master?
4. How should I calculate replication lag-time, in seconds, minutes, etc.?
In light of these commonly asked questions, I thought a blog would help. The following are some methods I’ve found to be useful.
Monitoring is critical for large infrastructure deployments where you have Streaming Replication for:
1. Disaster recovery
2. Streaming Replication is for High Availability
3. Load balancing, when using Streaming Replication with Hot Standby
PostgreSQL has some building blocks for replication monitoring, and the following are some important functions and views which can be use for monitoring the replication:
1. pg_stat_replication view on master/primary server.
This view helps in monitoring the standby on Master. It gives you the following details:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
e.g.:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
2. pg_is_in_recovery() : Function which tells whether standby is still in recovery mode or not.
e.g.
1 2 3 4 5 |
|
3. pg_last_xlog_receive_location: Function which tells location of last transaction log which was streamed by Standby and also written on standby disk.
e.g.
1 2 3 4 5 |
|
4. pg_last_xlog_replay_location: Function which tells last transaction replayed during recovery process. e.g is given below:
1 2 3 4 5 |
|
5. pg_last_xact_replay_timestamp: This function tells about the time stamp of last transaction which was replayed during recovery. Below is an example:
1 2 3 4 5 |
|
Above are some important functions/views, which are already available in PostgreSQL for monitoring the streaming replication.
So, the logical next question is, “What’s the right way to monitor the Hot Standby with Streaming Replication on Standby Server?”
If you have Hot Standby with Streaming Replication, the following are the points you should monitor:
1. Check if your Hot Standby is in recovery mode or not:
For this you can use pg_is_in_recovery() function.
2.Check whether Streaming Replication is working or not.
And easy way of doing this is checking the pg_stat_replication view on Master/Primary. This view gives information only on master if Streaming Replication is working.
3. Check If Streaming Replication is not working and Hot standby is recovering from archived WAL file.
For this, either the DBA can use the PostgreSQL Log file to monitor it or utilize the following functions provided in PostgreSQL 9.3:
1 2 |
|
4. Check how far off is the Standby from Master.
There are two ways to monitor lag for Standby.
i. Lags in Bytes: For calculating lags in bytes, users can use thepg_stat_replication view on the master with the functionpg_xlog_location_diff function. Below is an example:
1 |
|
which gives the lag in bytes.
ii. Calculating lags in Seconds. The following is SQL, which most people uses to find the lag in seconds:
1 2 3 4 |
|
Including the above into your repertoire can give you good monitoring for PostgreSQL.
I will in a future post include the script that can be used for monitoring the Hot Standby with PostgreSQL streaming replication.