MHA 自动切换步骤及过程剖析

MHA是众多使用MySQL数据库企业高可用的不二选择,它简单易用,功能强大,实现了基于MySQL replication架构的自动主从故障转移,本文主要描述了MHA自动切换的步骤,对切换过程做了演示以及进行了适当的分析,供大家参考和理解MHA以及MySQL的原理。

1、MHA自动切换的步骤
a、MHA manager启动时的校验阶段
   根据配置文件校验复制配置以及识别当前的master
   导致监控终止情形:复制配置异常,存在的异常slave,一些需要的脚本脚本异常
   MHA manager启动前可以通过masterha_check_ssh以及masterha_check_repl检测

b、监控master server阶段
   MHA manager启动成功后,输出日志[info] Ping(SELECT) succeeded, waiting until MySQL doesn‘t respond..
   MHA会定期ping master,缺省的间隔为1秒,直到master死掉
   MHA manager不会监控slave的状态,任意的Stopping/Restarting/Adding/Removing slaves不影响当前MHA监控
   当添加或移除slave节点时,建议修改配置文件以及重启MHA manager

c、侦测到master server异常阶段
   3次连续ping master失败,参数secondary_check_script可用于double check

d、开启failover阶段
   MHA再次读取配置文件(防止主从配置已更改),校验master以及当前masetr的从库
   MHA也会校验诸如复制异常以及是否存在一些从库有不同的主
   启动failover(排除上次failover失败或者failover时间间隔太短)

e.隔离master server阶段(可选)

如果配置文件定义了IP漂移脚本则此时会关闭master IP
   如果配置文件定义了关闭master脚本,则调用脚本关闭master以避免脑裂

f.恢复一个新的master
   从slave节点获取最新的end_log_pos (Read_Master_Log_Pos)位置
   从上一步获取的位置开始,读取及保存crashed master全部二进制日志文件(ssh可达)
   决定新的master(根据配置文件定义的优先级别)
   产生差量的binary/relay log events并且apply到新的master

g.激活一个新的master
   如果配置文件定义了IP漂移脚本则此时会将IP漂移到新master

i.恢复剩余的slave
   对所有从库并行地产生差量binary/relay log events
   并用apply差量binary/relay log events到各从库
   启动复制
 
j.通过切换结果(可选)
   邮件发送
   备份作业,管理工具调整等

2、MHA配置信息 
[[email protected] ~]$ more /etc/masterha/app1.cnf
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log

user=mha
password=xxx
ssh_user=root
repl_user=repl  
repl_password=repl  
ping_interval=1
shutdown_script=""
master_ip_online_change_script=""
report_script=""
master_ip_failover_script=/tmp/master_ip_failover
 
[server1]
hostname=vdbsrv1
master_binlog_dir=/data/mysqldata

[server2]
hostname=vdbsrv2
master_binlog_dir=/data/mysqldata

[server3]
hostname=vdbsrv3
master_binlog_dir=/data/mysqldata/
#candidate_master=1

[[email protected] ~]# more /etc/hosts
127.0.0.1    localhost.localdomain localhost
192.168.1.6  vdbsrv1  #master
192.168.1.7  vdbsrv2  #slave1
192.168.1.8  vdbsrv3  #slave2
192.168.1.12 vdbsrv4  #manager

mysql> show slave hosts;
+-----------+---------+------+-----------+--------------------------------------+
| Server_id | Host    | Port | Master_id | Slave_UUID                           |
+-----------+---------+------+-----------+--------------------------------------+
|      1001 | vdbsrv2 | 3306 |         1 | 75bef614-e342-11e4-921d-000c295fb2eb |
|      1002 | vdbsrv3 | 3306 |         1 | 091f79b8-e386-11e4-93d5-000c2943c830 |
+-----------+---------+------+-----------+--------------------------------------+
2 rows in set (0.00 sec)

3、演示切换
###开启主库VIP
[[email protected] ~]# ssh vdbsrv1 "/sbin/ifconfig eth0:0 192.168.1.13 netmask 255.255.255.0 up"

###开启MHA manager
[[email protected] ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &

###使用sysbench产生事务
[[email protected] ~]# sysbench --test=oltp           \
> --oltp-table-size=5000000      \
> --oltp-read-only=off           \
> --init-rng=on                  \
> --num-threads=16               \
> --max-requests=0               \
> --oltp-dist-type=uniform       \
> --max-time=180                 \
> --mysql-user=root              \
> --mysql-socket=/tmp/mysql.sock \
> --mysql-password=‘‘            \
> --db-driver=mysql              \
> --mysql-table-engine=innodb    \
> --oltp-test-mode=complex       \
> --mysql-db=replicatedb run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

###从库vdbsrv2停止io进程以模拟io滞后
[[email protected] ~]# mysql -e ‘stop slave io_thread‘
[[email protected] ~]# mysql -e ‘show slave status\G‘ | egrep ‘Slave_IO|Slave_SQL‘
               Slave_IO_State:
             Slave_IO_Running: No           ###Author : Leshami
            Slave_SQL_Running: Yes          ###Blog   : http://blog.csdn.net/leshami
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it

###2分钟后从库vdbsrv2启动io进程     
[[email protected] ~]# mysql -e ‘start slave io_thread‘

###从库vdbsrv3停止io进程以模拟io滞后
[[email protected] ~]# mysql -e ‘stop slave io_thread‘

###模拟主库宕机
[[email protected] ~]# ssh vdbsrv1 "killall -r mysqld"

###主库的sysbench被强制断开且伴随下列错误提示
[[email protected] ~]#
     .......
Doing OLTP test.
Running mixed OLTP test
Using Uniform distribution
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Threads started!
ALERT: failed to execute mysql_stmt_execute(): Err1317 Query execution was interrupted
FATAL: database error, exiting...
ALERT: failed to execute mysql_stmt_execute(): Err1317 Query execution was interrupted
FATAL: database error, exiting...

###查看VIP漂移情况
[[email protected] app1]# grep VIP /var/log/masterha/app1/manager.log
Disabling the VIP on old master: vdbsrv1
Enabling the VIP - 192.168.1.13/24 on the new master - vdbsrv2

###登陆到主机vdbsrv2 mysql
mysql> show variables like ‘%hostname%‘;
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| hostname      | vdbsrv2 |
+---------------+---------+
1 row in set (0.00 sec)

###此时vdbsrv2已提升为master,vdbsrv3指向了vdbsrv2
mysql> show slave hosts;
+-----------+---------+------+-----------+--------------------------------------+
| Server_id | Host    | Port | Master_id | Slave_UUID                           |
+-----------+---------+------+-----------+--------------------------------------+
|      1002 | vdbsrv3 | 3306 |      1001 | 091f79b8-e386-11e4-93d5-000c2943c830 |
+-----------+---------+------+-----------+--------------------------------------+
1 row in set (0.00 sec)

4、MHA日志初步剖析

[[email protected] ~]# ls -hltr /var/log/masterha/*
total 32K
-rw-r--r-- 1 root root 143 Apr 20 14:30 saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog
-rw-r--r-- 1 root root 26K Apr 20 14:30 manager.log
-rw-r--r-- 1 root root   0 Apr 20 14:30 app1.failover.complete

###具体日志信息manager.log

Mon Apr 20 14:27:41 2015 - [info] MHA::MasterMonitor version 0.56.
Mon Apr 20 14:27:42 2015 - [info] GTID failover mode = 0
Mon Apr 20 14:27:42 2015 - [info] Dead Servers:
Mon Apr 20 14:27:42 2015 - [info] Alive Servers:
Mon Apr 20 14:27:42 2015 - [info]   vdbsrv1(192.168.1.6:3306)
Mon Apr 20 14:27:42 2015 - [info]   vdbsrv2(192.168.1.7:3306)
Mon Apr 20 14:27:42 2015 - [info]   vdbsrv3(192.168.1.8:3306)
Mon Apr 20 14:27:42 2015 - [info] Alive Slaves:
Mon Apr 20 14:27:42 2015 - [info]   vdbsrv2(192.168.1.7:3306)  Version=5.6.22-log (oldest major version between slaves) log-bin:enabled
Mon Apr 20 14:27:42 2015 - [info]     Replicating from 192.168.1.6(192.168.1.6:3306)
Mon Apr 20 14:27:42 2015 - [info]   vdbsrv3(192.168.1.8:3306)  Version=5.6.22-log (oldest major version between slaves) log-bin:enabled
Mon Apr 20 14:27:42 2015 - [info]     Replicating from 192.168.1.6(192.168.1.6:3306)
Mon Apr 20 14:27:42 2015 - [info] Current Alive Master: vdbsrv1(192.168.1.6:3306)
Mon Apr 20 14:27:42 2015 - [info] Checking slave configurations..
Mon Apr 20 14:27:42 2015 - [info]  read_only=1 is not set on slave vdbsrv2(192.168.1.7:3306).
Mon Apr 20 14:27:42 2015 - [info] Checking replication filtering settings..
Mon Apr 20 14:27:42 2015 - [info]  binlog_do_db= , binlog_ignore_db=
Mon Apr 20 14:27:42 2015 - [info]  Replication filtering check ok.
Mon Apr 20 14:27:42 2015 - [info] GTID (with auto-pos) is not supported
Mon Apr 20 14:27:42 2015 - [info] Starting SSH connection tests..
Mon Apr 20 14:27:43 2015 - [info] All SSH connection tests passed successfully.
Mon Apr 20 14:27:43 2015 - [info] Checking MHA Node version..
Mon Apr 20 14:27:44 2015 - [info]  Version check ok.
Mon Apr 20 14:27:44 2015 - [info] Checking SSH publickey authentication settings on the current master..
Mon Apr 20 14:27:44 2015 - [info] HealthCheck: SSH to vdbsrv1 is reachable.
Mon Apr 20 14:27:44 2015 - [info] Master MHA Node version is 0.56.
Mon Apr 20 14:27:44 2015 - [info] Checking recovery script configurations on vdbsrv1(192.168.1.6:3306)..
Mon Apr 20 14:27:44 2015 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqldata
     --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000020
Mon Apr 20 14:27:44 2015 - [info]   Connecting to [email protected](vdbsrv1:22)..
  Creating /var/tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysqldata, up to mysql-bin.000020
Mon Apr 20 14:27:44 2015 - [info] Binlog setting check done.
Mon Apr 20 14:27:44 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Apr 20 14:27:44 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=‘mha‘ --slave_host=vdbsrv2
 --slave_ip=192.168.1.7 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.22-log --manager_version=0.56
 --relay_log_info=/data/mysqldata/relay-log.info  --relay_dir=/data/mysqldata/  --slave_pass=xxx
Mon Apr 20 14:27:44 2015 - [info]   Connecting to [email protected](vdbsrv2:22)..
  Checking slave recovery environment settings..
    Opening /data/mysqldata/relay-log.info ... ok.
    Relay log found at /data/mysqldata, up to vdbsrv2-relay-bin.000002
    Temporary relay log file is /data/mysqldata/vdbsrv2-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Mon Apr 20 14:27:45 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=‘mha‘
  --slave_host=vdbsrv3 --slave_ip=192.168.1.8 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.22-log
  --manager_version=0.56 --relay_log_info=/data/mysqldata/relay-log.info  --relay_dir=/data/mysqldata/  --slave_pass=xxx
Mon Apr 20 14:27:45 2015 - [info]   Connecting to [email protected](vdbsrv3:22)..
  Checking slave recovery environment settings..
    Opening /data/mysqldata/relay-log.info ... ok.
    Relay log found at /data/mysqldata, up to vdbsrv3-relay-bin.000002
    Temporary relay log file is /data/mysqldata/vdbsrv3-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Mon Apr 20 14:27:45 2015 - [info] Slaves settings check done.
Mon Apr 20 14:27:45 2015 - [info]
vdbsrv1(192.168.1.6:3306) (current master)
 +--vdbsrv2(192.168.1.7:3306)
 +--vdbsrv3(192.168.1.8:3306)

Mon Apr 20 14:27:45 2015 - [info] Checking master_ip_failover_script status:
Mon Apr 20 14:27:45 2015 - [info]   /tmp/master_ip_failover --command=status --ssh_user=root --orig_master_host=vdbsrv1
  --orig_master_ip=192.168.1.6 --orig_master_port=3306

IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.1.13/24===

Checking the Status of the script.. OK
Mon Apr 20 14:27:45 2015 - [info]  OK.
Mon Apr 20 14:27:45 2015 - [warning] shutdown_script is not defined.
Mon Apr 20 14:27:45 2015 - [info] Set master ping interval 1 seconds.
Mon Apr 20 14:27:45 2015 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master
           reachability from two or more routes.
Mon Apr 20 14:27:45 2015 - [info] Starting ping health check on vdbsrv1(192.168.1.6:3306)..
Mon Apr 20 14:27:45 2015 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn‘t respond..
Mon Apr 20 14:30:19 2015 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query) ###侦测到错误
Mon Apr 20 14:30:19 2015 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqldata
 --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin
Mon Apr 20 14:30:19 2015 - [info] HealthCheck: SSH to vdbsrv1 is reachable.
Mon Apr 20 14:30:20 2015 - [warning] Got error on MySQL connect: 2003 (Can‘t connect to MySQL server on ‘192.168.1.6‘ (111))
Mon Apr 20 14:30:20 2015 - [warning] Connection failed 2 time(s)..
Mon Apr 20 14:30:21 2015 - [warning] Got error on MySQL connect: 2003 (Can‘t connect to MySQL server on ‘192.168.1.6‘ (111))
Mon Apr 20 14:30:21 2015 - [warning] Connection failed 3 time(s)..
Mon Apr 20 14:30:22 2015 - [warning] Got error on MySQL connect: 2003 (Can‘t connect to MySQL server on ‘192.168.1.6‘ (111))
Mon Apr 20 14:30:22 2015 - [warning] Connection failed 4 time(s)..
Mon Apr 20 14:30:22 2015 - [warning] Master is not reachable from health checker!
Mon Apr 20 14:30:22 2015 - [warning] Master vdbsrv1(192.168.1.6:3306) is not reachable!
Mon Apr 20 14:30:22 2015 - [warning] SSH is reachable.
Mon Apr 20 14:30:22 2015 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and
  /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Mon Apr 20 14:30:22 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Apr 20 14:30:22 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Apr 20 14:30:22 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Apr 20 14:30:22 2015 - [info] GTID failover mode = 0
Mon Apr 20 14:30:22 2015 - [info] Dead Servers:
Mon Apr 20 14:30:22 2015 - [info]   vdbsrv1(192.168.1.6:3306)
Mon Apr 20 14:30:22 2015 - [info] Alive Servers:
Mon Apr 20 14:30:22 2015 - [info]   vdbsrv2(192.168.1.7:3306)
Mon Apr 20 14:30:22 2015 - [info]   vdbsrv3(192.168.1.8:3306)
Mon Apr 20 14:30:22 2015 - [info] Alive Slaves:
Mon Apr 20 14:30:22 2015 - [info]   vdbsrv2(192.168.1.7:3306)  Version=5.6.22-log (oldest major version between slaves) log-bin:enabled
Mon Apr 20 14:30:22 2015 - [info]     Replicating from 192.168.1.6(192.168.1.6:3306)
Mon Apr 20 14:30:22 2015 - [info]   vdbsrv3(192.168.1.8:3306)  Version=5.6.22-log (oldest major version between slaves) log-bin:enabled
Mon Apr 20 14:30:22 2015 - [info]     Replicating from 192.168.1.6(192.168.1.6:3306)
Mon Apr 20 14:30:22 2015 - [info] Checking slave configurations..
Mon Apr 20 14:30:22 2015 - [info]  read_only=1 is not set on slave vdbsrv2(192.168.1.7:3306).
Mon Apr 20 14:30:22 2015 - [info] Checking replication filtering settings..
Mon Apr 20 14:30:22 2015 - [info]  Replication filtering check ok.
Mon Apr 20 14:30:22 2015 - [info] Master is down!
Mon Apr 20 14:30:22 2015 - [info] Terminating monitoring script.
Mon Apr 20 14:30:22 2015 - [info] Got exit code 20 (Master dead).
Mon Apr 20 14:30:22 2015 - [info] MHA::MasterFailover version 0.56.
Mon Apr 20 14:30:22 2015 - [info] Starting master failover.
Mon Apr 20 14:30:22 2015 - [info]
Mon Apr 20 14:30:22 2015 - [info] * Phase 1: Configuration Check Phase..
Mon Apr 20 14:30:22 2015 - [info]
Mon Apr 20 14:30:22 2015 - [info] GTID failover mode = 0
Mon Apr 20 14:30:22 2015 - [info] Dead Servers:
Mon Apr 20 14:30:22 2015 - [info]   vdbsrv1(192.168.1.6:3306)
Mon Apr 20 14:30:22 2015 - [info] Checking master reachability via MySQL(double check)...
Mon Apr 20 14:30:22 2015 - [info]  ok.
Mon Apr 20 14:30:22 2015 - [info] Alive Servers:
Mon Apr 20 14:30:22 2015 - [info]   vdbsrv2(192.168.1.7:3306)
Mon Apr 20 14:30:22 2015 - [info]   vdbsrv3(192.168.1.8:3306)
Mon Apr 20 14:30:22 2015 - [info] Alive Slaves:
Mon Apr 20 14:30:22 2015 - [info]   vdbsrv2(192.168.1.7:3306)  Version=5.6.22-log (oldest major version between slaves) log-bin:enabled
Mon Apr 20 14:30:22 2015 - [info]     Replicating from 192.168.1.6(192.168.1.6:3306)
Mon Apr 20 14:30:22 2015 - [info]   vdbsrv3(192.168.1.8:3306)  Version=5.6.22-log (oldest major version between slaves) log-bin:enabled
Mon Apr 20 14:30:22 2015 - [info]     Replicating from 192.168.1.6(192.168.1.6:3306)
Mon Apr 20 14:30:22 2015 - [info] Starting Non-GTID based failover.
Mon Apr 20 14:30:22 2015 - [info]
Mon Apr 20 14:30:22 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Apr 20 14:30:22 2015 - [info]
Mon Apr 20 14:30:22 2015 - [info] * Phase 2: Dead Master Shutdown Phase.. ###隔离master server
Mon Apr 20 14:30:22 2015 - [info]
Mon Apr 20 14:30:22 2015 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Apr 20 14:30:22 2015 - [info] Executing master IP deactivation script:
Mon Apr 20 14:30:22 2015 - [info]   /tmp/master_ip_failover --orig_master_host=vdbsrv1 --orig_master_ip=192.168.1.6 --orig_master_port=3306
  --command=stopssh --ssh_user=root 
IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.1.13/24===

Disabling the VIP on old master: vdbsrv1
Mon Apr 20 14:30:22 2015 - [info]  done.
Mon Apr 20 14:30:22 2015 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Apr 20 14:30:22 2015 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Apr 20 14:30:22 2015 - [info]
Mon Apr 20 14:30:22 2015 - [info] * Phase 3: Master Recovery Phase..
Mon Apr 20 14:30:22 2015 - [info]
Mon Apr 20 14:30:22 2015 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Apr 20 14:30:22 2015 - [info]
Mon Apr 20 14:30:22 2015 - [info] The latest binary log file/position on all slaves is mysql-bin.000020:315514388
Mon Apr 20 14:30:22 2015 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Apr 20 14:30:22 2015 - [info]   vdbsrv2(192.168.1.7:3306)  Version=5.6.22-log (oldest major version between slaves) log-bin:enabled
Mon Apr 20 14:30:22 2015 - [info]     Replicating from 192.168.1.6(192.168.1.6:3306)
Mon Apr 20 14:30:22 2015 - [info] The oldest binary log file/position on all slaves is mysql-bin.000020:313962621
Mon Apr 20 14:30:22 2015 - [info] Oldest slaves:
Mon Apr 20 14:30:22 2015 - [info]   vdbsrv3(192.168.1.8:3306)  Version=5.6.22-log (oldest major version between slaves) log-bin:enabled
Mon Apr 20 14:30:22 2015 - [info]     Replicating from 192.168.1.6(192.168.1.6:3306)
Mon Apr 20 14:30:22 2015 - [info]
Mon Apr 20 14:30:22 2015 - [info] * Phase 3.2: Saving Dead Master‘s Binlog Phase..
Mon Apr 20 14:30:22 2015 - [info]
Mon Apr 20 14:30:23 2015 - [info] Fetching dead master‘s binary logs.. ###从dead master提取binlog
Mon Apr 20 14:30:23 2015 - [info] Executing command on the dead master vdbsrv1(192.168.1.6:3306): save_binary_logs --command=save
 --start_file=mysql-bin.000020  --start_pos=315514388 --binlog_dir=/data/mysqldata
 --output_file=/var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog --handle_raw_binlog=1 --disable_log_bin=0
 --manager_version=0.56
  Creating /var/tmp if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000020 pos 315514388 to mysql-bin.000020 EOF into
   /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog ..
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 120.. ok.
  Dumping effective binlog data from /data/mysqldata/mysql-bin.000020 position 315514388 to tail(315514411).. ok.
 Binlog Checksum enabled
 Concat succeeded.
Mon Apr 20 14:30:23 2015 - [info] scp from [email protected]:/var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog to
  local:/var/log/masterha/app1/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog succeeded. ###复制dump的binlog到监控节点
Mon Apr 20 14:30:23 2015 - [info] HealthCheck: SSH to vdbsrv2 is reachable.
Mon Apr 20 14:30:24 2015 - [info] HealthCheck: SSH to vdbsrv3 is reachable.
Mon Apr 20 14:30:24 2015 - [info]
Mon Apr 20 14:30:24 2015 - [info] * Phase 3.3: Determining New Master Phase..
Mon Apr 20 14:30:24 2015 - [info]
Mon Apr 20 14:30:24 2015 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. ###寻找最近的slave中继日志
Mon Apr 20 14:30:24 2015 - [info] Checking whether vdbsrv2 has relay logs from the oldest position..             ###用于恢复其他slave
Mon Apr 20 14:30:24 2015 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000020 --latest_rmlp=315514388
  --target_mlf=mysql-bin.000020 --target_rmlp=313962621 --server_id=1001 --workdir=/var/tmp --timestamp=20150420143022
  --manager_version=0.56 --relay_log_info=/data/mysqldata/relay-log.info  --relay_dir=/data/mysqldata/ 
    Opening /data/mysqldata/relay-log.info ... ok.
    Relay log found at /data/mysqldata, up to vdbsrv2-relay-bin.000003
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:vdbsrv2-relay-bin.000003, start_pos:6265486.
Target relay log FOUND!
Mon Apr 20 14:30:24 2015 - [info] OK. vdbsrv2 has all relay logs.
Mon Apr 20 14:30:24 2015 - [info] Searching new master from slaves..  ###寻找及决定新的master
Mon Apr 20 14:30:24 2015 - [info]  Candidate masters from the configuration file:
Mon Apr 20 14:30:24 2015 - [info]  Non-candidate masters:
Mon Apr 20 14:30:24 2015 - [info] New master is vdbsrv2(192.168.1.7:3306)
Mon Apr 20 14:30:24 2015 - [info] Starting master failover..          ###执行failover
Mon Apr 20 14:30:24 2015 - [info]
From:
vdbsrv1(192.168.1.6:3306) (current master)
 +--vdbsrv2(192.168.1.7:3306)
 +--vdbsrv3(192.168.1.8:3306)

To:
vdbsrv2(192.168.1.7:3306) (new master)
 +--vdbsrv3(192.168.1.8:3306)
Mon Apr 20 14:30:24 2015 - [info]
Mon Apr 20 14:30:24 2015 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Mon Apr 20 14:30:24 2015 - [info]
Mon Apr 20 14:30:24 2015 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Mon Apr 20 14:30:24 2015 - [info] Sending binlog..
Mon Apr 20 14:30:25 2015 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog to
  [email protected]:/var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog succeeded.
Mon Apr 20 14:30:25 2015 - [info]
Mon Apr 20 14:30:25 2015 - [info] * Phase 3.4: Master Log Apply Phase..
Mon Apr 20 14:30:25 2015 - [info]
Mon Apr 20 14:30:25 2015 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Mon Apr 20 14:30:25 2015 - [info] Starting recovery on vdbsrv2(192.168.1.7:3306)..
Mon Apr 20 14:30:25 2015 - [info]  Generating diffs succeeded.
Mon Apr 20 14:30:25 2015 - [info] Waiting until all relay logs are applied.
Mon Apr 20 14:30:36 2015 - [info]  done.
Mon Apr 20 14:30:36 2015 - [info] Getting slave status..
Mon Apr 20 14:30:36 2015 - [info] This slave(vdbsrv2)‘s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000020:315514388).
  No need to recover from Exec_Master_Log_Pos.
Mon Apr 20 14:30:36 2015 - [info] Connecting to the target slave host vdbsrv2, running recover script..  ###基于新主库Apply差量日志
Mon Apr 20 14:30:36 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=‘mha‘ --slave_host=vdbsrv2
  --slave_ip=192.168.1.7  --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog
  --workdir=/var/tmp --target_version=5.6.22-log --timestamp=20150420143022 --handle_raw_binlog=1 --disable_log_bin=0
  --manager_version=0.56 --slave_pass=xxx
Mon Apr 20 14:30:36 2015 - [info]
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog on vdbsrv2:3306.
  This may take long time...
Applying log files succeeded.
Mon Apr 20 14:30:36 2015 - [info]  All relay logs were successfully applied.
Mon Apr 20 14:30:36 2015 - [info] Getting new master‘s binlog name and position..
Mon Apr 20 14:30:36 2015 - [info]  mysql-bin.000011:241926689
Mon Apr 20 14:30:36 2015 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO
  MASTER_HOST=‘vdbsrv2 or 192.168.1.7‘, MASTER_PORT=3306, MASTER_LOG_FILE=‘mysql-bin.000011‘, MASTER_LOG_POS=241926689, MASTER_USER=‘repl‘,
  MASTER_PASSWORD=‘xxx‘;
Mon Apr 20 14:30:36 2015 - [info] Executing master IP activate script:  ###IP漂移
Mon Apr 20 14:30:36 2015 - [info]   /tmp/master_ip_failover --command=start --ssh_user=root --orig_master_host=vdbsrv1 --orig_master_ip=192.168.1.6
  --orig_master_port=3306 --new_master_host=vdbsrv2 --new_master_ip=192.168.1.7 --new_master_port=3306 --new_master_user=‘mha‘
  --new_master_password=‘xxx‘ 
Unknown option: new_master_user
Unknown option: new_master_password

IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.1.13/24===

Enabling the VIP - 192.168.1.13/24 on the new master - vdbsrv2
Mon Apr 20 14:30:36 2015 - [info]  OK.
Mon Apr 20 14:30:36 2015 - [info] ** Finished master recovery successfully.
Mon Apr 20 14:30:36 2015 - [info] * Phase 3: Master Recovery Phase completed.
Mon Apr 20 14:30:36 2015 - [info]
Mon Apr 20 14:30:36 2015 - [info] * Phase 4: Slaves Recovery Phase..
Mon Apr 20 14:30:36 2015 - [info]
Mon Apr 20 14:30:36 2015 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Mon Apr 20 14:30:36 2015 - [info]
Mon Apr 20 14:30:36 2015 - [info] -- Slave diff file generation on host vdbsrv3(192.168.1.8:3306) started, pid: 7726.
  Check tmp log /var/log/masterha/app1/vdbsrv3_3306_20150420143022.log if it takes time..
Mon Apr 20 14:30:37 2015 - [info]
Mon Apr 20 14:30:37 2015 - [info] Log messages from vdbsrv3 ...
Mon Apr 20 14:30:37 2015 - [info]
Mon Apr 20 14:30:36 2015 - [info] Server vdbsrv3 received relay logs up to: mysql-bin.000020:313962621
Mon Apr 20 14:30:36 2015 - [info] Need to get diffs from the latest slave(vdbsrv2) up to: mysql-bin.000020:315514388
  (using the latest slave‘s relay logs) ###此时vdbsrv3需要从vdbsrv2获取差量日志
Mon Apr 20 14:30:36 2015 - [info] Connecting to the latest slave host vdbsrv2, generating diff relay log files..
Mon Apr 20 14:30:36 2015 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.1.8
  --latest_mlf=mysql-bin.000020 --latest_rmlp=315514388 --target_mlf=mysql-bin.000020 --target_rmlp=313962621 --server_id=1001
  --diff_file_readtolatest=/var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog --workdir=/var/tmp --timestamp=20150420143022
  --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_log_info=/data/mysqldata/relay-log.info --relay_dir=/data/mysqldata/
Mon Apr 20 14:30:37 2015 - [info]
    Opening /data/mysqldata/relay-log.info ... ok.
    Relay log found at /data/mysqldata, up to vdbsrv2-relay-bin.000003
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:vdbsrv2-relay-bin.000003, start_pos:6265486.
 Concat binary/relay logs from vdbsrv2-relay-bin.000003 pos 6265486 to vdbsrv2-relay-bin.000003 EOF into
   /var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog ..  ###注,此时是从vdbsrv2中继日志获取
 Binlog Checksum enabled
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 283.. ok.
  Dumping effective binlog data from /data/mysqldata/vdbsrv2-relay-bin.000003 position 6265486 to tail(7817253).. ok.
 Binlog Checksum enabled
 Binlog Checksum enabled
 Concat succeeded.
 Generating diff relay log succeeded. Saved at /var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog .
 scp vdbsrv2:/var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog to [email protected](22) succeeded.
Mon Apr 20 14:30:37 2015 - [info]  Generating diff files succeeded.
Mon Apr 20 14:30:37 2015 - [info] End of log messages from vdbsrv3.
Mon Apr 20 14:30:37 2015 - [info] -- Slave diff log generation on host vdbsrv3(192.168.1.8:3306) succeeded.
Mon Apr 20 14:30:37 2015 - [info] Generating relay diff files from the latest slave succeeded.
Mon Apr 20 14:30:37 2015 - [info]
Mon Apr 20 14:30:37 2015 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Mon Apr 20 14:30:37 2015 - [info]
Mon Apr 20 14:30:37 2015 - [info] -- Slave recovery on host vdbsrv3(192.168.1.8:3306) started, pid: 7735.
  Check tmp log /var/log/masterha/app1/vdbsrv3_3306_20150420143022.log if it takes time..
Mon Apr 20 14:30:43 2015 - [info]
Mon Apr 20 14:30:43 2015 - [info] Log messages from vdbsrv3 ...
Mon Apr 20 14:30:43 2015 - [info]
Mon Apr 20 14:30:37 2015 - [info] Sending binlog..  ###这部分则是在原master宕机后未发送的binlog部分
Mon Apr 20 14:30:38 2015 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog to
  [email protected]:/var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog succeeded.
Mon Apr 20 14:30:38 2015 - [info] Starting recovery on vdbsrv3(192.168.1.8:3306)..
Mon Apr 20 14:30:38 2015 - [info]  Generating diffs succeeded.
Mon Apr 20 14:30:38 2015 - [info] Waiting until all relay logs are applied.
Mon Apr 20 14:30:38 2015 - [info]  done.
Mon Apr 20 14:30:38 2015 - [info] Getting slave status..
Mon Apr 20 14:30:38 2015 - [info] This slave(vdbsrv3)‘s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000020:313962621).
  No need to recover from Exec_Master_Log_Pos.
Mon Apr 20 14:30:38 2015 - [info] Connecting to the target slave host vdbsrv3, running recover script..
Mon Apr 20 14:30:38 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=‘mha‘ --slave_host=vdbsrv3
  --slave_ip=192.168.1.8  --slave_port=3306 --apply_files=/var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog,
  /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog --workdir=/var/tmp --target_version=5.6.22-log
  --timestamp=20150420143022 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Mon Apr 20 14:30:43 2015 - [info]
 Concat all apply files to /var/tmp/total_binlog_for_vdbsrv3_3306.20150420143022.binlog ..
 Copying the first binlog file /var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog to
   /var/tmp/total_binlog_for_vdbsrv3_3306.20150420143022.binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from
  /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog..  Binlog Checksum enabled
dumped up to pos 120. ok.
 /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog has effective binlog events from pos 120.
  Dumping effective binlog data from /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog position 120 to tail(143).. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /var/tmp/total_binlog_for_vdbsrv3_3306.20150420143022.binlog .
Applying differential binary/relay log files /var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog,
  /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog on vdbsrv3:3306. This may take long time...
Applying log files succeeded.
Mon Apr 20 14:30:43 2015 - [info]  All relay logs were successfully applied.
Mon Apr 20 14:30:43 2015 - [info]  Resetting slave vdbsrv3(192.168.1.8:3306) and starting replication from the new master vdbsrv2(192.168.1.7:3306)..
Mon Apr 20 14:30:43 2015 - [info]  Executed CHANGE MASTER.
Mon Apr 20 14:30:43 2015 - [info]  Slave started.
Mon Apr 20 14:30:43 2015 - [info] End of log messages from vdbsrv3.
Mon Apr 20 14:30:43 2015 - [info] -- Slave recovery on host vdbsrv3(192.168.1.8:3306) succeeded.
Mon Apr 20 14:30:43 2015 - [info] All new slave servers recovered successfully.
Mon Apr 20 14:30:43 2015 - [info]
Mon Apr 20 14:30:43 2015 - [info] * Phase 5: New master cleanup phase..
Mon Apr 20 14:30:43 2015 - [info]
Mon Apr 20 14:30:43 2015 - [info] Resetting slave info on the new master..
Mon Apr 20 14:30:43 2015 - [info]  vdbsrv2: Resetting slave info succeeded.
Mon Apr 20 14:30:43 2015 - [info] Master failover to vdbsrv2(192.168.1.7:3306) completed successfully.
Mon Apr 20 14:30:43 2015 - [info]

----- Failover Report -----

app1: MySQL Master failover vdbsrv1(192.168.1.6:3306) to vdbsrv2(192.168.1.7:3306) succeeded

Master vdbsrv1(192.168.1.6:3306) is down!   ###整个failover总时间14:30:43-14:30:19=24s

Check MHA Manager logs at vdbsrv4:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on vdbsrv1(192.168.1.6:3306)
The latest slave vdbsrv2(192.168.1.7:3306) has all relay logs for recovery.
Selected vdbsrv2(192.168.1.7:3306) as a new master.
vdbsrv2(192.168.1.7:3306): OK: Applying all logs succeeded.
vdbsrv2(192.168.1.7:3306): OK: Activated master IP address.
vdbsrv3(192.168.1.8:3306): Generating differential relay logs up to vdbsrv2(192.168.1.7:3306)succeeded.
Generating relay diff files from the latest slave succeeded.
vdbsrv3(192.168.1.8:3306): OK: Applying all logs succeeded. Slave started, replicating from vdbsrv2(192.168.1.7:3306)
vdbsrv2(192.168.1.7:3306): Resetting slave info succeeded.
Master failover to vdbsrv2(192.168.1.7:3306) completed successfully.

时间: 2024-10-16 22:57:55

MHA 自动切换步骤及过程剖析的相关文章

MHA在线切换过程

MySQL MHA 在线切换是MHA除了自动监控切换换提供的另外一种方式,多用于诸如硬件升级,MySQL数据库迁移等等.该方式提供快速切换和优雅的阻塞写入,无关关闭原有服务器,整个切换过程在0.5-2s 的时间左右,大大减少了停机时间.本文演示了MHA 在线切换并给出了在线切换的基本步骤. 1.MHA在线切换方式及要求    $ masterha_master_switch --master_state=alive --conf=/etc/app1.cnf --new_master_host=h

mysql mha 主从自动切换 高可用

mha(Master High Availability)目前在MySQL多服务器(超过二台),高可用方面是一个相对成熟的解决方案. 一,什么是mha,有什么特性 1. 主服务器的自动监控和故障转移 MHA监控复制架构的主服务器,一旦检测到主服务器故障,就会自动进行故障转移.即使有些从服务器没有收到最新的relay log,MHA自动从最新的从服务器上识别差异的relay log并把这些日志应用到其他从服务器上,因此所有的从服务器保持一致性了.MHA通常在几秒内完成故障转移,9-12秒可以检测出

MHA自动Failover过程解析(updated) 转

允许转载, 转载时请以超链接形式标明文章原始出处和网站信息 http://www.mysqlsystems.com/2012/03/figure-out-process-of-autofailover-on-mha.html MHA自动Failover过程解析(updated) By zhang, on March 31st, 2012 MHA是一位日本MySQL大牛用Perl写的一套MySQL故障切换方案,来保证数据库系统的高可用.近期,在田老师的推动下,开始一步步深入了解这个HA方案,并也计

可以自动切换的tab选项卡实现过程详解

可以自动切换的tab选项卡实现过程详解:关于选项卡大家一定不会陌生,应用非常的频繁,通常选项卡都是需要点击或者划过才能够实现切换.本章节分享一个能够实现自动切换的选项卡功能,并给出它的具体实现过程.代码实例如下: <!DOCTYPE html> <html> <head> <meta charset=" utf-8"> <meta name="author" content="http://www.so

MySQL高可用方案MHA自动Failover与手动Failover的实践及原理

集群信息 角色                             IP地址                 ServerID      类型 Master                         192.168.244.10   1                 写入 Candicate master          192.168.244.20   2                 读 Slave                           192.168.244.

Redis主从、sentinel故障自动切换

一.什么是redis主从复制? 主从复制,当用户往Master端写入数据时,通过Redis Sync机制将数据文件发送至Slave,Slave也会执行相同的操作确保数据一致:且实现Redis的主从复制非常简单. 二.redis主从复制特点 1.同一个Master可以拥有多个Slaves. 2.Master下的Slave还可以接受同一架构中其它slave的链接与同步请求,实现数据的级联复制,即Master->Slave->Slave模式: 3.Master以非阻塞的方式同步数据至slave,这将

高仿优酷Android客户端图片左右滑动(自动切换)

本例是用ViewPager去做的实现,支持自动滑动和手动滑动,不仅优酷网,实际上有很多商城和门户网站都有类似的实现: 具体思路: 1. 工程中需要添加android-support-v4.jar,才能使用ViewPager控件. 2. 图片的自动切换: 可使用Timer或者ScheduledExecutorService,这个有多重方式可以实现. 同时要切换底部的dots(园点) 3.Handler+Message机制更新UI,这个相信大家都很熟练,不再描述 4. 实现的一些细节:注意本例中的优

新版keeplived + MySQL双主自动切换

### 新版keeplived + MySQL双主自动切换 ###参考网址: http://biancheng.dnbcw.info/mysql/381020.html###一.网络结构  ###    VIP 10.125.141.249    mysq1m1 10.125.141.138 主1(相对而言是主库)    mysqlm2 192.168.88.131 主2(相对而言是备库)### 二.主要步骤 ###    一 mysql 主主同步    二 安装keepalived     三

搭建自动切换的HA

2.搭建自动切换的ha(比手工切换ha集群,多了zookeeper集群) 搭建ZK集群环境,启动zk集群,格式化ZK     ----------------------------------------------------------------------------------------------- 2. 环境   zookeeper:hadoop2,hadoop3,hadoop4 namenode:hadoop0和hadoop1 datanode:hadoop2,hadoop3