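Introduction to MHA:
MHA is a high-availability tool for MySQL that handles master failover and slave promotion. When the master fails, MHA can complete the failover automatically, typically within 30 seconds, while preserving data consistency as far as possible. MHA is simple to deploy and needs no dedicated extra server (the manager can share a host with a slave, as in the lab below); running it has almost no impact on database server performance and requires no changes to the existing architecture.
MHA also supports online master switching: it can safely move the master role to a new server, blocking writes for only 0.5 to 2 seconds and leaving reads unaffected.
MHA provides the following features, which are useful wherever high availability, data consistency, or master maintenance without downtime is required.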
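1. Automated master monitoring and failover
MHA monitors the master of a replication topology and fails over automatically as soon as it detects a master failure. Even if some slaves have not received the latest relay log events, MHA identifies the differential relay log events on the most up-to-date slave and applies them to the other slaves, so all slaves end up consistent. Failover normally takes seconds: 9 to 12 seconds to detect the master failure, 7 to 10 seconds to shut down the failed master and avoid split brain, and a few seconds to apply the differential relay logs to the new master, so the whole process finishes in 10 to 30 seconds. A priority setting can designate a particular slave as the candidate master. Because MHA repairs consistency among the slaves, any slave can be promoted to master without the consistency problems that would otherwise break replication.
2. Interactive master failover
MHA can be used for failover only, without monitoring the master: when the master fails, an operator invokes MHA manually to perform the failover.
3. Non-interactive master failover
MHA does not monitor the master but still performs the failover automatically. This suits environments that already monitor the master with other software, for example heartbeat for failure detection and virtual IP takeover, and use MHA only for the failover itself and for promoting a slave to master.
4. Online master switch
It is often necessary to move the master to another server, for example when the master has a hardware fault, the RAID controller needs rebuilding, or a more powerful machine is available. Maintenance on the master degrades performance and causes downtime during which, at the very least, writes are impossible. In addition, blocking or killing running sessions can leave the old and new masters inconsistent. MHA provides a fast switch with graceful write blocking: the switch takes only 0.5 to 2 seconds, during which data cannot be written. In many cases a 0.5 to 2 second write block is acceptable, so switching the master does not require a scheduled maintenance window.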
Lab environment:
OS:
# cat /etc/issue
CentOS release 5.11 (Final)
Kernel \r on an \m
mysql:
select @@version;
+------------+
| @@version  |
+------------+
| 5.5.33-log |
+------------+
mha:
mha4mysql-manager-0.56.tar.gz
mha4mysql-node-0.56.tar.gz
Lab topology:
master : 192.168.6.85
slave1: 192.168.6.91 (candidate)
slave2: 192.168.6.149
manager: 192.168.6.149
Because only three machines are available, the manager is deployed on slave2.
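Lab steps:
1. Install MHA
Install both the manager package and the node package on the manager host, and the node package on master, slave1, and slave2. Install the node package before the manager package; otherwise the installation reports errors.
(1) Install the node package on master, slave1, and slave2
a. Install the dependency:
# yum install perl-DBD-MySQL -y
b. Build and install the node package:
# tar -xvf mha4mysql-node-0.56.tar.gz
# cd mha4mysql-node-0.56
# perl Makefile.PL
# make && make install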
(2) Install the manager package on slave2
a. Install the dependencies:
# yum install perl-DBD-MySQL -y
# yum install perl-Config-Tiny -y
# yum install perl-Log-Dispatch -y
# yum install perl-Parallel-ForkManager -y
b. Build and install the manager package:
# tar -xvf mha4mysql-manager-0.56.tar.gz
# cd mha4mysql-manager-0.56
# perl Makefile.PL
# make && make install
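A quick sanity check after installation is to confirm that the MHA commands are on PATH. The sketch below is only an illustration: the two tool lists are collected from the commands that appear in this article and its logs, not from an official manifest, and missing_tools is a hypothetical helper name.

```shell
# Print every command from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper, not part of MHA.
missing_tools() (
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
)

# Node-side tools that appear in this article (assumed list, not exhaustive).
NODE_TOOLS="save_binary_logs apply_diff_relay_logs purge_relay_logs"
# Manager-side tools used in the later steps (assumed list, not exhaustive).
MANAGER_TOOLS="masterha_check_ssh masterha_check_repl masterha_manager masterha_check_status masterha_stop masterha_master_switch"

# Example call on the manager host (word splitting of the lists is intended):
#   missing_tools $NODE_TOOLS $MANAGER_TOOLS
```

An empty result means every listed command was found; any name printed points at a package that still needs installing on that host.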
2. Configure passwordless SSH login
The three hosts must trust each other. Taking master (192.168.6.85) as the example, run on 192.168.6.85:
# ssh-keygen -t rsa -P ''
# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.6.91
# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.6.149
# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.6.85
Repeat the same steps on slave1 and slave2.
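Since the same three ssh-copy-id commands are repeated on every host, it can help to generate them from one host list. This is a minimal dry-run sketch: print_copy_id_cmds is a hypothetical helper that only prints the commands, so they can be reviewed before being run by hand.

```shell
# Print one ssh-copy-id command per target host; nothing is executed.
# print_copy_id_cmds is a hypothetical helper used for illustration only.
print_copy_id_cmds() (
    key=$1
    shift
    for host in "$@"; do
        printf 'ssh-copy-id -i %s %s\n' "$key" "$host"
    done
)

# The three lab hosts from the topology above.
HOSTS="192.168.6.85 192.168.6.91 192.168.6.149"

# Example call (word splitting of HOSTS is intended):
#   print_copy_id_cmds /root/.ssh/id_rsa.pub $HOSTS
```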
3. Configure the MHA manager
(1) Create the MHA directory and the configuration file masterha_default.cnf
# mkdir /etc/mha
# cd /etc/mha
# cat masterha_default.cnf
[server default]
# MySQL administrative account
user=root
# password of that MySQL account
password=123456
# OS user for SSH; defaults to the OS user running the manager and must be able to read the MySQL binary logs and relay logs
ssh_user=root
# path of the MySQL binary logs
master_binlog_dir=/data/dbdata/mysqllog/binlog
# directory in which the node tools write their working files
remote_workdir=/tmp
# interval in seconds between the manager's pings to MySQL; the default is 3
ping_interval=1
# replication account
repl_user=repl
# password of the replication account
repl_password=repl
master_ip_online_change_script=/usr/bin/master_ip_online_change
report_script=/usr/bin/send_report

[server1]
hostname=192.168.6.85
port=3306

[server2]
hostname=192.168.6.91
port=3306
candidate_master=1

[server3]
hostname=192.168.6.149
port=3306
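Before running the MHA checks, a simple grep can catch a key that was dropped while editing the file. The sketch below is an assumption-laden illustration: missing_cnf_keys is a hypothetical helper, and the key list is simply the set of keys this article sets, not an official list of mandatory parameters.

```shell
# Print each key that has no "key=value" line in the given file.
# missing_cnf_keys is a hypothetical helper, not part of MHA.
missing_cnf_keys() (
    cnf=$1
    shift
    for key in "$@"; do
        # Anchor at line start so that "user" does not match "ssh_user".
        grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$cnf" || echo "$key"
    done
)

# Keys used in this article's [server default] section (assumed list).
REQUIRED_KEYS="user password ssh_user repl_user repl_password master_binlog_dir"

# Example call (word splitting of REQUIRED_KEYS is intended):
#   missing_cnf_keys /etc/mha/masterha_default.cnf $REQUIRED_KEYS
```

The check is purely textual; it does not validate values or section placement, which masterha_check_repl covers later.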
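4. Check that passwordless SSH is configured correctly:
# masterha_check_ssh --conf=/etc/mha/masterha_default.cnf
If the configuration is correct, the output ends with "All SSH connection tests passed successfully."
5. Check that replication is configured correctly:
# masterha_check_repl --conf=/etc/mha/masterha_default.cnf
If the configuration is correct, the output ends with "MySQL Replication Health is OK."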
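Both checks can be scripted by looking for the success messages quoted above. This is a minimal sketch that relies only on those two message strings; has_message is a hypothetical helper, and the sketch deliberately makes no assumption about the exit codes of the MHA commands.

```shell
# Succeed if the text on stdin contains the given fixed string.
# has_message is a hypothetical helper, not part of MHA.
has_message() {
    grep -Fq "$1"
}

# Success messages as quoted in this article.
SSH_OK="All SSH connection tests passed successfully."
REPL_OK="MySQL Replication Health is OK."

# Example calls on the manager host:
#   masterha_check_ssh  --conf=/etc/mha/masterha_default.cnf 2>&1 | has_message "$SSH_OK"  && echo "ssh check passed"
#   masterha_check_repl --conf=/etc/mha/masterha_default.cnf 2>&1 | has_message "$REPL_OK" && echo "repl check passed"
```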
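6. Configure how relay logs are purged
During a failover, MHA relies on the relay logs to recover the slaves, so automatic relay log purging must be turned OFF and the relay logs purged manually instead. By default, a slave deletes a relay log automatically once the SQL thread has finished executing it. In an MHA environment those relay logs may still be needed to recover other slaves, so automatic deletion has to be disabled. Periodic purging must also take replication delay into account: on an ext3 file system, deleting a large file takes a noticeable amount of time and can cause serious replication delay. To avoid this, a hard link is created for each relay log first, because on Linux removing a large file through a hard link is very fast. (The same hard-link approach is commonly used when dropping large tables in MySQL.)
Disable automatic relay log purging on slave1 and slave2:
mysql -uroot -p123456 -e 'set global relay_log_purge=0'
The MHA node package includes the purge_relay_logs tool. It creates hard links for the relay logs, executes SET GLOBAL relay_log_purge=1, waits a few seconds so that the SQL thread switches to a new relay log, and then executes SET GLOBAL relay_log_purge=0.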
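The file-system idea behind the hard-link step can be tried in isolation. The sketch below is only an illustration with made-up file names in a temporary directory: it shows that removing the original name merely drops one directory entry, while the data stays reachable through the second link until that link is removed too. It does not call purge_relay_logs or touch any real relay log.

```shell
# Demonstrate that data survives removal of the original name
# as long as a hard link to the same inode still exists.
# demo_hardlink and the file names are hypothetical.
demo_hardlink() (
    dir=$(mktemp -d) || exit 1
    printf 'relay log payload\n' > "$dir/relay-bin.000001"
    # Second name for the same inode.
    ln "$dir/relay-bin.000001" "$dir/relay-bin.000001.link"
    # Removing the first name only unlinks a directory entry.
    rm "$dir/relay-bin.000001"
    # The content is still readable through the remaining link.
    cat "$dir/relay-bin.000001.link"
    rm -r "$dir"
)

# Example call:
#   demo_hardlink
```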
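The purge_relay_logs options are:
--user  MySQL user name
--password  MySQL password
--port  port number
--workdir  where the hard links of the relay logs are created; the default is /var/tmp. After the script completes successfully, the hard-linked relay log files are removed.
--disable_relay_log_purge  By default, if relay_log_purge=1 the script purges nothing and exits. With this option, when relay_log_purge=1 the script sets relay_log_purge to 0, purges the relay logs, and leaves the variable OFF at the end.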
On slave1 and slave2, create a script that purges the relay logs periodically:
# cat purge_relay_log.sh
#!/bin/bash
user=root
passwd=123456
port=3306
log_dir='/var/tmp/log'
work_dir='/var/tmp'
purge='/usr/bin/purge_relay_logs'

# create the log directory on first run
if [ ! -d $log_dir ]
then
   mkdir -p $log_dir
fi

$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
Add the script to crontab so that it runs periodically:
# crontab -l
0 4 * * * /bin/bash /root/purge_relay_log.sh
7. Configure the mail notification script
# cat /usr/bin/send_report
#!/usr/bin/perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp      = 'smtp.163.com';
my $mail_from = '[email protected]';
my $mail_user = '[email protected]';
my $mail_pass = '163.com';
my $mail_to   = '[email protected]';

GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);

# Do whatever you want here
mailToContacts( $smtp, $mail_from, $mail_user, $mail_pass, $mail_to, $subject, $body );

sub mailToContacts {
  my ( $smtp, $mail_from, $mail_user, $mail_pass, $mail_to, $subject, $msg ) = @_;
  # the SMTP dialogue is written to this file for debugging
  open my $DEBUG, ">/var/log/masterha/app1/mail.log"
    or die "Can't open the debug file:$!\n";
  my $sender = new Mail::Sender {
    ctype       => 'text/plain;charset=utf-8',
    encoding    => 'utf-8',
    smtp        => $smtp,
    from        => $mail_from,
    auth        => 'LOGIN',
    TLS_allowed => '0',
    authid      => $mail_user,
    authpwd     => $mail_pass,
    to          => $mail_to,
    subject     => $subject,
    debug       => $DEBUG
  };
  $sender->MailMsg(
    {
      msg   => $msg,
      debug => $DEBUG
    }
  ) or print $Mail::Sender::Error;
  return 1;
}

exit 0;
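8. Start MHA manager monitoring
# nohup masterha_manager --conf=/etc/mha/masterha_default.cnf --remove_dead_master_conf --ignore_last_failover > /tmp/mha_manager.log < /dev/null 2>&1 &
Startup options:
--remove_dead_master_conf  After a failover, the entry of the old master is removed from the configuration file.
--manager_log  Location of the manager log.
--ignore_last_failover  By default, if MHA detects consecutive failures and less than 8 hours have passed since the previous failover, it does not fail over again; this limit avoids a ping-pong effect. The option tells MHA to ignore the file left behind by the previous failover. By default, after a failover MHA creates the file masterha_default.failover.complete in its working directory (default /var/tmp/); if that file is still present at the next failover attempt, the failover is refused unless the file was deleted manually after the first failover. For convenience, --ignore_last_failover is set here.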
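When --ignore_last_failover is not used, it is worth knowing whether a recent marker file would block the next failover. The sketch below is a hedged illustration: recent_failover_marker is a hypothetical helper, the file name follows the pattern given above, and measuring the 8-hour window (480 minutes) from the file's modification time is an assumption about how MHA evaluates it. It uses find -mmin, which GNU and BSD find provide.

```shell
# Succeed if <workdir>/<app>.failover.complete exists and was modified
# within the last 480 minutes (8 hours).
# recent_failover_marker is a hypothetical helper, not part of MHA.
recent_failover_marker() (
    workdir=$1
    app=$2
    marker="$workdir/$app.failover.complete"
    [ -f "$marker" ] && [ -n "$(find "$marker" -mmin -480 2>/dev/null)" ]
)

# Example call on the manager host:
#   recent_failover_marker /var/tmp masterha_default && echo "a recent failover marker exists"
```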
9. Check the MHA manager status
# masterha_check_status --conf=/etc/mha/masterha_default.cnf
masterha_default (pid:28907) is running(0:PING_OK), master:192.168.6.85
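For monitoring scripts, the status line can be split into its parts with sed. This sketch assumes the line has exactly the shape shown above; status_field is a hypothetical helper, and other states may produce differently shaped lines that it does not handle.

```shell
# Extract one field from a status line of the form
#   masterha_default (pid:28907) is running(0:PING_OK), master:192.168.6.85
# status_field is a hypothetical helper, not part of MHA.
status_field() (
    line=$1
    case $2 in
        pid)    printf '%s\n' "$line" | sed -n 's/.*(pid:\([0-9][0-9]*\)).*/\1/p' ;;
        state)  printf '%s\n' "$line" | sed -n 's/.*running(\([^)]*\)).*/\1/p' ;;
        master) printf '%s\n' "$line" | sed -n 's/.*master:\([^ ,]*\).*/\1/p' ;;
    esac
)

# Example call:
#   line=$(masterha_check_status --conf=/etc/mha/masterha_default.cnf)
#   status_field "$line" master
```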
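10. Test the failover
(1) Shut down the MySQL service on 192.168.6.85 to simulate a crashed mysqld:
mysqladmin shutdown -uroot -p123456
The detailed log is in Appendix 2 (the run in which the master is still reachable over SSH).
(2) Bring down the master's network interface to simulate a network outage:
ifconfig eth0 down
The detailed log is in Appendix 1 (the run in which the master is not reachable over SSH).
Follow the masterha_manager log with tail -f /tmp/mha_manager.log. When the line "Master failover to 192.168.6.91(192.168.6.91:3306) completed successfully" appears, the failover has succeeded.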
----- Failover Report -----
masterha_default: MySQL Master failover 192.168.6.85(192.168.6.85:3306) to 192.168.6.91(192.168.6.91:3306) succeeded
Master 192.168.6.85(192.168.6.85:3306) is down!
Check MHA Manager logs at slave2 for details.
Started automated(non-interactive) failover.
The latest slave 192.168.6.91(192.168.6.91:3306) has all relay logs for recovery.
Selected 192.168.6.91(192.168.6.91:3306) as a new master.
192.168.6.91(192.168.6.91:3306): OK: Applying all logs succeeded.
192.168.6.149(192.168.6.149:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.6.149(192.168.6.149:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.6.91(192.168.6.91:3306)
192.168.6.91(192.168.6.91:3306): Resetting slave info succeeded.
Master failover to 192.168.6.91(192.168.6.91:3306) completed successfully.
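Comparing the logs in Appendix 1 and Appendix 2 shows that when the manager cannot SSH to the failed server, it cannot save the dead master's binary log and send it to the slaves; without those differential events the slaves may lose some data.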
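The two outcomes can be told apart in a manager log with a few greps. This is a rough sketch built only on message fragments that appear in the appendix logs; failover_completed and binlog_save_outcome are hypothetical helpers, and treating the "Fetching dead master" message as proof that the binlog was saved is a simplifying assumption.

```shell
# Succeed if the log reports a completed failover.
failover_completed() {
    grep -q "Master failover to .* completed successfully" "$1"
}

# Classify what happened to the dead master's binary log.
# Both helpers are hypothetical and not part of MHA.
binlog_save_outcome() (
    log=$1
    if grep -q "Could not save it" "$log"; then
        echo "not-saved"      # master unreachable over SSH, events may be lost
    elif grep -q "Fetching dead master" "$log"; then
        echo "saved"          # manager started copying the binlog (assumed success)
    else
        echo "unknown"
    fi
)

# Example calls:
#   failover_completed /tmp/mha_manager.log && binlog_save_outcome /tmp/mha_manager.log
```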
After the failover completes, an alert email arrives.
11. Online switch
First stop MHA monitoring:
masterha_stop --conf=/etc/mha/masterha_default.cnf
Edit the master_ip_online_change script and comment out the following lines:
#FIXME_xxx_drop_app_user($orig_master_handler);
#FIXME_xxx_create_app_user($new_master_handler);
Run the online switch command:
masterha_master_switch --conf=/etc/mha/masterha_default.cnf --master_state=alive --new_master_host=192.168.6.91 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
The online switch then succeeds:
Fri Jan 22 10:36:22 2016 - [info] All new slave servers switched successfully.
Fri Jan 22 10:36:22 2016 - [info]
Fri Jan 22 10:36:22 2016 - [info] * Phase 5: New master cleanup phase..
Fri Jan 22 10:36:22 2016 - [info]
Fri Jan 22 10:36:22 2016 - [info]  192.168.6.91: Resetting slave info succeeded.
Fri Jan 22 10:36:22 2016 - [info] Switching master to 192.168.6.91(192.168.6.91:3306) completed successfully
See Appendix 3 for the detailed switch log.
Appendix 1:
Mon Jan 18 17:28:36 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/lib/perl5/site_perl/5.8.8/MHA/HealthCheck.pm line 431.
Mon Jan 18 17:28:36 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/dbdata/mysqllog/binlog --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --binlog_prefix=binlog
Mon Jan 18 17:28:37 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.6.85' (4))
Mon Jan 18 17:28:37 2016 - [warning] Connection failed 2 time(s)..
Mon Jan 18 17:28:38 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.6.85' (4))
Mon Jan 18 17:28:38 2016 - [warning] Connection failed 3 time(s)..
Mon Jan 18 17:28:40 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.6.85' (4))
Mon Jan 18 17:28:40 2016 - [warning] Connection failed 4 time(s)..
Mon Jan 18 17:28:42 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to 192.168.6.85! at /usr/lib/perl5/site_perl/5.8.8/MHA/HealthCheck.pm line 342.
Mon Jan 18 17:28:42 2016 - [warning] Master is not reachable from health checker!
Mon Jan 18 17:28:42 2016 - [warning] Master 192.168.6.85(192.168.6.85:3306) is not reachable!
Mon Jan 18 17:28:42 2016 - [warning] SSH is NOT reachable.
Mon Jan 18 17:28:42 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/masterha_default.cnf again, and trying to connect to all servers to check server status..
Mon Jan 18 17:28:42 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jan 18 17:28:42 2016 - [info] Reading application default configuration from /etc/mha/masterha_default.cnf..
Mon Jan 18 17:28:42 2016 - [info] Reading server configuration from /etc/mha/masterha_default.cnf..
Mon Jan 18 17:28:42 2016 - [info] GTID failover mode = 0
Mon Jan 18 17:28:42 2016 - [info] Dead Servers:
Mon Jan 18 17:28:42 2016 - [info]   192.168.6.85(192.168.6.85:3306)
Mon Jan 18 17:28:42 2016 - [info] Alive Servers:
Mon Jan 18 17:28:42 2016 - [info]   192.168.6.91(192.168.6.91:3306)
Mon Jan 18 17:28:42 2016 - [info]   192.168.6.149(192.168.6.149:3306)
Mon Jan 18 17:28:42 2016 - [info] Alive Slaves:
Mon Jan 18 17:28:42 2016 - [info]   192.168.6.91(192.168.6.91:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 17:28:42 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 17:28:42 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jan 18 17:28:42 2016 - [info]   192.168.6.149(192.168.6.149:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 17:28:42 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 17:28:42 2016 - [info] Checking slave configurations..
Mon Jan 18 17:28:42 2016 - [info]  read_only=1 is not set on slave 192.168.6.91(192.168.6.91:3306).
Mon Jan 18 17:28:42 2016 - [warning]  relay_log_purge=0 is not set on slave 192.168.6.91(192.168.6.91:3306).
Mon Jan 18 17:28:42 2016 - [info]  read_only=1 is not set on slave 192.168.6.149(192.168.6.149:3306).
Mon Jan 18 17:28:42 2016 - [warning]  relay_log_purge=0 is not set on slave 192.168.6.149(192.168.6.149:3306).
Mon Jan 18 17:28:42 2016 - [info] Checking replication filtering settings..
Mon Jan 18 17:28:42 2016 - [info]  Replication filtering check ok.
Mon Jan 18 17:28:42 2016 - [info] Master is down!
Mon Jan 18 17:28:42 2016 - [info] Terminating monitoring script.
Mon Jan 18 17:28:42 2016 - [info] Got exit code 20 (Master dead).
Mon Jan 18 17:28:42 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jan 18 17:28:42 2016 - [info] Reading application default configuration from /etc/mha/masterha_default.cnf..
Mon Jan 18 17:28:42 2016 - [info] Reading server configuration from /etc/mha/masterha_default.cnf..
Mon Jan 18 17:28:42 2016 - [info] MHA::MasterFailover version 0.56.
Mon Jan 18 17:28:42 2016 - [info] Starting master failover.
Mon Jan 18 17:28:42 2016 - [info]
Mon Jan 18 17:28:42 2016 - [info] * Phase 1: Configuration Check Phase..
Mon Jan 18 17:28:42 2016 - [info]
Mon Jan 18 17:28:42 2016 - [info] GTID failover mode = 0
Mon Jan 18 17:28:42 2016 - [info] Dead Servers:
Mon Jan 18 17:28:42 2016 - [info]   192.168.6.85(192.168.6.85:3306)
Mon Jan 18 17:28:42 2016 - [info] Checking master reachability via MySQL(double check)...
Mon Jan 18 17:28:43 2016 - [info]  ok.
Mon Jan 18 17:28:43 2016 - [info] Alive Servers:
Mon Jan 18 17:28:43 2016 - [info]   192.168.6.91(192.168.6.91:3306)
Mon Jan 18 17:28:43 2016 - [info]   192.168.6.149(192.168.6.149:3306)
Mon Jan 18 17:28:43 2016 - [info] Alive Slaves:
Mon Jan 18 17:28:43 2016 - [info]   192.168.6.91(192.168.6.91:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 17:28:43 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 17:28:43 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jan 18 17:28:43 2016 - [info]   192.168.6.149(192.168.6.149:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 17:28:43 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 17:28:43 2016 - [info] Starting Non-GTID based failover.
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Jan 18 17:28:43 2016 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Mon Jan 18 17:28:43 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Jan 18 17:28:43 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] * Phase 3: Master Recovery Phase..
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] The latest binary log file/position on all slaves is binlog.000003:107
Mon Jan 18 17:28:43 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Jan 18 17:28:43 2016 - [info]   192.168.6.91(192.168.6.91:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 17:28:43 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 17:28:43 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jan 18 17:28:43 2016 - [info]   192.168.6.149(192.168.6.149:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 17:28:43 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 17:28:43 2016 - [info] The oldest binary log file/position on all slaves is binlog.000003:107
Mon Jan 18 17:28:43 2016 - [info] Oldest slaves:
Mon Jan 18 17:28:43 2016 - [info]   192.168.6.91(192.168.6.91:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 17:28:43 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 17:28:43 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jan 18 17:28:43 2016 - [info]   192.168.6.149(192.168.6.149:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 17:28:43 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] * Phase 3.3: Determining New Master Phase..
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Mon Jan 18 17:28:43 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Mon Jan 18 17:28:43 2016 - [info] Searching new master from slaves..
Mon Jan 18 17:28:43 2016 - [info]  Candidate masters from the configuration file:
Mon Jan 18 17:28:43 2016 - [info]   192.168.6.91(192.168.6.91:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 17:28:43 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 17:28:43 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jan 18 17:28:43 2016 - [info]  Non-candidate masters:
Mon Jan 18 17:28:43 2016 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Mon Jan 18 17:28:43 2016 - [info] New master is 192.168.6.91(192.168.6.91:3306)
Mon Jan 18 17:28:43 2016 - [info] Starting master failover..
Mon Jan 18 17:28:43 2016 - [info]
From:
192.168.6.85(192.168.6.85:3306) (current master)
 +--192.168.6.91(192.168.6.91:3306)
 +--192.168.6.149(192.168.6.149:3306)
To:
192.168.6.91(192.168.6.91:3306) (new master)
 +--192.168.6.149(192.168.6.149:3306)
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Mon Jan 18 17:28:43 2016 - [info] Starting recovery on 192.168.6.91(192.168.6.91:3306)..
Mon Jan 18 17:28:43 2016 - [info]  This server has all relay logs. Waiting all logs to be applied..
Mon Jan 18 17:28:43 2016 - [info]   done.
Mon Jan 18 17:28:43 2016 - [info]  All relay logs were successfully applied.
Mon Jan 18 17:28:43 2016 - [info] Getting new master's binlog name and position..
Mon Jan 18 17:28:43 2016 - [info]  binlog.000003:107
Mon Jan 18 17:28:43 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.6.91', MASTER_PORT=3306, MASTER_LOG_FILE='binlog.000003', MASTER_LOG_POS=107, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Jan 18 17:28:43 2016 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Mon Jan 18 17:28:43 2016 - [info] ** Finished master recovery successfully.
Mon Jan 18 17:28:43 2016 - [info] * Phase 3: Master Recovery Phase completed.
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] * Phase 4: Slaves Recovery Phase..
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] -- Slave diff file generation on host 192.168.6.149(192.168.6.149:3306) started, pid: 31761. Check tmp log /var/tmp/192.168.6.149_3306_20160118172842.log if it takes time..
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] Log messages from 192.168.6.149 ...
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Mon Jan 18 17:28:43 2016 - [info] End of log messages from 192.168.6.149.
Mon Jan 18 17:28:43 2016 - [info] -- 192.168.6.149(192.168.6.149:3306) has the latest relay log events.
Mon Jan 18 17:28:43 2016 - [info] Generating relay diff files from the latest slave succeeded.
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] -- Slave recovery on host 192.168.6.149(192.168.6.149:3306) started, pid: 31763. Check tmp log /var/tmp/192.168.6.149_3306_20160118172842.log if it takes time..
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] Log messages from 192.168.6.149 ...
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] Starting recovery on 192.168.6.149(192.168.6.149:3306)..
Mon Jan 18 17:28:43 2016 - [info]  This server has all relay logs. Waiting all logs to be applied..
Mon Jan 18 17:28:43 2016 - [info]   done.
Mon Jan 18 17:28:43 2016 - [info]  All relay logs were successfully applied.
Mon Jan 18 17:28:43 2016 - [info]  Resetting slave 192.168.6.149(192.168.6.149:3306) and starting replication from the new master 192.168.6.91(192.168.6.91:3306)..
Mon Jan 18 17:28:43 2016 - [info]  Executed CHANGE MASTER.
Mon Jan 18 17:28:43 2016 - [info]  Slave started.
Mon Jan 18 17:28:43 2016 - [info] End of log messages from 192.168.6.149.
Mon Jan 18 17:28:43 2016 - [info] -- Slave recovery on host 192.168.6.149(192.168.6.149:3306) succeeded.
Mon Jan 18 17:28:43 2016 - [info] All new slave servers recovered successfully.
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] * Phase 5: New master cleanup phase..
Mon Jan 18 17:28:43 2016 - [info]
Mon Jan 18 17:28:43 2016 - [info] Resetting slave info on the new master..
Mon Jan 18 17:28:43 2016 - [info]  192.168.6.91: Resetting slave info succeeded.
Mon Jan 18 17:28:43 2016 - [info] Master failover to 192.168.6.91(192.168.6.91:3306) completed successfully.
Mon Jan 18 17:28:43 2016 - [info] Deleted server1 entry from /etc/mha/masterha_default.cnf .
Mon Jan 18 17:28:43 2016 - [info]
----- Failover Report -----
masterha_default: MySQL Master failover 192.168.6.85(192.168.6.85:3306) to 192.168.6.91(192.168.6.91:3306) succeeded
Master 192.168.6.85(192.168.6.85:3306) is down!
Check MHA Manager logs at slave2 for details.
Started automated(non-interactive) failover.
The latest slave 192.168.6.91(192.168.6.91:3306) has all relay logs for recovery.
Selected 192.168.6.91(192.168.6.91:3306) as a new master.
192.168.6.91(192.168.6.91:3306): OK: Applying all logs succeeded.
192.168.6.149(192.168.6.149:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.6.149(192.168.6.149:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.6.91(192.168.6.91:3306)
192.168.6.91(192.168.6.91:3306): Resetting slave info succeeded.
Master failover to 192.168.6.91(192.168.6.91:3306) completed successfully.
Appendix 2:
Mon Jan 18 18:05:49 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Mon Jan 18 18:05:49 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/dbdata/mysqllog/binlog --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --binlog_prefix=binlog
  Creating /tmp if not exists.. ok.
  Checking output directory is accessible or not.. ok.
  Binlog found at /data/dbdata/mysqllog/binlog, up to binlog.000003
Mon Jan 18 18:05:50 2016 - [info] HealthCheck: SSH to 192.168.6.85 is reachable.
Mon Jan 18 18:05:50 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Mon Jan 18 18:05:50 2016 - [warning] Connection failed 2 time(s)..
Mon Jan 18 18:05:51 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Mon Jan 18 18:05:51 2016 - [warning] Connection failed 3 time(s)..
Mon Jan 18 18:05:52 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Mon Jan 18 18:05:52 2016 - [warning] Connection failed 4 time(s)..
Mon Jan 18 18:05:52 2016 - [warning] Master is not reachable from health checker!
Mon Jan 18 18:05:52 2016 - [warning] Master 192.168.6.85(192.168.6.85:3306) is not reachable!
Mon Jan 18 18:05:52 2016 - [warning] SSH is reachable.
Mon Jan 18 18:05:52 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/masterha_default.cnf again, and trying to connect to all servers to check server status..
Mon Jan 18 18:05:52 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jan 18 18:05:52 2016 - [info] Reading application default configuration from /etc/mha/masterha_default.cnf..
Mon Jan 18 18:05:52 2016 - [info] Reading server configuration from /etc/mha/masterha_default.cnf..
Mon Jan 18 18:05:52 2016 - [info] GTID failover mode = 0
Mon Jan 18 18:05:52 2016 - [info] Dead Servers:
Mon Jan 18 18:05:52 2016 - [info]   192.168.6.85(192.168.6.85:3306)
Mon Jan 18 18:05:52 2016 - [info] Alive Servers:
Mon Jan 18 18:05:52 2016 - [info]   192.168.6.91(192.168.6.91:3306)
Mon Jan 18 18:05:52 2016 - [info]   192.168.6.149(192.168.6.149:3306)
Mon Jan 18 18:05:52 2016 - [info] Alive Slaves:
Mon Jan 18 18:05:52 2016 - [info]   192.168.6.91(192.168.6.91:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 18:05:52 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 18:05:52 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jan 18 18:05:52 2016 - [info]   192.168.6.149(192.168.6.149:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 18:05:52 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 18:05:52 2016 - [info] Checking slave configurations..
Mon Jan 18 18:05:52 2016 - [info]  read_only=1 is not set on slave 192.168.6.91(192.168.6.91:3306).
Mon Jan 18 18:05:52 2016 - [warning]  relay_log_purge=0 is not set on slave 192.168.6.91(192.168.6.91:3306).
Mon Jan 18 18:05:52 2016 - [info]  read_only=1 is not set on slave 192.168.6.149(192.168.6.149:3306).
Mon Jan 18 18:05:52 2016 - [warning]  relay_log_purge=0 is not set on slave 192.168.6.149(192.168.6.149:3306).
Mon Jan 18 18:05:52 2016 - [info] Checking replication filtering settings..
Mon Jan 18 18:05:52 2016 - [info]  Replication filtering check ok.
Mon Jan 18 18:05:52 2016 - [info] Master is down!
Mon Jan 18 18:05:52 2016 - [info] Terminating monitoring script.
Mon Jan 18 18:05:52 2016 - [info] Got exit code 20 (Master dead).
Mon Jan 18 18:05:52 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jan 18 18:05:52 2016 - [info] Reading application default configuration from /etc/mha/masterha_default.cnf..
Mon Jan 18 18:05:52 2016 - [info] Reading server configuration from /etc/mha/masterha_default.cnf..
Mon Jan 18 18:05:52 2016 - [info] MHA::MasterFailover version 0.56.
Mon Jan 18 18:05:52 2016 - [info] Starting master failover.
Mon Jan 18 18:05:52 2016 - [info]
Mon Jan 18 18:05:52 2016 - [info] * Phase 1: Configuration Check Phase..
Mon Jan 18 18:05:52 2016 - [info]
Mon Jan 18 18:05:53 2016 - [info] GTID failover mode = 0
Mon Jan 18 18:05:53 2016 - [info] Dead Servers:
Mon Jan 18 18:05:53 2016 - [info]   192.168.6.85(192.168.6.85:3306)
Mon Jan 18 18:05:53 2016 - [info] Checking master reachability via MySQL(double check)...
Mon Jan 18 18:05:53 2016 - [info]  ok.
Mon Jan 18 18:05:53 2016 - [info] Alive Servers:
Mon Jan 18 18:05:53 2016 - [info]   192.168.6.91(192.168.6.91:3306)
Mon Jan 18 18:05:53 2016 - [info]   192.168.6.149(192.168.6.149:3306)
Mon Jan 18 18:05:53 2016 - [info] Alive Slaves:
Mon Jan 18 18:05:53 2016 - [info]   192.168.6.91(192.168.6.91:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 18:05:53 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 18:05:53 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jan 18 18:05:53 2016 - [info]   192.168.6.149(192.168.6.149:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 18:05:53 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 18:05:53 2016 - [info] Starting Non-GTID based failover.
Mon Jan 18 18:05:53 2016 - [info]
Mon Jan 18 18:05:53 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Jan 18 18:05:53 2016 - [info]
Mon Jan 18 18:05:53 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon Jan 18 18:05:53 2016 - [info]
Mon Jan 18 18:05:53 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Jan 18 18:05:53 2016 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Mon Jan 18 18:05:53 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Jan 18 18:05:53 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Jan 18 18:05:53 2016 - [info]
Mon Jan 18 18:05:53 2016 - [info] * Phase 3: Master Recovery Phase..
Mon Jan 18 18:05:53 2016 - [info]
Mon Jan 18 18:05:53 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Jan 18 18:05:53 2016 - [info]
Mon Jan 18 18:05:53 2016 - [info] The latest binary log file/position on all slaves is binlog.000003:107
Mon Jan 18 18:05:53 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Jan 18 18:05:53 2016 - [info]   192.168.6.91(192.168.6.91:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 18:05:53 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 18:05:53 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jan 18 18:05:53 2016 - [info]   192.168.6.149(192.168.6.149:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 18:05:53 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 18:05:53 2016 - [info] The oldest binary log file/position on all slaves is binlog.000003:107
Mon Jan 18 18:05:53 2016 - [info] Oldest slaves:
Mon Jan 18 18:05:53 2016 - [info]   192.168.6.91(192.168.6.91:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 18:05:53 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 18:05:53 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jan 18 18:05:53 2016 - [info]   192.168.6.149(192.168.6.149:3306)  Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 18:05:53 2016 - [info]     Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 18:05:53 2016 - [info]
Mon Jan 18 18:05:53 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Mon Jan 18 18:05:53 2016 - [info]
Mon Jan 18 18:05:53 2016 - [info] Fetching dead master's binary logs..
Mon Jan 18 18:05:53 2016 - [info] Executing command on the dead master 192.168.6.85(192.168.6.85:3306): save_binary_logs --command=save --start_file=binlog.000003 --start_pos=107 --binlog_dir=/data/dbdata/mysqllog/binlog --output_file=/tmp/saved_master_binlog_from_192.168.6.85_3306_20160118180552.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
  Creating /tmp if not exists.. ok.
 Concat binary/relay logs from binlog.000003 pos 107 to binlog.000003 EOF into /tmp/saved_master_binlog_from_192.168.6.85_3306_20160118180552.binlog ..
  Dumping binlog format description event, from position 0 to 107.. ok.
  Dumping effective binlog data from /data/dbdata/mysqllog/binlog/binlog.000003 position 107 to tail(126).. ok.
 Concat succeeded.
Mon Jan 18 18:05:54 2016 - [info] scp [email protected]:/tmp/saved_master_binlog_from_192.168.6.85_3306_20160118180552.binlog to local:/var/tmp/saved_master_binlog_from_192.168.6.85_3306_20160118180552.binlog succeeded.
Mon Jan 18 18:05:54 2016 - [info] HealthCheck: SSH to 192.168.6.91 is reachable.
Mon Jan 18 18:05:55 2016 - [info] HealthCheck: SSH to 192.168.6.149 is reachable.
Mon Jan 18 18:05:55 2016 - [info]
Mon Jan 18 18:05:55 2016 - [info] * Phase 3.3: Determining New Master Phase..
Mon Jan 18 18:05:55 2016 - [info]
Mon Jan 18 18:05:55 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Mon Jan 18 18:05:55 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Mon Jan 18 18:05:55 2016 - [info] Searching new master from slaves..
Mon Jan 18 18:05:55 2016 - [info] Candidate masters from the configuration file:
Mon Jan 18 18:05:55 2016 - [info] 192.168.6.91(192.168.6.91:3306) Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Mon Jan 18 18:05:55 2016 - [info] Replicating from 192.168.6.85(192.168.6.85:3306)
Mon Jan 18 18:05:55 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jan 18 18:05:55 2016 - [info] Non-candidate masters:
Mon Jan 18 18:05:55 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Mon Jan 18 18:05:55 2016 - [info] New master is 192.168.6.91(192.168.6.91:3306)
Mon Jan 18 18:05:55 2016 - [info] Starting master failover..
Mon Jan 18 18:05:55 2016 - [info]
From:
192.168.6.85(192.168.6.85:3306) (current master)
 +--192.168.6.91(192.168.6.91:3306)
 +--192.168.6.149(192.168.6.149:3306)
To:
192.168.6.91(192.168.6.91:3306) (new master)
 +--192.168.6.149(192.168.6.149:3306)
Mon Jan 18 18:05:55 2016 - [info]
Mon Jan 18 18:05:55 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Mon Jan 18 18:05:55 2016 - [info]
Mon Jan 18 18:05:55 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Mon Jan 18 18:05:55 2016 - [info] Sending binlog..
Mon Jan 18 18:05:55 2016 - [info] scp from local:/var/tmp/saved_m[email protected]192.168.6.91:/tmp/saved_master_binlog_from_192.168.6.85_3306_20160118180552.binlog succeeded.
Mon Jan 18 18:05:55 2016 - [info]
Mon Jan 18 18:05:55 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Mon Jan 18 18:05:55 2016 - [info]
Mon Jan 18 18:05:55 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Mon Jan 18 18:05:55 2016 - [info] Starting recovery on 192.168.6.91(192.168.6.91:3306)..
Mon Jan 18 18:05:55 2016 - [info] Generating diffs succeeded.
Mon Jan 18 18:05:55 2016 - [info] Waiting until all relay logs are applied.
Mon Jan 18 18:05:55 2016 - [info] done.
Mon Jan 18 18:05:55 2016 - [info] Getting slave status..
Mon Jan 18 18:05:55 2016 - [info] This slave(192.168.6.91)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(binlog.000003:107). No need to recover from Exec_Master_Log_Pos.
Mon Jan 18 18:05:55 2016 - [info] Connecting to the target slave host 192.168.6.91, running recover script..
Mon Jan 18 18:05:55 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=192.168.6.91 --slave_ip=192.168.6.91 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.6.85_3306_20160118180552.binlog --workdir=/tmp --target_version=5.5.33-log --timestamp=20160118180552 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Mon Jan 18 18:05:56 2016 - [info] Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.6.85_3306_20160118180552.binlog on 192.168.6.91:3306. This may take long time...
Applying log files succeeded.
Mon Jan 18 18:05:56 2016 - [info] All relay logs were successfully applied.
Mon Jan 18 18:05:56 2016 - [info] Getting new master's binlog name and position..
Mon Jan 18 18:05:56 2016 - [info] binlog.000003:107
Mon Jan 18 18:05:56 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.6.91', MASTER_PORT=3306, MASTER_LOG_FILE='binlog.000003', MASTER_LOG_POS=107, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Jan 18 18:05:56 2016 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Mon Jan 18 18:05:56 2016 - [info] ** Finished master recovery successfully.
Mon Jan 18 18:05:56 2016 - [info] * Phase 3: Master Recovery Phase completed.
Mon Jan 18 18:05:56 2016 - [info]
Mon Jan 18 18:05:56 2016 - [info] * Phase 4: Slaves Recovery Phase..
Mon Jan 18 18:05:56 2016 - [info]
Mon Jan 18 18:05:56 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Mon Jan 18 18:05:56 2016 - [info]
Mon Jan 18 18:05:56 2016 - [info] -- Slave diff file generation on host 192.168.6.149(192.168.6.149:3306) started, pid: 32132. Check tmp log /var/tmp/192.168.6.149_3306_20160118180552.log if it takes time..
Mon Jan 18 18:05:56 2016 - [info]
Mon Jan 18 18:05:56 2016 - [info] Log messages from 192.168.6.149 ...
Mon Jan 18 18:05:56 2016 - [info]
Mon Jan 18 18:05:56 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Mon Jan 18 18:05:56 2016 - [info] End of log messages from 192.168.6.149.
Mon Jan 18 18:05:56 2016 - [info] -- 192.168.6.149(192.168.6.149:3306) has the latest relay log events.
Mon Jan 18 18:05:56 2016 - [info] Generating relay diff files from the latest slave succeeded.
Mon Jan 18 18:05:56 2016 - [info]
Mon Jan 18 18:05:56 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Mon Jan 18 18:05:56 2016 - [info]
Mon Jan 18 18:05:56 2016 - [info] -- Slave recovery on host 192.168.6.149(192.168.6.149:3306) started, pid: 32134. Check tmp log /var/tmp/192.168.6.149_3306_20160118180552.log if it takes time..
Mon Jan 18 18:05:57 2016 - [info]
Mon Jan 18 18:05:57 2016 - [info] Log messages from 192.168.6.149 ...
Mon Jan 18 18:05:57 2016 - [info]
Mon Jan 18 18:05:56 2016 - [info] Sending binlog..
Mon Jan 18 18:05:56 2016 - [info] scp from local:/var/tmp/saved_master_binlog_from_192.168.6.85_3306_20160118180552.binlog to [email protected]:/tmp/saved_master_binlog_from_192.168.6.85_3306_20160118180552.binlog succeeded.
Mon Jan 18 18:05:56 2016 - [info] Starting recovery on 192.168.6.149(192.168.6.149:3306)..
Mon Jan 18 18:05:56 2016 - [info] Generating diffs succeeded.
Mon Jan 18 18:05:56 2016 - [info] Waiting until all relay logs are applied.
Mon Jan 18 18:05:56 2016 - [info] done.
Mon Jan 18 18:05:56 2016 - [info] Getting slave status..
Mon Jan 18 18:05:56 2016 - [info] This slave(192.168.6.149)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(binlog.000003:107). No need to recover from Exec_Master_Log_Pos.
Mon Jan 18 18:05:56 2016 - [info] Connecting to the target slave host 192.168.6.149, running recover script..
Mon Jan 18 18:05:56 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=192.168.6.149 --slave_ip=192.168.6.149 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.6.85_3306_20160118180552.binlog --workdir=/tmp --target_version=5.5.33-log --timestamp=20160118180552 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Mon Jan 18 18:05:57 2016 - [info] Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.6.85_3306_20160118180552.binlog on 192.168.6.149:3306. This may take long time...
Applying log files succeeded.
Mon Jan 18 18:05:57 2016 - [info] All relay logs were successfully applied.
Mon Jan 18 18:05:57 2016 - [info] Resetting slave 192.168.6.149(192.168.6.149:3306) and starting replication from the new master 192.168.6.91(192.168.6.91:3306)..
Mon Jan 18 18:05:57 2016 - [info] Executed CHANGE MASTER.
Mon Jan 18 18:05:57 2016 - [info] Slave started.
Mon Jan 18 18:05:57 2016 - [info] End of log messages from 192.168.6.149.
Mon Jan 18 18:05:57 2016 - [info] -- Slave recovery on host 192.168.6.149(192.168.6.149:3306) succeeded.
Mon Jan 18 18:05:57 2016 - [info] All new slave servers recovered successfully.
Mon Jan 18 18:05:57 2016 - [info]
Mon Jan 18 18:05:57 2016 - [info] * Phase 5: New master cleanup phase..
Mon Jan 18 18:05:57 2016 - [info]
Mon Jan 18 18:05:57 2016 - [info] Resetting slave info on the new master..
Mon Jan 18 18:05:57 2016 - [info] 192.168.6.91: Resetting slave info succeeded.
Mon Jan 18 18:05:57 2016 - [info] Master failover to 192.168.6.91(192.168.6.91:3306) completed successfully.
Mon Jan 18 18:05:57 2016 - [info] Deleted server1 entry from /etc/mha/masterha_default.cnf .
Mon Jan 18 18:05:57 2016 - [info]
----- Failover Report -----
masterha_default: MySQL Master failover 192.168.6.85(192.168.6.85:3306) to 192.168.6.91(192.168.6.91:3306) succeeded
Master 192.168.6.85(192.168.6.85:3306) is down!
Check MHA Manager logs at slave2 for details.
Started automated(non-interactive) failover.
The latest slave 192.168.6.91(192.168.6.91:3306) has all relay logs for recovery.
Selected 192.168.6.91(192.168.6.91:3306) as a new master.
192.168.6.91(192.168.6.91:3306): OK: Applying all logs succeeded.
192.168.6.149(192.168.6.149:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.6.149(192.168.6.149:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.6.91(192.168.6.91:3306)
192.168.6.91(192.168.6.91:3306): Resetting slave info succeeded.
Master failover to 192.168.6.91(192.168.6.91:3306) completed successfully.
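A failover log of this length is easier to skim once the phase markers are pulled out with their timestamps. The following is a minimal sketch, not part of MHA: `phase_timeline` and `SAMPLE_LOG` are hypothetical names, and the regular expression assumes the `<timestamp> - [info] * Phase N: <title>..` shape seen in the log above (it tolerates both `Phase3` and `Phase 3`).

```python
import re
from datetime import datetime

# Matches "<timestamp> - [info] * Phase N: <title>.." and the "... completed." form.
# \s* after "Phase" tolerates logs where the space before the number was lost.
PHASE_RE = re.compile(
    r"(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) - \[info\]\s*\*+\s*"
    r"Phase\s*([\d.]+):\s*(.+?)(\.\.| completed\.)"
)

def phase_timeline(log_text):
    """Return (seconds_since_first_marker, phase_number, title, state) tuples."""
    rows = []
    for m in PHASE_RE.finditer(log_text):
        ts = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
        state = "completed" if "completed" in m.group(4) else "started"
        rows.append((ts, m.group(2), m.group(3), state))
    if not rows:
        return []
    t0 = rows[0][0]
    return [(int((ts - t0).total_seconds()), num, title, state)
            for ts, num, title, state in rows]

# A few marker lines copied from the failover log, used as sample input.
SAMPLE_LOG = """\
Mon Jan 18 18:05:53 2016 - [info] * Phase 3: Master Recovery Phase..
Mon Jan 18 18:05:55 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Mon Jan 18 18:05:56 2016 - [info] * Phase 3: Master Recovery Phase completed.
Mon Jan 18 18:05:57 2016 - [info] * Phase 5: New master cleanup phase..
"""

if __name__ == "__main__":
    for offset, num, title, state in phase_timeline(SAMPLE_LOG):
        print(f"+{offset:>2}s  Phase {num:<4} {title} ({state})")
```

Because the pattern is applied to the whole text rather than line by line, it should also work on a log whose line breaks were lost, as long as the timestamps are intact.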
Appendix 3:
Fri Jan 22 10:36:17 2016 - [info] MHA::MasterRotate version 0.56.
Fri Jan 22 10:36:17 2016 - [info] Starting online master switch..
Fri Jan 22 10:36:17 2016 - [info]
Fri Jan 22 10:36:17 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Jan 22 10:36:17 2016 - [info]
Fri Jan 22 10:36:17 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Jan 22 10:36:17 2016 - [info] Reading application default configuration from /etc/mha/masterha_default.cnf..
Fri Jan 22 10:36:17 2016 - [info] Reading server configuration from /etc/mha/masterha_default.cnf..
Fri Jan 22 10:36:17 2016 - [info] GTID failover mode = 0
Fri Jan 22 10:36:17 2016 - [info] Current Alive Master: 192.168.6.85(192.168.6.85:3306)
Fri Jan 22 10:36:17 2016 - [info] Alive Slaves:
Fri Jan 22 10:36:17 2016 - [info] 192.168.6.91(192.168.6.91:3306) Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Fri Jan 22 10:36:17 2016 - [info] Replicating from 192.168.6.85(192.168.6.85:3306)
Fri Jan 22 10:36:17 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Jan 22 10:36:17 2016 - [info] 192.168.6.149(192.168.6.149:3306) Version=5.5.33-log (oldest major version between slaves) log-bin:enabled
Fri Jan 22 10:36:17 2016 - [info] Replicating from 192.168.6.85(192.168.6.85:3306)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.6.85(192.168.6.85:3306)? (YES/no): yes
Fri Jan 22 10:36:19 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri Jan 22 10:36:19 2016 - [info] ok.
Fri Jan 22 10:36:19 2016 - [info] Checking MHA is not monitoring or doing failover..
Fri Jan 22 10:36:19 2016 - [info] Checking replication health on 192.168.6.91..
Fri Jan 22 10:36:19 2016 - [info] ok.
Fri Jan 22 10:36:19 2016 - [info] Checking replication health on 192.168.6.149..
Fri Jan 22 10:36:19 2016 - [info] ok.
Fri Jan 22 10:36:19 2016 - [info] 192.168.6.91 can be new master.
Fri Jan 22 10:36:19 2016 - [info]
From:
192.168.6.85(192.168.6.85:3306) (current master)
 +--192.168.6.91(192.168.6.91:3306)
 +--192.168.6.149(192.168.6.149:3306)
To:
192.168.6.91(192.168.6.91:3306) (new master)
 +--192.168.6.149(192.168.6.149:3306)
 +--192.168.6.85(192.168.6.85:3306)
Starting master switch from 192.168.6.85(192.168.6.85:3306) to 192.168.6.91(192.168.6.91:3306)? (yes/NO): yes
Fri Jan 22 10:36:22 2016 - [info] Checking whether 192.168.6.91(192.168.6.91:3306) is ok for the new master..
Fri Jan 22 10:36:22 2016 - [info] ok.
Fri Jan 22 10:36:22 2016 - [info] 192.168.6.85(192.168.6.85:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Fri Jan 22 10:36:22 2016 - [info] 192.168.6.85(192.168.6.85:3306): Resetting slave pointing to the dummy host.
Fri Jan 22 10:36:22 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Jan 22 10:36:22 2016 - [info]
Fri Jan 22 10:36:22 2016 - [info] * Phase 2: Rejecting updates Phase..
Fri Jan 22 10:36:22 2016 - [info]
Fri Jan 22 10:36:22 2016 - [info] Executing master ip online change script to disable write on the current master:
Fri Jan 22 10:36:22 2016 - [info] /usr/bin/master_ip_online_change --command=stop --orig_master_host=192.168.6.85 --orig_master_ip=192.168.6.85 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='123456' --new_master_host=192.168.6.91 --new_master_ip=192.168.6.91 --new_master_port=3306 --new_master_user='root' --new_master_password='123456' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Fri Jan 22 10:36:22 2016 196292 Set read_only on the new master.. ok.
Fri Jan 22 10:36:22 2016 200819 Drpping app user on the orig master..
Fri Jan 22 10:36:22 2016 201729 Set read_only=1 on the orig master.. ok.
Fri Jan 22 10:36:22 2016 203608 Killing all application threads..
Fri Jan 22 10:36:22 2016 203628 done.
Fri Jan 22 10:36:22 2016 - [info] ok.
Fri Jan 22 10:36:22 2016 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Fri Jan 22 10:36:22 2016 - [info] Executing FLUSH TABLES WITH READ LOCK..
Fri Jan 22 10:36:22 2016 - [info] ok.
Fri Jan 22 10:36:22 2016 - [info] Orig master binlog:pos is binlog.000005:107.
Fri Jan 22 10:36:22 2016 - [info] Waiting to execute all relay logs on 192.168.6.91(192.168.6.91:3306)..
Fri Jan 22 10:36:22 2016 - [info] master_pos_wait(binlog.000005:107) completed on 192.168.6.91(192.168.6.91:3306). Executed 0 events.
Fri Jan 22 10:36:22 2016 - [info] done.
Fri Jan 22 10:36:22 2016 - [info] Getting new master's binlog name and position..
Fri Jan 22 10:36:22 2016 - [info] binlog.000003:399
Fri Jan 22 10:36:22 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.6.91', MASTER_PORT=3306, MASTER_LOG_FILE='binlog.000003', MASTER_LOG_POS=399, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Jan 22 10:36:22 2016 - [info] Executing master ip online change script to allow write on the new master:
Fri Jan 22 10:36:22 2016 - [info] /usr/bin/master_ip_online_change --command=start --orig_master_host=192.168.6.85 --orig_master_ip=192.168.6.85 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='123456' --new_master_host=192.168.6.91 --new_master_ip=192.168.6.91 --new_master_port=3306 --new_master_user='root' --new_master_password='123456' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Fri Jan 22 10:36:22 2016 302733 Set read_only=0 on the new master.
Fri Jan 22 10:36:22 2016 303665 Creating app user on the new master..
Fri Jan 22 10:36:22 2016 - [info] ok.
Fri Jan 22 10:36:22 2016 - [info]
Fri Jan 22 10:36:22 2016 - [info] * Switching slaves in parallel..
Fri Jan 22 10:36:22 2016 - [info]
Fri Jan 22 10:36:22 2016 - [info] -- Slave switch on host 192.168.6.149(192.168.6.149:3306) started, pid: 15192
Fri Jan 22 10:36:22 2016 - [info]
Fri Jan 22 10:36:22 2016 - [info] Log messages from 192.168.6.149 ...
Fri Jan 22 10:36:22 2016 - [info]
Fri Jan 22 10:36:22 2016 - [info] Waiting to execute all relay logs on 192.168.6.149(192.168.6.149:3306)..
Fri Jan 22 10:36:22 2016 - [info] master_pos_wait(binlog.000005:107) completed on 192.168.6.149(192.168.6.149:3306). Executed 0 events.
Fri Jan 22 10:36:22 2016 - [info] done.
Fri Jan 22 10:36:22 2016 - [info] Resetting slave 192.168.6.149(192.168.6.149:3306) and starting replication from the new master 192.168.6.91(192.168.6.91:3306)..
Fri Jan 22 10:36:22 2016 - [info] Executed CHANGE MASTER.
Fri Jan 22 10:36:22 2016 - [info] Slave started.
Fri Jan 22 10:36:22 2016 - [info] End of log messages from 192.168.6.149 ...
Fri Jan 22 10:36:22 2016 - [info]
Fri Jan 22 10:36:22 2016 - [info] -- Slave switch on host 192.168.6.149(192.168.6.149:3306) succeeded.
Fri Jan 22 10:36:22 2016 - [info] Unlocking all tables on the orig master:
Fri Jan 22 10:36:22 2016 - [info] Executing UNLOCK TABLES..
Fri Jan 22 10:36:22 2016 - [info] ok.
Fri Jan 22 10:36:22 2016 - [info] Starting orig master as a new slave..
Fri Jan 22 10:36:22 2016 - [info] Resetting slave 192.168.6.85(192.168.6.85:3306) and starting replication from the new master 192.168.6.91(192.168.6.91:3306)..
Fri Jan 22 10:36:22 2016 - [info] Executed CHANGE MASTER.
Fri Jan 22 10:36:22 2016 - [info] Slave started.
Fri Jan 22 10:36:22 2016 - [info] All new slave servers switched successfully.
Fri Jan 22 10:36:22 2016 - [info]
Fri Jan 22 10:36:22 2016 - [info] * Phase 5: New master cleanup phase..
Fri Jan 22 10:36:22 2016 - [info]
Fri Jan 22 10:36:22 2016 - [info] 192.168.6.91: Resetting slave info succeeded.
Fri Jan 22 10:36:22 2016 - [info] Switching master to 192.168.6.91(192.168.6.91:3306) completed successfully.
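Both the failover log and the online switch log print the CHANGE MASTER statement that the remaining slaves should use, which is handy when a server has to be re-pointed at the new master by hand. A minimal sketch, assuming the statement appears in the shape shown in these logs: `extract_change_master` is a hypothetical helper, not an MHA tool, and the password is deliberately ignored because the log masks it as `xxx`.

```python
import re

# \W matches the quote character around values, whatever form it takes in the
# saved log (ASCII apostrophe or a typographic quote left by copy and paste).
CHANGE_MASTER_RE = re.compile(
    r"CHANGE MASTER TO\s+MASTER_HOST=\W(?P<host>[\w.\-]+)\W,\s*"
    r"MASTER_PORT=(?P<port>\d+),\s*"
    r"MASTER_LOG_FILE=\W(?P<log_file>[\w.\-]+)\W,\s*"
    r"MASTER_LOG_POS=(?P<log_pos>\d+)"
)

def extract_change_master(log_text):
    """Return the replication coordinates from the last CHANGE MASTER
    statement found in log_text, or None if there is none."""
    matches = list(CHANGE_MASTER_RE.finditer(log_text))
    if not matches:
        return None
    m = matches[-1]
    return {
        "host": m.group("host"),
        "port": int(m.group("port")),
        "log_file": m.group("log_file"),
        "log_pos": int(m.group("log_pos")),
    }

# The statement line from the online switch log, used as sample input.
SAMPLE_LINE = (
    "Fri Jan 22 10:36:22 2016 - [info] All other slaves should start "
    "replication from here. Statement should be: CHANGE MASTER TO "
    "MASTER_HOST='192.168.6.91', MASTER_PORT=3306, "
    "MASTER_LOG_FILE='binlog.000003', MASTER_LOG_POS=399, "
    "MASTER_USER='repl', MASTER_PASSWORD='xxx';"
)

if __name__ == "__main__":
    print(extract_change_master(SAMPLE_LINE))
```

The coordinates are only valid for a server that had applied everything up to the switch point; a server that was ahead of or behind that position needs to be checked before it is re-attached.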