MHA Configuration for MySQL

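MySQL failover can be configured with MHA. It works from the semi-synchronous replication logs and automatically elects a slave as the new master. Combined with a VIP, the switch is smooth for the application layer (it generally completes within 30 seconds). Because semi-synchronous logs are used, MHA avoids split-brain (a problem with the MMM approach) and restores the master's state as fully as possible, which keeps the data consistent.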

Installing MHA

mha-manager : 10.1.1.107

node : 10.1.1.102, 10.1.1.107, 10.1.1.108

Configure host aliases on all three nodes

vim /etc/hosts
10.1.1.102 dbsrv1
10.1.1.107 dbsrv2
10.1.1.108 dbsrv3

Copy these entries to all three machines
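A quick way to confirm that a host carries all three aliases is to parse its hosts file and compare it with the table above. The following is a minimal sketch, not part of MHA; the function names `parse_hosts` and `missing_aliases` are made up for illustration, and the sample text stands in for the real /etc/hosts.

```python
# Sketch: check hosts-file text against the three aliases used in this guide.
EXPECTED = {"dbsrv1": "10.1.1.102", "dbsrv2": "10.1.1.107", "dbsrv3": "10.1.1.108"}


def parse_hosts(text):
    """Return {alias: ip} for every non-comment line of hosts-file text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


def missing_aliases(text, expected=EXPECTED):
    """List expected aliases that are absent or point at the wrong IP."""
    found = parse_hosts(text)
    return sorted(alias for alias, ip in expected.items() if found.get(alias) != ip)


# Hypothetical sample in which the dbsrv3 entry was forgotten.
sample = """127.0.0.1 localhost
10.1.1.102 dbsrv1
10.1.1.107 dbsrv2
"""
print(missing_aliases(sample))
```

On a real node, the text would come from reading /etc/hosts instead of the `sample` string.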

Set the hostname

vim /etc/sysconfig/network

HOSTNAME=dbsrv2

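After editing the file, also run $> hostname dbsrv2 so that the change takes effect in the current session.

Each of the other nodes needs its own hostname configured in the same way.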

10.1.1.107:

wget  mha4mysql-node-0.56-0.el5.noarch.rpm
### Installing the following packages on both the node and the manager hosts is recommended
# yum install perl-DBD-MySQL
# yum install perl-Config-Tiny
# yum install perl-Log-Dispatch
# yum install perl-Parallel-ForkManager
[root@dbsrv2 downloads]# rpm -ivh mha4mysql-node-0.56-0.el5.noarch.rpm
Preparing...                ########################################### [100%]
   1:mha4mysql-node         ########################################### [100%]
[root@dbsrv2 downloads]# rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
Preparing...                ########################################### [100%]
   1:mha4mysql-manager      ########################################### [100%]

Configure passwordless SSH login:

[root@dbsrv2 ~]# ssh-keygen
[root@dbsrv2 ~]# ls -a ./.ssh/
.  ..  id_rsa  id_rsa.pub
[root@dbsrv2 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@dbsrv1
[root@dbsrv2 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@dbsrv3

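Repeat the steps above on the other nodes, then verify from each node that it can reach the other two:

107 : ssh dbsrv1, ssh dbsrv3

102 : ssh dbsrv2, ssh dbsrv3

108 : ssh dbsrv1, ssh dbsrv2

All of these should log in without a password.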
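The full set of checks is every ordered pair of nodes, which is easy to get wrong by hand. As a rough sketch, the list can be generated; `ssh_check_commands` is a hypothetical helper, and `-o BatchMode=yes` is the OpenSSH option that makes ssh fail instead of prompting for a password.

```python
from itertools import permutations

NODES = ["dbsrv1", "dbsrv2", "dbsrv3"]


def ssh_check_commands(nodes, user="root", include_self=False):
    """Return (source, command) pairs covering every ordered pair of nodes."""
    pairs = list(permutations(nodes, 2))
    if include_self:
        # The manager host may also need to ssh to itself.
        pairs += [(node, node) for node in nodes]
    return [(src, f"ssh -o BatchMode=yes {user}@{dst} true") for src, dst in pairs]


for src, cmd in ssh_check_commands(NODES):
    print(f"on {src}: {cmd}")
```

Three nodes give six commands; each one should return immediately without asking for a password.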

Note that if the manager is installed on dbsrv2, you also need to run

ssh-copy-id -i ~/.ssh/id_rsa.pub root@dbsrv2

otherwise the host cannot ssh to itself.

Copy masterha_default.cnf from the mha4mysql-manager source tree to /etc

$> cp /root/downloads/mha4mysql-manager-0.56/samples/conf/masterha_default.cnf /etc/

Edit it as follows:

[server default]
user=repluser
password=replpass
ssh_user=root
repl_user=repluser
repl_password=replpass
ping_interval=3

master_ip_failover_script="/etc/mha/master_ip_failover"

Copy app1.cnf from the source tree to /etc/mha/

$> mkdir -p /etc/mha

$> cp /root/downloads/mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/mha/

and edit it as follows:

[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
secondary_check_script="masterha_secondary_check -s dbsrv1 -s dbsrv3"

[server1]
hostname=dbsrv1
candidate_master=1
master_binlog_dir="/var/lib/mysql"
[server2]
hostname=dbsrv2
no_master=1
master_binlog_dir="/var/lib/mysql"
[server3]
hostname=dbsrv3
candidate_master=1
master_binlog_dir="/var/lib/mysql"
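The file uses an INI-style layout, so the role of each server can be read back programmatically as a sanity check. This is an illustrative sketch using Python's standard `configparser`, not how MHA itself parses the file; the `summarize` helper is hypothetical and the embedded text is a trimmed copy of the configuration above.

```python
import configparser

# Trimmed copy of app1.cnf, kept inline so the sketch is self-contained.
APP1_CNF = """
[server default]
manager_log=/var/log/masterha/app1/manager.log

[server1]
hostname=dbsrv1
candidate_master=1

[server2]
hostname=dbsrv2
no_master=1

[server3]
hostname=dbsrv3
candidate_master=1
"""


def summarize(cnf_text):
    """Map each hostname to its failover role as declared in the config."""
    parser = configparser.ConfigParser()
    parser.read_string(cnf_text)
    roles = {}
    for section in parser.sections():
        if section == "server default":
            continue
        server = parser[section]
        if server.get("no_master") == "1":
            roles[server["hostname"]] = "never promoted"
        elif server.get("candidate_master") == "1":
            roles[server["hostname"]] = "candidate"
        else:
            roles[server["hostname"]] = "normal"
    return roles


print(summarize(APP1_CNF))
```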

Verify the setup:

masterha_check_ssh --conf=/etc/app1.cnf

It fails with the following error:

Can't locate MHA/NodeConst.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/share/perl5/vendor_perl/MHA/ManagerConst.pm line 25.

Installing node and manager from the tar.gz source packages avoids this error:

wget https://googledrive.com/host/0B1lu97m8-haWeHdGWXp0YVVUSlk/mha4mysql-node-0.56.tar.gz

tar xf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node-0.56
perl Makefile.PL
make && make install

Install mha4mysql-manager on node3
wget https://googledrive.com/host/0B1lu97m8-haWeHdGWXp0YVVUSlk/mha4mysql-manager-0.56.tar.gz
tar xf mha4mysql-manager-0.56.tar.gz
cd mha4mysql-manager-0.56

yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles perl-Time-HiRes
perl Makefile.PL
make && make install

Running the check again now passes:

[root@dbsrv2 ~]# masterha_check_ssh --conf=/etc/app1.cnf
Fri Jul 10 17:45:45 2015 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Jul 10 17:45:45 2015 - [info] Reading application default configuration from /etc/app1.cnf..
Fri Jul 10 17:45:45 2015 - [info] Reading server configuration from /etc/app1.cnf..
Fri Jul 10 17:45:45 2015 - [info] Starting SSH connection tests..
Fri Jul 10 17:45:47 2015 - [debug]
Fri Jul 10 17:45:45 2015 - [debug]  Connecting via SSH from root@dbsrv1(10.1.1.102:22) to root@dbsrv2(10.1.1.107:22)..
Fri Jul 10 17:45:46 2015 - [debug]   ok.
Fri Jul 10 17:45:46 2015 - [debug]  Connecting via SSH from root@dbsrv1(10.1.1.102:22) to root@dbsrv3(10.1.1.108:22)..
Fri Jul 10 17:45:46 2015 - [debug]   ok.
Fri Jul 10 17:45:47 2015 - [debug]
Fri Jul 10 17:45:46 2015 - [debug]  Connecting via SSH from root@dbsrv2(10.1.1.107:22) to root@dbsrv1(10.1.1.102:22)..
Fri Jul 10 17:45:46 2015 - [debug]   ok.
Fri Jul 10 17:45:46 2015 - [debug]  Connecting via SSH from root@dbsrv2(10.1.1.107:22) to root@dbsrv3(10.1.1.108:22)..
Fri Jul 10 17:45:47 2015 - [debug]   ok.
Fri Jul 10 17:45:48 2015 - [debug]
Fri Jul 10 17:45:46 2015 - [debug]  Connecting via SSH from root@dbsrv3(10.1.1.108:22) to root@dbsrv1(10.1.1.102:22)..
Fri Jul 10 17:45:47 2015 - [debug]   ok.
Fri Jul 10 17:45:47 2015 - [debug]  Connecting via SSH from root@dbsrv3(10.1.1.108:22) to root@dbsrv2(10.1.1.107:22)..
Fri Jul 10 17:45:48 2015 - [debug]   ok.
Fri Jul 10 17:45:48 2015 - [info] All SSH connection tests passed successfully.

Test master/slave replication

[root@dbsrv2 ~]# masterha_check_repl --conf=/etc/app1.cnf
Fri Jul 10 17:52:19 2015 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Jul 10 17:52:19 2015 - [info] Reading application default configuration from /etc/app1.cnf..
Fri Jul 10 17:52:19 2015 - [info] Reading server configuration from /etc/app1.cnf..
Fri Jul 10 17:52:19 2015 - [info] MHA::MasterMonitor version 0.56.
Creating directory /var/log/masterha/app1.. done.
Fri Jul 10 17:52:19 2015 - [info] GTID failover mode = 1
Fri Jul 10 17:52:19 2015 - [info] Dead Servers:
Fri Jul 10 17:52:19 2015 - [info] Alive Servers:
Fri Jul 10 17:52:19 2015 - [info]   dbsrv1(10.1.1.102:3306)
Fri Jul 10 17:52:19 2015 - [info]   dbsrv2(10.1.1.107:3306)
Fri Jul 10 17:52:19 2015 - [info]   dbsrv3(10.1.1.108:3306)
Fri Jul 10 17:52:19 2015 - [info] Alive Slaves:
Fri Jul 10 17:52:19 2015 - [info]   dbsrv2(10.1.1.107:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Fri Jul 10 17:52:19 2015 - [info]     GTID ON
Fri Jul 10 17:52:19 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Fri Jul 10 17:52:19 2015 - [info]     Not candidate for the new Master (no_master is set)
Fri Jul 10 17:52:19 2015 - [info]   dbsrv3(10.1.1.108:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Fri Jul 10 17:52:19 2015 - [info]     GTID ON
Fri Jul 10 17:52:19 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Fri Jul 10 17:52:19 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Jul 10 17:52:19 2015 - [info] Current Alive Master: dbsrv1(10.1.1.102:3306)
Fri Jul 10 17:52:19 2015 - [info] Checking slave configurations..
Fri Jul 10 17:52:19 2015 - [info]  read_only=1 is not set on slave dbsrv2(10.1.1.107:3306).
Fri Jul 10 17:52:19 2015 - [info]  read_only=1 is not set on slave dbsrv3(10.1.1.108:3306).
Fri Jul 10 17:52:19 2015 - [info] Checking replication filtering settings..
Fri Jul 10 17:52:19 2015 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Jul 10 17:52:19 2015 - [info]  Replication filtering check ok.
Fri Jul 10 17:52:19 2015 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Jul 10 17:52:19 2015 - [info] Checking SSH publickey authentication settings on the current master..
Fri Jul 10 17:52:19 2015 - [info] HealthCheck: SSH to dbsrv1 is reachable.
Fri Jul 10 17:52:19 2015 - [info]
dbsrv1(10.1.1.102:3306) (current master)
 +--dbsrv2(10.1.1.107:3306)
 +--dbsrv3(10.1.1.108:3306)
Fri Jul 10 17:52:19 2015 - [info] Checking replication health on dbsrv2..
Fri Jul 10 17:52:19 2015 - [info]  ok.
Fri Jul 10 17:52:19 2015 - [info] Checking replication health on dbsrv3..
Fri Jul 10 17:52:19 2015 - [info]  ok.
Fri Jul 10 17:52:19 2015 - [warning] master_ip_failover_script is not defined.
Fri Jul 10 17:52:19 2015 - [warning] shutdown_script is not defined.
Fri Jul 10 17:52:19 2015 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.

Start the MHA manager and watch the log file

[root@dbsrv2 etc]# masterha_manager --conf=/etc/app1.cnf

Log:

Mon Jul 13 11:10:20 2015 - [info] Reading default configuration from /etc/masterha_default.cnf..
Mon Jul 13 11:10:20 2015 - [info] Reading application default configuration from /etc/app1.cnf..
Mon Jul 13 11:10:20 2015 - [info] Reading server configuration from /etc/app1.cnf..
Check /var/log/masterha/app1/manager.log:
dbsrv1(10.1.1.102:3306) (current master)
 +--dbsrv2(10.1.1.107:3306)
 +--dbsrv3(10.1.1.108:3306)
Mon Jul 13 11:10:20 2015 - [warning] master_ip_failover_script is not defined.
Mon Jul 13 11:10:20 2015 - [warning] shutdown_script is not defined.
Mon Jul 13 11:10:20 2015 - [info] Set master ping interval 3 seconds.
Mon Jul 13 11:10:20 2015 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Mon Jul 13 11:10:20 2015 - [info] Starting ping health check on dbsrv1(10.1.1.102:3306)..
Mon Jul 13 11:10:20 2015 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

Three scripts are reported as not set, so any failover test at this point would be meaningless.

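Explanation of app1.cnf

candidate_master=1 marks the host as preferred when the new master is chosen. When several [serverX] sections set this parameter, priority follows the order in which the [serverX] sections appear.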
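The ordering rule can be illustrated with a deliberately simplified model. This sketch only looks at candidate_master, no_master and config order; it ignores replication positions, which the real MHA also takes into account, and `pick_new_master` is a hypothetical name.

```python
# Servers in the order their [serverX] sections appear in app1.cnf.
SERVERS = [
    {"hostname": "dbsrv1", "candidate_master": True},
    {"hostname": "dbsrv2", "no_master": True},
    {"hostname": "dbsrv3", "candidate_master": True},
]


def pick_new_master(servers, dead):
    """Return the first eligible host in config order, preferring candidates."""
    alive = [s for s in servers
             if s["hostname"] not in dead and not s.get("no_master")]
    candidates = [s for s in alive if s.get("candidate_master")]
    pool = candidates or alive  # fall back to any promotable host
    return pool[0]["hostname"] if pool else None


print(pick_new_master(SERVERS, dead={"dbsrv1"}))  # dbsrv3
```

With dbsrv1 down, dbsrv2 is excluded by no_master, so dbsrv3 is the only host left to promote.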

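secondary_check_script: MHA strongly recommends checking the availability of the MySQL master over two or more network routes. By default the MHA Manager checks over a single route only, from the Manager to the Master, which is not advisable. MHA can use two or more routes by calling the external script defined in the secondary_check_script parameter.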
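The idea behind the multi-route check is that the manager alone losing contact with the master is not enough to declare it dead. A minimal sketch of that decision follows; it is a simplified model and not the logic of masterha_secondary_check itself. The function name and the conservative handling of unreachable monitoring hosts are assumptions.

```python
def should_failover(manager_ping_ok, routes):
    """Decide whether failover should start.

    routes maps each monitoring host to a pair:
    (monitor_reachable, master_reachable_from_monitor).
    """
    if manager_ping_ok:
        return False
    # Only routes through monitors that could actually be reached count.
    verdicts = [master_ok for monitor_ok, master_ok in routes.values() if monitor_ok]
    if not verdicts:
        # Assumption: with no usable second route, stay conservative.
        return False
    return not any(verdicts)


# Manager lost the master, and neither dbsrv1 nor dbsrv3 can reach MySQL on it.
routes = {"dbsrv1": (True, False), "dbsrv3": (True, False)}
print(should_failover(False, routes))  # True
```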

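master_ip_failover_script: called when a MySQL slave is promoted to the new master, so the VIP handling can be written into this script.

master_ip_online_change_script: called when the MySQL master is switched manually with the masterha_master_switch command. Its parameters are similar to those of master_ip_failover_script, and the two scripts can share code.

shutdown_script: this script (the default one is in samples) uses the server's remote management controller, such as iDRAC, with ipmitool to force a power-off, so that a fence device does not restart the master and cause split-brain.

report_script: sends an email report once the switch to the new master has finished; see http://caspian.dotconf.net/menu/Software/SendEmail/sendEmail-v1.56.tar.gz for a reference.

The scripts mentioned above can be copied from mha4mysql-manager-0.56/samples/scripts/* and adapted.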

Failover test

Stop the MySQL master service by hand: 10.1.1.102 > /etc/init.d/mysqld stop

Watch /var/log/masterha/app1/manager.log on 107:

dbsrv1(10.1.1.102:3306) (current master)
 +--dbsrv2(10.1.1.107:3306)
 +--dbsrv3(10.1.1.108:3306)

Mon Jul 13 11:48:51 2015 - [warning] master_ip_failover_script is not defined.
Mon Jul 13 11:48:51 2015 - [warning] shutdown_script is not defined.
Mon Jul 13 11:48:51 2015 - [info] Set master ping interval 3 seconds.
Mon Jul 13 11:48:51 2015 - [info] Set secondary check script: masterha_secondary_check -s dbsrv1 -s dbsrv3
Mon Jul 13 11:48:51 2015 - [info] Starting ping health check on dbsrv1(10.1.1.102:3306)..
Mon Jul 13 11:48:51 2015 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Mon Jul 13 12:14:16 2015 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Mon Jul 13 12:14:16 2015 - [info] Executing secondary network check script: masterha_secondary_check -s dbsrv1 -s dbsrv3  --user=root  --master_host=dbsrv1  --master_ip=10.1.1.102  --master_port=3306 --master_user=repluser --master_password=replpass --ping_type=SELECT
Mon Jul 13 12:14:16 2015 - [info] Executing SSH check script: exit 0
Mon Jul 13 12:14:16 2015 - [info] HealthCheck: SSH to dbsrv1 is reachable.
Monitoring server dbsrv1 is reachable, Master is not reachable from dbsrv1. OK.
Monitoring server dbsrv3 is reachable, Master is not reachable from dbsrv3. OK.
Mon Jul 13 12:14:16 2015 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Mon Jul 13 12:14:19 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Mon Jul 13 12:14:19 2015 - [warning] Connection failed 2 time(s)..
Mon Jul 13 12:14:22 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Mon Jul 13 12:14:22 2015 - [warning] Connection failed 3 time(s)..
Mon Jul 13 12:14:25 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Mon Jul 13 12:14:25 2015 - [warning] Connection failed 4 time(s)..
Mon Jul 13 12:14:25 2015 - [warning] Master is not reachable from health checker!
Mon Jul 13 12:14:25 2015 - [warning] Master dbsrv1(10.1.1.102:3306) is not reachable!
Mon Jul 13 12:14:25 2015 - [warning] SSH is reachable.
Mon Jul 13 12:14:25 2015 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Mon Jul 13 12:14:25 2015 - [info] Reading default configuration from /etc/masterha_default.cnf..
Mon Jul 13 12:14:25 2015 - [info] Reading application default configuration from /etc/app1.cnf..
Mon Jul 13 12:14:25 2015 - [info] Reading server configuration from /etc/app1.cnf..
Mon Jul 13 12:14:25 2015 - [info] GTID failover mode = 1
Mon Jul 13 12:14:25 2015 - [info] Dead Servers:
Mon Jul 13 12:14:25 2015 - [info]   dbsrv1(10.1.1.102:3306)
Mon Jul 13 12:14:25 2015 - [info] Alive Servers:
Mon Jul 13 12:14:25 2015 - [info]   dbsrv2(10.1.1.107:3306)
Mon Jul 13 12:14:25 2015 - [info]   dbsrv3(10.1.1.108:3306)
Mon Jul 13 12:14:25 2015 - [info] Alive Slaves:
Mon Jul 13 12:14:25 2015 - [info]   dbsrv2(10.1.1.107:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Mon Jul 13 12:14:25 2015 - [info]     GTID ON
Mon Jul 13 12:14:25 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Mon Jul 13 12:14:25 2015 - [info]     Not candidate for the new Master (no_master is set)
Mon Jul 13 12:14:25 2015 - [info]   dbsrv3(10.1.1.108:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Mon Jul 13 12:14:25 2015 - [info]     GTID ON
Mon Jul 13 12:14:25 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Mon Jul 13 12:14:25 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jul 13 12:14:25 2015 - [info] Checking slave configurations..
Mon Jul 13 12:14:25 2015 - [info]  read_only=1 is not set on slave dbsrv2(10.1.1.107:3306).
Mon Jul 13 12:14:25 2015 - [info]  read_only=1 is not set on slave dbsrv3(10.1.1.108:3306).
Mon Jul 13 12:14:25 2015 - [info] Checking replication filtering settings..
Mon Jul 13 12:14:25 2015 - [info]  Replication filtering check ok.
Mon Jul 13 12:14:25 2015 - [info] Master is down!
Mon Jul 13 12:14:25 2015 - [info] Terminating monitoring script.
Mon Jul 13 12:14:25 2015 - [info] Got exit code 20 (Master dead).
Mon Jul 13 12:14:25 2015 - [info] MHA::MasterFailover version 0.56.
Mon Jul 13 12:14:25 2015 - [info] Starting master failover.
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] * Phase 1: Configuration Check Phase..
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] GTID failover mode = 1
Mon Jul 13 12:14:25 2015 - [info] Dead Servers:
Mon Jul 13 12:14:25 2015 - [info]   dbsrv1(10.1.1.102:3306)
Mon Jul 13 12:14:25 2015 - [info] Checking master reachability via MySQL(double check)...
Mon Jul 13 12:14:25 2015 - [info]  ok.
Mon Jul 13 12:14:25 2015 - [info] Alive Servers:
Mon Jul 13 12:14:25 2015 - [info]   dbsrv2(10.1.1.107:3306)
Mon Jul 13 12:14:25 2015 - [info]   dbsrv3(10.1.1.108:3306)
Mon Jul 13 12:14:25 2015 - [info] Alive Slaves:
Mon Jul 13 12:14:25 2015 - [info]   dbsrv2(10.1.1.107:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Mon Jul 13 12:14:25 2015 - [info]     GTID ON
Mon Jul 13 12:14:25 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Mon Jul 13 12:14:25 2015 - [info]     Not candidate for the new Master (no_master is set)
Mon Jul 13 12:14:25 2015 - [info]   dbsrv3(10.1.1.108:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Mon Jul 13 12:14:25 2015 - [info]     GTID ON
Mon Jul 13 12:14:25 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Mon Jul 13 12:14:25 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jul 13 12:14:25 2015 - [info] Starting GTID based failover.
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Jul 13 12:14:25 2015 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Mon Jul 13 12:14:25 2015 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Jul 13 12:14:25 2015 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] * Phase 3: Master Recovery Phase..
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] The latest binary log file/position on all slaves is mysqlmaster-bin.000004:191
Mon Jul 13 12:14:25 2015 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Jul 13 12:14:25 2015 - [info]   dbsrv2(10.1.1.107:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Mon Jul 13 12:14:25 2015 - [info]     GTID ON
Mon Jul 13 12:14:25 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Mon Jul 13 12:14:25 2015 - [info]     Not candidate for the new Master (no_master is set)
Mon Jul 13 12:14:25 2015 - [info]   dbsrv3(10.1.1.108:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Mon Jul 13 12:14:25 2015 - [info]     GTID ON
Mon Jul 13 12:14:25 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Mon Jul 13 12:14:25 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jul 13 12:14:25 2015 - [info] The oldest binary log file/position on all slaves is mysqlmaster-bin.000004:191
Mon Jul 13 12:14:25 2015 - [info] Oldest slaves:
Mon Jul 13 12:14:25 2015 - [info]   dbsrv2(10.1.1.107:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Mon Jul 13 12:14:25 2015 - [info]     GTID ON
Mon Jul 13 12:14:25 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Mon Jul 13 12:14:25 2015 - [info]     Not candidate for the new Master (no_master is set)
Mon Jul 13 12:14:25 2015 - [info]   dbsrv3(10.1.1.108:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Mon Jul 13 12:14:25 2015 - [info]     GTID ON
Mon Jul 13 12:14:25 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Mon Jul 13 12:14:25 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] * Phase 3.3: Determining New Master Phase..
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] Searching new master from slaves..
Mon Jul 13 12:14:25 2015 - [info]  Candidate masters from the configuration file:
Mon Jul 13 12:14:25 2015 - [info]   dbsrv3(10.1.1.108:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Mon Jul 13 12:14:25 2015 - [info]     GTID ON
Mon Jul 13 12:14:25 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Mon Jul 13 12:14:25 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jul 13 12:14:25 2015 - [info]  Non-candidate masters:
Mon Jul 13 12:14:25 2015 - [info]   dbsrv2(10.1.1.107:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Mon Jul 13 12:14:25 2015 - [info]     GTID ON
Mon Jul 13 12:14:25 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Mon Jul 13 12:14:25 2015 - [info]     Not candidate for the new Master (no_master is set)
Mon Jul 13 12:14:25 2015 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Mon Jul 13 12:14:25 2015 - [info] New master is dbsrv3(10.1.1.108:3306)
Mon Jul 13 12:14:25 2015 - [info] Starting master failover..
Mon Jul 13 12:14:25 2015 - [info]
From:
dbsrv1(10.1.1.102:3306) (current master)
 +--dbsrv2(10.1.1.107:3306)
 +--dbsrv3(10.1.1.108:3306)

To:
dbsrv3(10.1.1.108:3306) (new master)
 +--dbsrv2(10.1.1.107:3306)
 Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] * Phase 3.3: New Master Recovery Phase..
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info]  Waiting all logs to be applied..
Mon Jul 13 12:14:25 2015 - [info]   done.
Mon Jul 13 12:14:25 2015 - [info]  Replicating from the latest slave dbsrv2(10.1.1.107:3306) and waiting to apply..
Mon Jul 13 12:14:25 2015 - [info]  Waiting all logs to be applied on the latest slave..
Mon Jul 13 12:14:25 2015 - [info]  Resetting slave dbsrv3(10.1.1.108:3306) and starting replication from the new master dbsrv2(10.1.1.107:3306)..
Mon Jul 13 12:14:25 2015 - [info]  Executed CHANGE MASTER.
Mon Jul 13 12:14:25 2015 - [info]  Slave started.
Mon Jul 13 12:14:25 2015 - [info]  Waiting to execute all relay logs on dbsrv3(10.1.1.108:3306)..
Mon Jul 13 12:14:25 2015 - [info]  master_pos_wait(mysqlslave-bin.000007:231) completed on dbsrv3(10.1.1.108:3306). Executed 2 events.
Mon Jul 13 12:14:25 2015 - [info]   done.
Mon Jul 13 12:14:25 2015 - [info]   done.
Mon Jul 13 12:14:25 2015 - [info] Getting new master's binlog name and position..
Mon Jul 13 12:14:25 2015 - [info]  mysqlslave-bin.000003:536037738
Mon Jul 13 12:14:25 2015 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='dbsrv3 or 10.1.1.108', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repluser', MASTER_PASSWORD='xxx';
Mon Jul 13 12:14:25 2015 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysqlslave-bin.000003, 536037738, 8c71815b-116f-11e4-b9e2-0050569f2c2d:1-928447,
8c71815b-116f-11e4-b9e2-0050569f2c2e:1-11,
8c71815b-116f-11e4-b9e2-0050569f2c2f:1-6
Mon Jul 13 12:14:25 2015 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Mon Jul 13 12:14:25 2015 - [info] ** Finished master recovery successfully.
Mon Jul 13 12:14:25 2015 - [info] * Phase 3: Master Recovery Phase completed.
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] * Phase 4: Slaves Recovery Phase..
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] * Phase 4.1: Starting Slaves in parallel..
Mon Jul 13 12:14:25 2015 - [info]
Mon Jul 13 12:14:25 2015 - [info] -- Slave recovery on host dbsrv2(10.1.1.107:3306) started, pid: 18829. Check tmp log /var/log/masterha/app1/dbsrv2_3306_20150713121425.log if it takes time..

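Log in to 107 and check the MySQL slave status: the master has switched to 108 (dbsrv3).

Log in to 108 and check the MySQL master status: 108 has become the new master, and its position matches 107.

This shows that the failover succeeded.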
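The outcome can also be read straight from manager.log instead of logging in to each server. A small sketch, assuming the "New master is ..." line keeps the format seen in the log above; `find_new_master` is a hypothetical helper.

```python
import re

# Matches lines such as: ... - [info] New master is dbsrv3(10.1.1.108:3306)
NEW_MASTER_RE = re.compile(r"\[info\] New master is (\S+)\(([\d.]+):(\d+)\)")


def find_new_master(log_text):
    """Return (hostname, ip, port) of the promoted server, or None if absent."""
    match = NEW_MASTER_RE.search(log_text)
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3))


line = "Mon Jul 13 12:14:25 2015 - [info] New master is dbsrv3(10.1.1.108:3306)"
print(find_new_master(line))  # ('dbsrv3', '10.1.1.108', 3306)
```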

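For the application layer, however, the IP has changed from 102 to 108, and a VIP is needed to hide that change.

In my view there should be two VIPs: one above Atlas, hiding Atlas primary/standby switches from the application layer,

and one below Atlas, hiding the IP change when the master fails over.

VIP configuration:

First of all, configuring a VIP on Linux is simple.

Test: configure VIP 110 on 102: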

[root@dbsrv1 ~]# ifconfig eth1:0 10.1.1.110 netmask 255.255.255.0 up
[root@dbsrv1 ~]# ifconfig
eth1      Link encap:Ethernet  HWaddr 00:50:56:9F:71:BF 
          inet addr:10.1.1.102  Bcast:10.1.1.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fe9f:71bf/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8349930 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6570972 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1395377631 (1.2 GiB)  TX bytes:4202193741 (3.9 GiB)

eth1:0    Link encap:Ethernet  HWaddr 00:50:56:9F:71:BF 
          inet addr:10.1.1.110  Bcast:10.1.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1189110 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1189110 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:59591903 (56.8 MiB)  TX bytes:59591903 (56.8 MiB)

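After that, ssh from 107 to 10.1.1.110 lands on 102.

Likewise, to take the VIP down:

$> ifconfig eth1:0 down

After configuring the same IP on 108, ssh from 107 to 110 lands on 108.

This is the principle behind the failover.

The switch, however, should be done by a program rather than by hand.

There are two ways to switch the VIP. Method one uses MHA's master_ip_failover_script callback: a Perl script brings the VIP up on the new master

and takes it down on the old master (by ssh-ing to the two machines and running ifconfig on each).

Method two uses keepalived + heartbeat to monitor both masters in real time (through a MySQL probe script) and automatically move the VIP to whichever master is available.

Of the two, keepalived can fall out of step with MHA and cause problems, so I decided to implement the switch with MHA's Perl callback script: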
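Method one comes down to two remote ifconfig calls, one per host. The sketch below only builds the command strings so they can be inspected before anything is run; `vip_commands` is a hypothetical helper, and the alias interface names (eth1:0 on the old master, eth2:0 on the new one) are assumptions about this environment.

```python
def vip_commands(vip, netmask, old_host, new_host, old_if, new_if, user="root"):
    """Return the two ssh commands that move the VIP: down on old, up on new."""
    down = f"ssh {user}@{old_host} 'ifconfig {old_if} down'"
    up = f"ssh {user}@{new_host} 'ifconfig {new_if} {vip} netmask {netmask} up'"
    return [down, up]


for cmd in vip_commands("10.1.1.110", "255.255.255.0",
                        "dbsrv1", "dbsrv3", "eth1:0", "eth2:0"):
    print(cmd)
```

Taking the address down first matters: if both hosts held 10.1.1.110 at once, clients could reach either one.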

master_ip_failover_script:

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password, 
);

GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

my $vip = "10.1.1.110"; 

my $ssh_start_vip = "/etc/init.d/keepalived start";
my $ssh_stop_vip  = "/etc/init.d/keepalived stop";

exit &main();

sub main {
  print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {
	  print "Disabling the VIP on old master: $orig_master_host \n";
      &stop_vip();
      # updating global catalog, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
      print "Enabling the VIP - $vip on the new master - $new_master_host \n";
      &start_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn $@;

      # If you want to continue failover, exit 10.
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {
	print "Checking the Status of the script.. OK \n";
    #`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
  `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disable the VIP on the old_master
sub stop_vip() {
  `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
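The exit codes are what MHA acts on, so it helps to see the dispatch in isolation. The sketch below mirrors the branches of the Perl script in Python; `run_failover_hook` and the two callables are hypothetical, and the callables are assumed to raise an exception on failure. Note that the Perl backticks above do not raise when the remote command fails, so in the script as written the eval blocks will not catch a failed ssh call.

```python
def run_failover_hook(command, stop_vip, start_vip):
    """Return the exit code the script would use for the given --command."""
    if command in ("stop", "stopssh"):
        try:
            stop_vip()   # disable the VIP on the old master
            return 0
        except Exception as exc:
            print(f"Got Error: {exc}")
            return 1
    if command == "start":
        try:
            start_vip()  # enable the VIP on the new master
            return 0
        except Exception as exc:
            print(exc)
            return 10    # the sample script exits 10 when start fails
    if command == "status":
        return 0
    return 1             # unknown command: the script prints usage and exits 1


def failing():
    raise RuntimeError("ssh failed")


print(run_failover_hook("start", stop_vip=failing, start_vip=lambda: None))  # 0
```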

10.1.1.102 (dbsrv1) /etc/keepalived/keepalived.conf

vrrp_instance VI_1 {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.1.1.110
    }
}

10.1.1.108 (dbsrv3) /etc/keepalived/keepalived.conf

vrrp_instance VI_1 {
    state MASTER
    interface eth2
    virtual_router_id 51
    priority 99
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.1.1.110
    }
}

Failover test

======Preparation======

ssh root@10.1.1.102 (dbsrv1):

$> /etc/init.d/keepalived start

ssh root@10.1.1.107 (dbsrv2):

$> masterha_manager --conf=/etc/mha/app1.cnf
$> tail -f /var/log/masterha/app1/manager.log
$> mysql -u root
$mysql> stop slave;
$mysql> change master to master_host='10.1.1.102',master_user='repluser',master_password='replpass',master_auto_position=1;
$mysql> start slave;

ssh root@10.1.1.108 (dbsrv3):

$> /etc/init.d/keepalived stop
$> mysql -u root
$mysql> stop slave;
$mysql> change master to master_host='10.1.1.102',master_user='repluser',master_password='replpass',master_auto_position=1;
$mysql> start slave;

jdbc test client:

package com.shenli.java;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Scanner;
public class MySqlFailOverTester {

private static String url = "jdbc:mysql://10.1.1.110:3306/repltest";
private static String user = "repluser";
private static String password = "replpass";
private static volatile boolean stop = false;
private static final SimpleDateFormat format = new SimpleDateFormat(
"yyyy-MM-dd HH:mm:ss");

private static Connection getConnection() {
    System.out.println("getConnection().");
    Connection conn = null;

    try {
        Class.forName("com.mysql.jdbc.Driver");
    } catch (ClassNotFoundException e) {
        e.printStackTrace();
    }
    try {
        conn = DriverManager.getConnection(url, user, password);
    } catch (SQLException e) {
        e.printStackTrace();
    }
    System.out.println("conn : " + conn);
    return conn;
}

/**
  * @param args
  */
public static void main(String[] args) {

    new Thread() {
        public void run() {
            Scanner scan = new Scanner(System.in);
            do {
                System.out.println("please input 'end' to stop.");
            } while (!scan.next().equals("end"));
            stop = true;
        };
    }.start();

    Connection conn = getConnection();
    while (!stop) {
        System.out.println("still run.");
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        Statement st;
        try {
            st = conn.createStatement();
            ResultSet rs = st.executeQuery("select count(*) from user;");
            if (rs.next())
                System.out.println(format.format(new Date())
+ " user.count:" + rs.getInt(1));
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("conn is : " + conn);
            // if(conn == null){
            conn = getConnection();
            // }
        }

    }

}

}
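The point of the test client is the reconnect-on-error loop: once the VIP has moved, the old connection is dead and a fresh one must be opened. A driver-independent sketch of that loop follows. `poll_with_reconnect` is a hypothetical helper, and `connect` and `query` are stand-ins to be supplied by whatever MySQL driver is in use. Unlike the Java client, it reconnects at the start of the next round rather than inside the error handler.

```python
import time


def poll_with_reconnect(connect, query, rounds, delay=0.0):
    """Run query() repeatedly, opening a new connection after any failure."""
    conn = None
    results = []
    for _ in range(rounds):
        try:
            if conn is None:
                conn = connect()
            results.append(query(conn))
        except Exception as exc:
            results.append(f"error: {exc}")
            conn = None  # drop the broken connection; reconnect next round
        time.sleep(delay)
    return results


# Fake driver: the second query fails, as it would while the VIP is moving.
calls = {"n": 0}


def fake_connect():
    return object()


def fake_query(conn):
    calls["n"] += 1
    if calls["n"] == 2:
        raise RuntimeError("gone away")
    return 42


print(poll_with_reconnect(fake_connect, fake_query, rounds=3))
```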

======Simulate a failure of 10.1.1.102=====

ssh root@10.1.1.102

$> /etc/init.d/mysqld stop

Check /var/log/masterha/app1/manager.log:

Tue Jul 14 16:43:59 2015 - [info] Set secondary check script: masterha_secondary_check -s dbsrv1 -s dbsrv3
Tue Jul 14 16:43:59 2015 - [info] Starting ping health check on dbsrv1(10.1.1.102:3306)..
Tue Jul 14 16:43:59 2015 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Tue Jul 14 16:44:29 2015 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Jul 14 16:44:29 2015 - [info] Executing secondary network check script: masterha_secondary_check -s dbsrv1 -s dbsrv3  --user=root  --master_host=dbsrv1  --master_ip=10.1.1.102  --master_port=3306 --master_user=repluser --master_password=replpass --ping_type=SELECT
Tue Jul 14 16:44:29 2015 - [info] Executing SSH check script: exit 0
Tue Jul 14 16:44:29 2015 - [info] HealthCheck: SSH to dbsrv1 is reachable.
Monitoring server dbsrv1 is reachable, Master is not reachable from dbsrv1. OK.
Monitoring server dbsrv3 is reachable, Master is not reachable from dbsrv3. OK.
Tue Jul 14 16:44:29 2015 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Tue Jul 14 16:44:32 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jul 14 16:44:32 2015 - [warning] Connection failed 2 time(s)..
Tue Jul 14 16:44:35 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jul 14 16:44:35 2015 - [warning] Connection failed 3 time(s)..
Tue Jul 14 16:44:38 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jul 14 16:44:38 2015 - [warning] Connection failed 4 time(s)..
Tue Jul 14 16:44:38 2015 - [warning] Master is not reachable from health checker!
Tue Jul 14 16:44:38 2015 - [warning] Master dbsrv1(10.1.1.102:3306) is not reachable!
Tue Jul 14 16:44:38 2015 - [warning] SSH is reachable.
Tue Jul 14 16:44:38 2015 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status..
Tue Jul 14 16:44:38 2015 - [info] Reading default configuration from /etc/masterha_default.cnf..
Tue Jul 14 16:44:38 2015 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Tue Jul 14 16:44:38 2015 - [info] Reading server configuration from /etc/mha/app1.cnf..
Tue Jul 14 16:44:38 2015 - [info] GTID failover mode = 1
Tue Jul 14 16:44:38 2015 - [info] Dead Servers:
Tue Jul 14 16:44:38 2015 - [info]   dbsrv1(10.1.1.102:3306)
Tue Jul 14 16:44:38 2015 - [info] Alive Servers:
Tue Jul 14 16:44:38 2015 - [info]   dbsrv2(10.1.1.107:3306)
Tue Jul 14 16:44:38 2015 - [info]   dbsrv3(10.1.1.108:3306)
Tue Jul 14 16:44:38 2015 - [info] Alive Slaves:
Tue Jul 14 16:44:38 2015 - [info]   dbsrv2(10.1.1.107:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Tue Jul 14 16:44:38 2015 - [info]     GTID ON
Tue Jul 14 16:44:38 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Tue Jul 14 16:44:38 2015 - [info]     Not candidate for the new Master (no_master is set)
Tue Jul 14 16:44:38 2015 - [info]   dbsrv3(10.1.1.108:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Tue Jul 14 16:44:38 2015 - [info]     GTID ON
Tue Jul 14 16:44:38 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Tue Jul 14 16:44:38 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jul 14 16:44:38 2015 - [info] Checking slave configurations..
Tue Jul 14 16:44:38 2015 - [info]  read_only=1 is not set on slave dbsrv2(10.1.1.107:3306).
Tue Jul 14 16:44:38 2015 - [info]  read_only=1 is not set on slave dbsrv3(10.1.1.108:3306).
Tue Jul 14 16:44:38 2015 - [info] Checking replication filtering settings..
Tue Jul 14 16:44:38 2015 - [info]  Replication filtering check ok.
Tue Jul 14 16:44:38 2015 - [info] Master is down!
Tue Jul 14 16:44:38 2015 - [info] Terminating monitoring script.
Tue Jul 14 16:44:38 2015 - [info] Got exit code 20 (Master dead).
Tue Jul 14 16:44:38 2015 - [info] MHA::MasterFailover version 0.56.
Tue Jul 14 16:44:38 2015 - [info] Starting master failover.
Tue Jul 14 16:44:38 2015 - [info]
Tue Jul 14 16:44:38 2015 - [info] * Phase 1: Configuration Check Phase..
Tue Jul 14 16:44:38 2015 - [info]
Tue Jul 14 16:44:38 2015 - [info] GTID failover mode = 1
Tue Jul 14 16:44:38 2015 - [info] Dead Servers:
Tue Jul 14 16:44:38 2015 - [info]   dbsrv1(10.1.1.102:3306)
Tue Jul 14 16:44:38 2015 - [info] Checking master reachability via MySQL(double check)...
Tue Jul 14 16:44:38 2015 - [info]  ok.
Tue Jul 14 16:44:38 2015 - [info] Alive Servers:
Tue Jul 14 16:44:38 2015 - [info]   dbsrv2(10.1.1.107:3306)
Tue Jul 14 16:44:38 2015 - [info]   dbsrv3(10.1.1.108:3306)
Tue Jul 14 16:44:38 2015 - [info] Alive Slaves:
Tue Jul 14 16:44:38 2015 - [info]   dbsrv2(10.1.1.107:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Tue Jul 14 16:44:38 2015 - [info]     GTID ON
Tue Jul 14 16:44:38 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Tue Jul 14 16:44:38 2015 - [info]     Not candidate for the new Master (no_master is set)
Tue Jul 14 16:44:38 2015 - [info]   dbsrv3(10.1.1.108:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Tue Jul 14 16:44:38 2015 - [info]     GTID ON
Tue Jul 14 16:44:38 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Tue Jul 14 16:44:38 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jul 14 16:44:38 2015 - [info] Starting GTID based failover.
Tue Jul 14 16:44:38 2015 - [info]
Tue Jul 14 16:44:38 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Jul 14 16:44:38 2015 - [info]
Tue Jul 14 16:44:38 2015 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Jul 14 16:44:38 2015 - [info]
Tue Jul 14 16:44:38 2015 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Jul 14 16:44:38 2015 - [info] Executing master IP deactivation script:
Tue Jul 14 16:44:38 2015 - [info]   /etc/mha/master_ip_failover --orig_master_host=dbsrv1 --orig_master_ip=10.1.1.102 --orig_master_port=3306 --command=stopssh --ssh_user=root

IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Disabling the VIP on old master: dbsrv1
Tue Jul 14 16:44:38 2015 - [info]  done.
Tue Jul 14 16:44:38 2015 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Jul 14 16:44:38 2015 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Jul 14 16:44:38 2015 - [info]
Tue Jul 14 16:44:38 2015 - [info] * Phase 3: Master Recovery Phase..
Tue Jul 14 16:44:38 2015 - [info]
Tue Jul 14 16:44:38 2015 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Jul 14 16:44:38 2015 - [info]
Tue Jul 14 16:44:38 2015 - [info] The latest binary log file/position on all slaves is mysqlmaster-bin.000006:191
Tue Jul 14 16:44:38 2015 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Jul 14 16:44:38 2015 - [info]   dbsrv2(10.1.1.107:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Tue Jul 14 16:44:38 2015 - [info]     GTID ON
Tue Jul 14 16:44:38 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Tue Jul 14 16:44:38 2015 - [info]     Not candidate for the new Master (no_master is set)
Tue Jul 14 16:44:38 2015 - [info]   dbsrv3(10.1.1.108:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Tue Jul 14 16:44:38 2015 - [info]     GTID ON
Tue Jul 14 16:44:38 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Tue Jul 14 16:44:38 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jul 14 16:44:38 2015 - [info] The oldest binary log file/position on all slaves is mysqlmaster-bin.000006:191
Tue Jul 14 16:44:38 2015 - [info] Oldest slaves:
Tue Jul 14 16:44:38 2015 - [info]   dbsrv2(10.1.1.107:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Tue Jul 14 16:44:38 2015 - [info]     GTID ON
Tue Jul 14 16:44:38 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Tue Jul 14 16:44:38 2015 - [info]     Not candidate for the new Master (no_master is set)
Tue Jul 14 16:44:38 2015 - [info]   dbsrv3(10.1.1.108:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Tue Jul 14 16:44:38 2015 - [info]     GTID ON
Tue Jul 14 16:44:38 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Tue Jul 14 16:44:38 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jul 14 16:44:38 2015 - [info]
Tue Jul 14 16:44:38 2015 - [info] * Phase 3.3: Determining New Master Phase..
Tue Jul 14 16:44:38 2015 - [info]
Tue Jul 14 16:44:38 2015 - [info] Searching new master from slaves..
Tue Jul 14 16:44:38 2015 - [info]  Candidate masters from the configuration file:
Tue Jul 14 16:44:38 2015 - [info]   dbsrv3(10.1.1.108:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Tue Jul 14 16:44:38 2015 - [info]     GTID ON
Tue Jul 14 16:44:38 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Tue Jul 14 16:44:38 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jul 14 16:44:38 2015 - [info]  Non-candidate masters:
Tue Jul 14 16:44:38 2015 - [info]   dbsrv2(10.1.1.107:3306)  Version=5.6.19-log (oldest major version between slaves) log-bin:enabled
Tue Jul 14 16:44:38 2015 - [info]     GTID ON
Tue Jul 14 16:44:38 2015 - [info]     Replicating from 10.1.1.102(10.1.1.102:3306)
Tue Jul 14 16:44:38 2015 - [info]     Not candidate for the new Master (no_master is set)
Tue Jul 14 16:44:38 2015 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Tue Jul 14 16:44:38 2015 - [info] New master is dbsrv3(10.1.1.108:3306)
Tue Jul 14 16:44:38 2015 - [info] Starting master failover..
Tue Jul 14 16:44:38 2015 - [info]
From:
dbsrv1(10.1.1.102:3306) (current master)
 +--dbsrv2(10.1.1.107:3306)
 +--dbsrv3(10.1.1.108:3306)

To:
dbsrv3(10.1.1.108:3306) (new master)
 +--dbsrv2(10.1.1.107:3306)
Tue Jul 14 16:44:38 2015 - [info]
Tue Jul 14 16:44:38 2015 - [info] * Phase 3.3: New Master Recovery Phase..
Tue Jul 14 16:44:38 2015 - [info]
Tue Jul 14 16:44:38 2015 - [info]  Waiting all logs to be applied..
Tue Jul 14 16:44:38 2015 - [info]   done.
Tue Jul 14 16:44:38 2015 - [info]  Replicating from the latest slave dbsrv2(10.1.1.107:3306) and waiting to apply..
Tue Jul 14 16:44:38 2015 - [info]  Waiting all logs to be applied on the latest slave..
Tue Jul 14 16:44:38 2015 - [info]  Resetting slave dbsrv3(10.1.1.108:3306) and starting replication from the new master dbsrv2(10.1.1.107:3306)..
Tue Jul 14 16:44:39 2015 - [info]  Executed CHANGE MASTER.
Tue Jul 14 16:44:39 2015 - [info]  Slave started.
Tue Jul 14 16:44:39 2015 - [info]  Waiting to execute all relay logs on dbsrv3(10.1.1.108:3306)..
Tue Jul 14 16:44:39 2015 - [info]  master_pos_wait(mysqlslave-bin.000001:634) completed on dbsrv3(10.1.1.108:3306). Executed 3 events.
Tue Jul 14 16:44:39 2015 - [info]   done.
Tue Jul 14 16:44:39 2015 - [info]   done.
Tue Jul 14 16:44:39 2015 - [info] Getting new master's binlog name and position..
Tue Jul 14 16:44:39 2015 - [info]  mysqlslave-bin.000001:634
Tue Jul 14 16:44:39 2015 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='dbsrv3 or 10.1.1.108', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repluser', MASTER_PASSWORD='xxx';
Tue Jul 14 16:44:39 2015 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysqlslave-bin.000001, 634, 8c71815b-116f-11e4-b9e2-0050569f2c2d:1-2
Tue Jul 14 16:44:39 2015 - [info] Executing master IP activate script:
Tue Jul 14 16:44:39 2015 - [info]   /etc/mha/master_ip_failover --command=start --ssh_user=root --orig_master_host=dbsrv1 --orig_master_ip=10.1.1.102 --orig_master_port=3306 --new_master_host=dbsrv3 --new_master_ip=10.1.1.108 --new_master_port=3306 --new_master_user='repluser' --new_master_password='replpass'

IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Enabling the VIP - 10.1.1.110 on the new master - dbsrv3
Tue Jul 14 16:44:40 2015 - [info]  OK.
Tue Jul 14 16:44:40 2015 - [info] ** Finished master recovery successfully.
Tue Jul 14 16:44:40 2015 - [info] * Phase 3: Master Recovery Phase completed.
Tue Jul 14 16:44:40 2015 - [info]
Tue Jul 14 16:44:40 2015 - [info] * Phase 4: Slaves Recovery Phase..
Tue Jul 14 16:44:40 2015 - [info]
Tue Jul 14 16:44:40 2015 - [info]
Tue Jul 14 16:44:40 2015 - [info] * Phase 4.1: Starting Slaves in parallel..
Tue Jul 14 16:44:40 2015 - [info]
Tue Jul 14 16:44:40 2015 - [info] -- Slave recovery on host dbsrv2(10.1.1.107:3306) started, pid: 16698. Check tmp log /var/log/masterha/app1/dbsrv2_3306_20150714164438.log if it takes time..
Tue Jul 14 16:44:40 2015 - [info]
Tue Jul 14 16:44:40 2015 - [info] Log messages from dbsrv2 ...
Tue Jul 14 16:44:40 2015 - [info]
Tue Jul 14 16:44:40 2015 - [info]  Resetting slave dbsrv2(10.1.1.107:3306) and starting replication from the new master dbsrv3(10.1.1.108:3306)..
Tue Jul 14 16:44:40 2015 - [info]  Executed CHANGE MASTER.
Tue Jul 14 16:44:40 2015 - [info]  Slave started.
Tue Jul 14 16:44:40 2015 - [info]  gtid_wait(8c71815b-116f-11e4-b9e2-0050569f2c2d:1-2) completed on dbsrv2(10.1.1.107:3306). Executed 0 events.
Tue Jul 14 16:44:40 2015 - [info] End of log messages from dbsrv2.
Tue Jul 14 16:44:40 2015 - [info] -- Slave on host dbsrv2(10.1.1.107:3306) started.
Tue Jul 14 16:44:40 2015 - [info] All new slave servers recovered successfully.
Tue Jul 14 16:44:40 2015 - [info]
Tue Jul 14 16:44:40 2015 - [info] * Phase 5: New master cleanup phase..
Tue Jul 14 16:44:40 2015 - [info]
Tue Jul 14 16:44:40 2015 - [info] Resetting slave info on the new master..
Tue Jul 14 16:44:41 2015 - [info]  dbsrv3: Resetting slave info succeeded.
Tue Jul 14 16:44:41 2015 - [info] Master failover to dbsrv3(10.1.1.108:3306) completed successfully.
Tue Jul 14 16:44:41 2015 - [info]

----- Failover Report -----

app1: MySQL Master failover dbsrv1(10.1.1.102:3306) to dbsrv3(10.1.1.108:3306) succeeded

Master dbsrv1(10.1.1.102:3306) is down!
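
The log above can also be used to estimate how long the failover took. The following is a minimal sketch, not part of MHA: the class name FailoverDuration and its methods are hypothetical, and it assumes every relevant line starts with a timestamp in the format shown above, followed by " - [". It measures the time from the first "Got error on MySQL" line to the "completed successfully" line.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not shipped with MHA): estimates failover time from manager.log lines.
class FailoverDuration {
    // manager.log timestamps look like "Tue Jul 14 16:44:29 2015".
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("EEE MMM d HH:mm:ss yyyy", Locale.ENGLISH);

    // Parses the timestamp that precedes " - [" on a manager.log line.
    static LocalDateTime timestampOf(String line) {
        int end = line.indexOf(" - [");
        return LocalDateTime.parse(line.substring(0, end), TS);
    }

    // Time between the first MySQL error and the "completed successfully" line.
    static Duration measure(List<String> lines) {
        LocalDateTime firstError = null;
        LocalDateTime completed = null;
        for (String line : lines) {
            if (firstError == null && line.contains("Got error on MySQL")) {
                firstError = timestampOf(line);
            } else if (line.contains("completed successfully")) {
                completed = timestampOf(line);
            }
        }
        if (firstError == null || completed == null) {
            throw new IllegalArgumentException("log does not contain a complete failover");
        }
        return Duration.between(firstError, completed);
    }

    public static void main(String[] args) {
        // Three lines copied from the log above.
        List<String> sample = Arrays.asList(
            "Tue Jul 14 16:44:29 2015 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)",
            "Tue Jul 14 16:44:38 2015 - [info] Master is down!",
            "Tue Jul 14 16:44:41 2015 - [info] Master failover to dbsrv3(10.1.1.108:3306) completed successfully.");
        System.out.println("failover took " + measure(sample).getSeconds() + " s");
    }
}
```

For the run shown here the result is 12 seconds (16:44:29 to 16:44:41), well inside the 30-second figure mentioned at the start of the article.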

JDBC client output

2015-07-14 17:04:33 user.count:36
still run.
2015-07-14 17:04:38 user.count:36
still run.
2015-07-14 17:04:43 user.count:36
still run.
2015-07-14 17:04:48 user.count:36
still run.
2015-07-14 17:04:53 user.count:36
still run.
2015-07-14 17:04:58 user.count:36
still run.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 5,005 milliseconds ago.  The last packet sent successfully to the server was 3 milliseconds ago.
conn is : [email protected]
getConnection().
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
 at com.mysql.jdbc.Util.handleNewInstance(Util.java:409)
 at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1127)
 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3715)
 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3604)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4155)
 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2615)
 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2776)
 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2832)
 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2781)
 at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1569)
 at com.shenli.java.MySqlFailOverTester.main(MySqlFailOverTester.java:72)
Caused by: java.net.SocketException: Connection reset
 at java.net.SocketInputStream.read(SocketInputStream.java:209)
 at java.net.SocketInputStream.read(SocketInputStream.java:141)
 at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:112)
 at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:159)
 at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:187)
 at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3158)
 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3615)
 ... 8 more
conn : [email protected]
still run.
2015-07-14 17:05:08 user.count:36
still run.
2015-07-14 17:05:13 user.count:36
still run.
2015-07-14 17:05:18 user.count:36
still run.
2015-07-14 17:05:23 user.count:36
still run.
2015-07-14 17:05:28 user.count:36
still run.
2015-07-14 17:05:33 user.count:36
still run.
2015-07-14 17:05:38 user.count:36

Testing with the Tomcat connection pool showed that it also restores connections automatically; the switchover takes about 10 seconds.
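
The pool settings used in that test are not shown here. For a plain JDBC client like the tester above, one way to get similar behaviour is a bounded retry around the reconnect, so the client keeps trying while the VIP moves to the new master. The sketch below is an assumption-laden illustration: RetryingConnector and retry are hypothetical names, and a Callable stands in for DriverManager.getConnection so the example runs without a database.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retries an action (for example DriverManager.getConnection)
// with a fixed pause between attempts, so a client can ride out the VIP switchover.
class RetryingConnector {
    static <T> T retry(Callable<T> action, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // wait before the next attempt
                }
            }
        }
        throw last; // every attempt failed; rethrow the last error
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a database that is unreachable for the first two attempts.
        final int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new java.sql.SQLException("Communications link failure");
            }
            return "connected";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In the tester, the call would look like retry(() -> DriverManager.getConnection(url, user, password), 10, 3000); the attempt count and pause are illustrative values, chosen only so that the total wait covers a switchover of roughly 10 to 30 seconds.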

One test is still outstanding: semi-synchronous replication. It will be covered later.

Time: 2024-08-05 04:52:29
