13. Building an MHA high-availability architecture

Node layout:

192.168.1.20 (mysql5.5)   master
192.168.1.21 (mysql5.5)   slave1; may be promoted to master if the master fails
192.168.1.22 (mysql5.5)   slave2; must not be promoted if the master fails
192.168.1.25 (percona5.6) slave3, mha-manager, binlog server; must not be promoted if the master fails

Set up SSH trust between all nodes. Run the following on one of them:

# cd ~/.ssh
# cat id_rsa.pub > authorized_keys
# chmod 600 *
# scp -r /root/.ssh 192.168.1.20:~/
# scp -r /root/.ssh 192.168.1.21:~/
# scp -r /root/.ssh 192.168.1.22:~/
(Note: the files on the target hosts must have permission 600.)
# ssh 192.168.1.20    -- verify that login works without a password
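
The manual test above can be repeated for every node in one pass. This is a minimal sketch, assuming root login and the four addresses from the node layout; the function name check_ssh_trust is hypothetical and not part of MHA.

```shell
#!/bin/sh
# Hypothetical helper: verify passwordless SSH to every node given as an argument.
# BatchMode=yes makes ssh fail immediately instead of prompting for a password.
check_ssh_trust() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
            echo "$host ok"
        else
            echo "$host FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (addresses from the node layout above):
# check_ssh_trust 192.168.1.20 192.168.1.21 192.168.1.22 192.168.1.25
```

MHA needs trust between every pair of nodes, so the same check would have to be run from each node; masterha_check_ssh, shown later, does that.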

Start the binlog server on 192.168.1.25 (requires mysqlbinlog from 5.6 or later):

# cd /data/mysql/percona_3309/master_binlog    -- directory that receives the binlogs; it is referenced in the configuration below
# mysqlbinlog -R --host=192.168.1.20 --user=root --password=root --raw --stop-never mysql-bin.000001 &
[1] 6777
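
With --raw, mysqlbinlog writes the received files into the current directory, so the working directory matters. The two steps can be wrapped as below. This is a sketch only: the function name start_binlog_server and the log file name binlog_server.log are hypothetical, and the credentials are the ones used in the command above.

```shell
#!/bin/sh
# Hypothetical wrapper: start mysqlbinlog as a binlog server in a given directory.
# The body runs in a subshell, so the caller's working directory is unchanged.
start_binlog_server() (
    dir=$1; master=$2; first_binlog=$3
    mkdir -p "$dir" && cd "$dir" || exit 1
    # -R reads from the remote server, --raw keeps the binary format,
    # --stop-never keeps the connection open and follows new binlogs.
    mysqlbinlog -R --host="$master" --user=root --password=root \
        --raw --stop-never "$first_binlog" > binlog_server.log 2>&1 &
    echo "$!"    # PID of the background mysqlbinlog process
)

# Usage:
# start_binlog_server /data/mysql/percona_3309/master_binlog 192.168.1.20 mysql-bin.000001
```

A password given on the command line is visible in the process list; a client option file is probably the safer choice outside a test setup.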

Install the MHA manager node (on 25):    -- the last two packages require the EPEL repository to be configured first

yum install perl-DBD-MySQL
yum install perl-Config-Tiny
yum install perl-Log-Dispatch
yum install perl-Parallel-ForkManager
# rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm

Install the MHA node package (on 20, 21, 22):

yum install perl-DBD-MySQL
# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm

MHA manager configuration (on 25):

# cat /etc/masterha_default.cnf
    [server default]
    user=root
    password=root
    ssh_user=root
    repl_user=slave
    repl_password=slave
    ping_interval=1
    shutdown_script=""
# cat /etc/app1.cnf
    [server default]
    manager_workdir=/var/log/masterha/app1
    manager_log=/var/log/masterha/app1/app1.log
    remote_workdir=/var/log/masterha/app1
    [server1]
    hostname=192.168.1.20
    master_binlog_dir=/mysql/data/
    candidate_master=1
    check_repl_delay=0
    [server2]
    hostname=192.168.1.21
    master_binlog_dir=/mysql/data/
    candidate_master=1
    check_repl_delay=0
    [server3]
    hostname=192.168.1.22
    master_binlog_dir=/mysql/data/
    no_master=1
    ignore_fail=1
    [server4]
    hostname=192.168.1.25
    master_binlog_dir=/data/mysql/user_3306/data/
    no_master=1
    ignore_fail=1
    [binlog1]
    hostname=192.168.1.25
    master_binlog_dir=/data/mysql/percona_3309/master_binlog
    no_master=1
    ignore_fail=1

Run the SSH trust check and the environment check from the manager node

Failures encountered while running masterha_check_repl:

# masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 31 00:25:48 2014 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln781] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
Master 192.168.1.20(192.168.1.20:3306), replicating from 192.168.1.21(192.168.1.21:3306)
Master 192.168.1.21(192.168.1.21:3306), replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 00:25:48 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 326
Thu Jul 31 00:25:48 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Thu Jul 31 00:25:48 2014 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!

Fix: the log reports more than one master. Inspection showed that 20 and 21 were set up as master-master, so replication was turned off on 20.

Can't exec "mysqlbinlog": No such file or directory at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 106.
mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options
 at /usr/bin/apply_diff_relay_logs line 493

Fix: run the following on every node:
# which mysqlbinlog    -- prints /mysql/bin/mysqlbinlog
# ln -s /mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog

Thu Jul 31 00:56:01 2014 - [info]   Connecting to root@192.168.1.21(192.168.1.21:22)..
Creating directory /var/log/masterha/app1.. done.
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to likun1-relay-bin.143287
    Temporary relay log file is /mysql/data/likun1-relay-bin.143287
    Testing mysql connection and privileges..sh: mysql: command not found
mysql command failed with rc 127:0!
 at /usr/bin/apply_diff_relay_logs line 375
        main::check() called at /usr/bin/apply_diff_relay_logs line 497
        eval {...} called at /usr/bin/apply_diff_relay_logs line 475
        main::main() called at /usr/bin/apply_diff_relay_logs line 120
Fix: same as above:   ln -s `which mysql` /usr/bin/mysql
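
Both symlink fixes can be applied to all nodes from one host once SSH trust is in place. This is a rough sketch: the function name link_mysql_clients is hypothetical, and it assumes the client binaries live in the same directory on every node (/mysql/bin in the output above), which may not hold for the Percona install on 25.

```shell
#!/bin/sh
# Hypothetical helper: link mysql and mysqlbinlog into /usr/bin on each host.
# $1 is the directory holding the binaries; remaining arguments are hosts.
link_mysql_clients() {
    bindir=$1; shift
    for host in "$@"; do
        # -f replaces an existing link, so the command can be re-run safely.
        ssh "root@$host" "ln -sf $bindir/mysql $bindir/mysqlbinlog /usr/bin/" \
            && echo "$host linked"
    done
}

# Usage:
# link_mysql_clients /mysql/bin 192.168.1.20 192.168.1.21 192.168.1.22
```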

Thu Jul 31 01:07:03 2014 - [info]   Connecting to root@192.168.1.21(192.168.1.21:22)..
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to likun1-relay-bin.164206
    Temporary relay log file is /mysql/data/likun1-relay-bin.164206
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output..mysqlbinlog: File '/mysql/data/likun1-relay-bin.164206' not found (Errcode: 2)
mysqlbinlog failed with rc 1:0!
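Cause and fix: several slaves had the same server-id, which made them reconnect constantly and generate a large number of relay logs. Because relay_log_purge=ON, the file had already been purged by the time it was checked. Give every slave a unique server-id.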
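
Duplicate server-id values are easy to spot once they are collected in one place. This is a minimal sketch; the function name duplicate_server_ids is hypothetical, and the collection loop in the comment assumes the root/root account used elsewhere in this setup.

```shell
#!/bin/sh
# Hypothetical helper: read "host server_id" lines on stdin and print every
# server_id that is used by more than one host, followed by those hosts.
duplicate_server_ids() {
    awk '{ count[$2]++; hosts[$2] = hosts[$2] " " $1 }
         END { for (id in count) if (count[id] > 1) print id ":" hosts[id] }'
}

# Usage: collect the ids over the network, then check them.
# for h in 192.168.1.21 192.168.1.22 192.168.1.25; do
#     echo "$h $(mysql -h "$h" -uroot -proot -N -e 'SELECT @@server_id')"
# done | duplicate_server_ids
```

No output means every host reported a different server-id.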

Complete output of a successful masterha_check_repl run:

# masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 31 01:35:32 2014 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 31 01:35:32 2014 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 31 01:35:32 2014 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 31 01:35:32 2014 - [info] MHA::MasterMonitor version 0.56.
Thu Jul 31 01:35:33 2014 - [info] GTID failover mode = 0
Thu Jul 31 01:35:33 2014 - [info] Dead Servers:
Thu Jul 31 01:35:33 2014 - [info] Alive Servers:
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.21(192.168.1.21:3306)
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.22(192.168.1.22:3306)
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.25(192.168.1.25:3306)
Thu Jul 31 01:35:33 2014 - [info] Alive Slaves:
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.21(192.168.1.21:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:35:33 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:35:33 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.22(192.168.1.22:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:35:33 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:35:33 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.25(192.168.1.25:3306)  Version=5.5.37-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:35:33 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:35:33 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:35:33 2014 - [info] Current Alive Master: 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:35:33 2014 - [info] Checking slave configurations..
Thu Jul 31 01:35:33 2014 - [info]  read_only=1 is not set on slave 192.168.1.21(192.168.1.21:3306).
Thu Jul 31 01:35:33 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.21(192.168.1.21:3306).
Thu Jul 31 01:35:33 2014 - [info]  read_only=1 is not set on slave 192.168.1.22(192.168.1.22:3306).
Thu Jul 31 01:35:33 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.22(192.168.1.22:3306).
Thu Jul 31 01:35:33 2014 - [info]  read_only=1 is not set on slave 192.168.1.25(192.168.1.25:3306).
Thu Jul 31 01:35:33 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.25(192.168.1.25:3306).
Thu Jul 31 01:35:33 2014 - [info] Checking replication filtering settings..
Thu Jul 31 01:35:33 2014 - [info]  binlog_do_db=, binlog_ignore_db=
Thu Jul 31 01:35:33 2014 - [info]  Replication filtering check ok.
Thu Jul 31 01:35:33 2014 - [info] GTID (with auto-pos) is not supported
Thu Jul 31 01:35:33 2014 - [info] Starting SSH connection tests..
Thu Jul 31 01:35:56 2014 - [info] All SSH connection tests passed successfully.
Thu Jul 31 01:35:56 2014 - [info] Checking MHA Node version..
Thu Jul 31 01:36:04 2014 - [info]  Version check ok.
Thu Jul 31 01:36:04 2014 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jul 31 01:36:06 2014 - [info] HealthCheck: SSH to 192.168.1.20 is reachable.
Thu Jul 31 01:36:09 2014 - [info] Master MHA Node version is 0.56.
Thu Jul 31 01:36:09 2014 - [info] Checking recovery script configurations on 192.168.1.20(192.168.1.20:3306)..
Thu Jul 31 01:36:09 2014 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/data/ --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000017
Thu Jul 31 01:36:09 2014 - [info]   Connecting to root@192.168.1.20(192.168.1.20:22)..
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /mysql/data/, up to mysql-bin.000017
Thu Jul 31 01:36:11 2014 - [info] Binlog setting check done.
Thu Jul 31 01:36:11 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Jul 31 01:36:11 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.21 --slave_ip=192.168.1.21 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.30-log --manager_version=0.56 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
Thu Jul 31 01:36:11 2014 - [info]   Connecting to root@192.168.1.21(192.168.1.21:22)..
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to likun1-relay-bin.197850
    Temporary relay log file is /mysql/data/likun1-relay-bin.197850
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jul 31 01:36:14 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.22 --slave_ip=192.168.1.22 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.30-log --manager_version=0.56 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
Thu Jul 31 01:36:14 2014 - [info]   Connecting to root@192.168.1.22(192.168.1.22:22)..
Creating directory /var/log/masterha/app1.. done.
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to likun1-relay-bin.197850
    Temporary relay log file is /mysql/data/likun1-relay-bin.197850
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jul 31 01:36:17 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.25 --slave_ip=192.168.1.25 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.37-log --manager_version=0.56 --relay_log_info=/data/mysql/user_3306/data/relay-log.info  --relay_dir=/data/mysql/user_3306/data/  --slave_pass=xxx
Thu Jul 31 01:36:17 2014 - [info]   Connecting to root@192.168.1.25(192.168.1.25:22)..
  Checking slave recovery environment settings..
    Opening /data/mysql/user_3306/data/relay-log.info ... ok.
    Relay log found at /data/mysql/user_3306/data, up to mysql1-relay-bin.000026
    Temporary relay log file is /data/mysql/user_3306/data/mysql1-relay-bin.000026
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jul 31 01:36:21 2014 - [info] Slaves settings check done.
Thu Jul 31 01:36:21 2014 - [info]
192.168.1.20(192.168.1.20:3306) (current master)
 +--192.168.1.21(192.168.1.21:3306)
 +--192.168.1.22(192.168.1.22:3306)
 +--192.168.1.25(192.168.1.25:3306)
Thu Jul 31 01:36:21 2014 - [info] Checking replication health on 192.168.1.21..
Thu Jul 31 01:36:21 2014 - [info]  ok.
Thu Jul 31 01:36:21 2014 - [info] Checking replication health on 192.168.1.22..
Thu Jul 31 01:36:21 2014 - [info]  ok.
Thu Jul 31 01:36:21 2014 - [info] Checking replication health on 192.168.1.25..
Thu Jul 31 01:36:21 2014 - [info]  ok.
Thu Jul 31 01:36:21 2014 - [warning] master_ip_failover_script is not defined.
Thu Jul 31 01:36:21 2014 - [warning] shutdown_script is not defined.
Thu Jul 31 01:36:21 2014 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
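
The output above still notes on every slave that read_only=1 and relay_log_purge=0 are not set. One way to address both is sketched below; the function name slave_settings_sql is hypothetical, and the settings would also have to go into my.cnf to survive a restart.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that addresses the two slave notices.
slave_settings_sql() {
    cat <<'EOF'
SET GLOBAL read_only = 1;
SET GLOBAL relay_log_purge = 0;
EOF
}

# Usage on each slave (not on the master):
# slave_settings_sql | mysql -uroot -proot
#
# With relay_log_purge=0 the relay logs are no longer removed automatically.
# MHA Node ships a purge_relay_logs script meant to be run from cron for this;
# check its --help output for the options your version supports.
```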

Complete output of a successful masterha_check_ssh run:

# masterha_check_ssh --conf=/etc/app1.cnf
Thu Jul 31 01:47:52 2014 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 31 01:47:52 2014 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 31 01:47:52 2014 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 31 01:47:52 2014 - [info] Starting SSH connection tests..
Thu Jul 31 01:48:00 2014 - [debug]
Thu Jul 31 01:47:52 2014 - [debug]  Connecting via SSH from root@192.168.1.21(192.168.1.21:22) to root@192.168.1.20(192.168.1.20:22)..
Thu Jul 31 01:47:54 2014 - [debug]   ok.
Thu Jul 31 01:47:54 2014 - [debug]  Connecting via SSH from root@192.168.1.21(192.168.1.21:22) to root@192.168.1.22(192.168.1.22:22)..
Thu Jul 31 01:47:57 2014 - [debug]   ok.
Thu Jul 31 01:47:57 2014 - [debug]  Connecting via SSH from root@192.168.1.21(192.168.1.21:22) to root@192.168.1.25(192.168.1.25:22)..
Thu Jul 31 01:48:00 2014 - [debug]   ok.
Thu Jul 31 01:48:00 2014 - [debug]
Thu Jul 31 01:47:53 2014 - [debug]  Connecting via SSH from root@192.168.1.22(192.168.1.22:22) to root@192.168.1.20(192.168.1.20:22)..
Thu Jul 31 01:47:55 2014 - [debug]   ok.
Thu Jul 31 01:47:55 2014 - [debug]  Connecting via SSH from root@192.168.1.22(192.168.1.22:22) to root@192.168.1.21(192.168.1.21:22)..
Thu Jul 31 01:47:58 2014 - [debug]   ok.
Thu Jul 31 01:47:58 2014 - [debug]  Connecting via SSH from root@192.168.1.22(192.168.1.22:22) to root@192.168.1.25(192.168.1.25:22)..
Thu Jul 31 01:48:00 2014 - [debug]   ok.
Thu Jul 31 01:48:05 2014 - [debug]
Thu Jul 31 01:47:52 2014 - [debug]  Connecting via SSH from root@192.168.1.20(192.168.1.20:22) to root@192.168.1.21(192.168.1.21:22)..
Address 192.168.1.21 maps to localhost, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
Thu Jul 31 01:47:54 2014 - [debug]   ok.
Thu Jul 31 01:47:54 2014 - [debug]  Connecting via SSH from root@192.168.1.20(192.168.1.20:22) to root@192.168.1.22(192.168.1.22:22)..
Address 192.168.1.22 maps to localhost, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
Thu Jul 31 01:47:57 2014 - [debug]   ok.
Thu Jul 31 01:47:57 2014 - [debug]  Connecting via SSH from root@192.168.1.20(192.168.1.20:22) to root@192.168.1.25(192.168.1.25:22)..
Address 192.168.1.25 maps to localhost, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
Thu Jul 31 01:48:05 2014 - [debug]   ok.
Thu Jul 31 01:48:06 2014 - [debug]
Thu Jul 31 01:47:53 2014 - [debug]  Connecting via SSH from root@192.168.1.25(192.168.1.25:22) to root@192.168.1.20(192.168.1.20:22)..
Thu Jul 31 01:47:57 2014 - [debug]   ok.
Thu Jul 31 01:47:57 2014 - [debug]  Connecting via SSH from root@192.168.1.25(192.168.1.25:22) to root@192.168.1.21(192.168.1.21:22)..
Thu Jul 31 01:48:02 2014 - [debug]   ok.
Thu Jul 31 01:48:02 2014 - [debug]  Connecting via SSH from root@192.168.1.25(192.168.1.25:22) to root@192.168.1.22(192.168.1.22:22)..
Thu Jul 31 01:48:06 2014 - [debug]   ok.
Thu Jul 31 01:48:06 2014 - [info] All SSH connection tests passed successfully.

Start the MHA manager:

# nohup masterha_manager --conf=/etc/app1.cnf  > /tmp/mha_manager.log 2>&1  &

[2] 6389

Check the MHA manager status:

# masterha_check_status --conf=/etc/app1.cnf

app1 (pid:6389) is running(0:PING_OK), master:192.168.1.20
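
For monitoring, the state token can be pulled out of that status line. This is a small sketch based only on the line format shown above; the function names mha_state and mha_is_healthy are hypothetical, and other states may be formatted differently in your MHA version.

```shell
#!/bin/sh
# Hypothetical helper: read masterha_check_status output on stdin and print
# the state token, e.g. PING_OK from "running(0:PING_OK)".
mha_state() {
    sed -n 's/.*([0-9][0-9]*:\([A-Z_]*\)).*/\1/p'
}

# Succeeds only when the reported state is PING_OK.
mha_is_healthy() {
    [ "$(mha_state)" = "PING_OK" ]
}

# Usage:
# masterha_check_status --conf=/etc/app1.cnf | mha_is_healthy || echo "MHA manager needs attention"
```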

Stop MHA:

# masterha_stop --conf=/etc/app1.cnf

Stopped app1 successfully.

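The MHA setup is now complete.

Master failover test:

Stop MySQL on 20 and run tail -f /var/log/masterha/app1/app1.log to follow what MHA does. The log shows which slave becomes the new master and also prints the CHANGE MASTER statement.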

Thu Jul 31 01:58:48 2014 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Thu Jul 31 01:58:48 2014 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/data/ --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin
Thu Jul 31 01:58:49 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 31 01:58:49 2014 - [warning] Connection failed 2 time(s)..
Thu Jul 31 01:58:50 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 31 01:58:50 2014 - [warning] Connection failed 3 time(s)..
Thu Jul 31 01:58:51 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 31 01:58:51 2014 - [warning] Connection failed 4 time(s)..
Thu Jul 31 01:58:53 2014 - [warning] HealthCheck: Got timeout on checking SSH connection to 192.168.1.20! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Thu Jul 31 01:58:53 2014 - [warning] Master is not reachable from health checker!
Thu Jul 31 01:58:53 2014 - [warning] Master 192.168.1.20(192.168.1.20:3306) is not reachable!
Thu Jul 31 01:58:53 2014 - [warning] SSH is NOT reachable.
Thu Jul 31 01:58:53 2014 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Thu Jul 31 01:58:53 2014 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 31 01:58:53 2014 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 31 01:58:53 2014 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 31 01:58:54 2014 - [info] GTID failover mode = 0
Thu Jul 31 01:58:54 2014 - [info] Dead Servers:
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:58:54 2014 - [info] Alive Servers:
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.21(192.168.1.21:3306)
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.22(192.168.1.22:3306)
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.25(192.168.1.25:3306)
Thu Jul 31 01:58:54 2014 - [info] Alive Slaves:
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.21(192.168.1.21:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:58:54 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:58:54 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.22(192.168.1.22:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:58:54 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:58:54 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.25(192.168.1.25:3306)  Version=5.5.37-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:58:54 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:58:54 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:58:54 2014 - [info] Checking slave configurations..
Thu Jul 31 01:58:54 2014 - [info]  read_only=1 is not set on slave 192.168.1.21(192.168.1.21:3306).
Thu Jul 31 01:58:54 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.21(192.168.1.21:3306).
Thu Jul 31 01:58:54 2014 - [info]  read_only=1 is not set on slave 192.168.1.22(192.168.1.22:3306).
Thu Jul 31 01:58:54 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.22(192.168.1.22:3306).
Thu Jul 31 01:58:54 2014 - [info]  read_only=1 is not set on slave 192.168.1.25(192.168.1.25:3306).
Thu Jul 31 01:58:54 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.25(192.168.1.25:3306).
Thu Jul 31 01:58:54 2014 - [info] Checking replication filtering settings..
Thu Jul 31 01:58:54 2014 - [info]  Replication filtering check ok.
Thu Jul 31 01:58:54 2014 - [info] Master is down!
Thu Jul 31 01:58:54 2014 - [info] Terminating monitoring script.
Thu Jul 31 01:58:54 2014 - [info] Got exit code 20 (Master dead).
Thu Jul 31 01:58:54 2014 - [info] MHA::MasterFailover version 0.56.
Thu Jul 31 01:58:54 2014 - [info] Starting master failover.
Thu Jul 31 01:58:54 2014 - [info]
Thu Jul 31 01:58:54 2014 - [info] * Phase 1: Configuration Check Phase..
Thu Jul 31 01:58:54 2014 - [info]
Thu Jul 31 01:58:58 2014 - [info] HealthCheck: SSH to 192.168.1.25 is reachable.
Thu Jul 31 01:59:02 2014 - [info] Binlog server 192.168.1.25 is reachable.
Thu Jul 31 01:59:03 2014 - [info] GTID failover mode = 0
Thu Jul 31 01:59:03 2014 - [info] Dead Servers:
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info] Checking master reachability via MySQL(double check)...
Thu Jul 31 01:59:03 2014 - [info]  ok.
Thu Jul 31 01:59:03 2014 - [info] Alive Servers:
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.21(192.168.1.21:3306)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.22(192.168.1.22:3306)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.25(192.168.1.25:3306)
Thu Jul 31 01:59:03 2014 - [info] Alive Slaves:
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.21(192.168.1.21:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.22(192.168.1.22:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.25(192.168.1.25:3306)  Version=5.5.37-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info] Starting Non-GTID based failover.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Jul 31 01:59:03 2014 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Thu Jul 31 01:59:03 2014 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jul 31 01:59:03 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 3: Master Recovery Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] The latest binary log file/position on all slaves is mysql-bin.000017:486
Thu Jul 31 01:59:03 2014 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.21(192.168.1.21:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.22(192.168.1.22:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.25(192.168.1.25:3306)  Version=5.5.37-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info] The oldest binary log file/position on all slaves is mysql-bin.000017:486
Thu Jul 31 01:59:03 2014 - [info] Oldest slaves:
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.21(192.168.1.21:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.22(192.168.1.22:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.25(192.168.1.25:3306)  Version=5.5.37-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Jul 31 01:59:03 2014 - [info] All slaves received relay logs to the same position. No need to resync each other.
Thu Jul 31 01:59:03 2014 - [info] Searching new master from slaves..
Thu Jul 31 01:59:03 2014 - [info]  Candidate masters from the configuration file:
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.21(192.168.1.21:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 31 01:59:03 2014 - [info]  Non-candidate masters:
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.22(192.168.1.22:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.25(192.168.1.25:3306)  Version=5.5.37-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Thu Jul 31 01:59:03 2014 - [info] New master is 192.168.1.21(192.168.1.21:3306)
Thu Jul 31 01:59:03 2014 - [info] Starting master failover..
Thu Jul 31 01:59:03 2014 - [info]
From:
192.168.1.20(192.168.1.20:3306) (current master)
 +--192.168.1.21(192.168.1.21:3306)
 +--192.168.1.22(192.168.1.22:3306)
 +--192.168.1.25(192.168.1.25:3306)
To:
192.168.1.21(192.168.1.21:3306) (new master)
 +--192.168.1.22(192.168.1.22:3306)
 +--192.168.1.25(192.168.1.25:3306)
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 3.4: Master Log Apply Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Jul 31 01:59:03 2014 - [info] Starting recovery on 192.168.1.21(192.168.1.21:3306)..
Thu Jul 31 01:59:03 2014 - [info]  This server has all relay logs. Waiting all logs to be applied..
Thu Jul 31 01:59:03 2014 - [info]   done.
Thu Jul 31 01:59:03 2014 - [info]  All relay logs were successfully applied.
Thu Jul 31 01:59:03 2014 - [info] Getting new master's binlog name and position..
Thu Jul 31 01:59:03 2014 - [info]  mysql-bin.000011:569
Thu Jul 31 01:59:03 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.21', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000011', MASTER_LOG_POS=569, MASTER_USER='slave', MASTER_PASSWORD='xxx';
Thu Jul 31 01:59:03 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Thu Jul 31 01:59:03 2014 - [info] ** Finished master recovery successfully.
Thu Jul 31 01:59:03 2014 - [info] * Phase 3: Master Recovery Phase completed.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] -- Slave diff file generation on host 192.168.1.22(192.168.1.22:3306) started, pid: 6708. Check tmp log /var/log/masterha/app1/192.168.1.22_3306_20140731015854.log if it takes time..
Thu Jul 31 01:59:03 2014 - [info] -- Slave diff file generation on host 192.168.1.25(192.168.1.25:3306) started, pid: 6709. Check tmp log /var/log/masterha/app1/192.168.1.25_3306_20140731015854.log if it takes time..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Log messages from 192.168.1.22 ...
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 31 01:59:03 2014 - [info] End of log messages from 192.168.1.22.
Thu Jul 31 01:59:03 2014 - [info] -- 192.168.1.22(192.168.1.22:3306) has the latest relay log events.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Log messages from 192.168.1.25 ...
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 31 01:59:03 2014 - [info] End of log messages from 192.168.1.25.
Thu Jul 31 01:59:03 2014 - [info] -- 192.168.1.25(192.168.1.25:3306) has the latest relay log events.
Thu Jul 31 01:59:03 2014 - [info] Generating relay diff files from the latest slave succeeded.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] -- Slave recovery on host 192.168.1.22(192.168.1.22:3306) started, pid: 6712. Check tmp log /var/log/masterha/app1/192.168.1.22_3306_20140731015854.log if it takes time..
Thu Jul 31 01:59:03 2014 - [info] -- Slave recovery on host 192.168.1.25(192.168.1.25:3306) started, pid: 6713. Check tmp log /var/log/masterha/app1/192.168.1.25_3306_20140731015854.log if it takes time..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Log messages from 192.168.1.22 ...
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Starting recovery on 192.168.1.22(192.168.1.22:3306)..
Thu Jul 31 01:59:03 2014 - [info]  This server has all relay logs. Waiting all logs to be applied..
Thu Jul 31 01:59:03 2014 - [info]   done.
Thu Jul 31 01:59:03 2014 - [info]  All relay logs were successfully applied.
Thu Jul 31 01:59:03 2014 - [info]  Resetting slave 192.168.1.22(192.168.1.22:3306) and starting replication from the new master 192.168.1.21(192.168.1.21:3306)..
Thu Jul 31 01:59:03 2014 - [info]  Executed CHANGE MASTER.
Thu Jul 31 01:59:03 2014 - [info]  Slave started.
Thu Jul 31 01:59:03 2014 - [info] End of log messages from 192.168.1.22.
Thu Jul 31 01:59:03 2014 - [info] -- Slave recovery on host 192.168.1.22(192.168.1.22:3306) succeeded.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Log messages from 192.168.1.25 ...
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Starting recovery on 192.168.1.25(192.168.1.25:3306)..
Thu Jul 31 01:59:03 2014 - [info]  This server has all relay logs. Waiting all logs to be applied..
Thu Jul 31 01:59:03 2014 - [info]   done.
Thu Jul 31 01:59:03 2014 - [info]  All relay logs were successfully applied.
Thu Jul 31 01:59:03 2014 - [info]  Resetting slave 192.168.1.25(192.168.1.25:3306) and starting replication from the new master 192.168.1.21(192.168.1.21:3306)..
Thu Jul 31 01:59:03 2014 - [info]  Executed CHANGE MASTER.
Thu Jul 31 01:59:03 2014 - [info]  Slave started.
Thu Jul 31 01:59:03 2014 - [info] End of log messages from 192.168.1.25.
Thu Jul 31 01:59:03 2014 - [info] -- Slave recovery on host 192.168.1.25(192.168.1.25:3306) succeeded.
Thu Jul 31 01:59:03 2014 - [info] All new slave servers recovered successfully.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 5: New master cleanup phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Resetting slave info on the new master..
Thu Jul 31 01:59:03 2014 - [info]  192.168.1.21: Resetting slave info succeeded.
Thu Jul 31 01:59:03 2014 - [info] Master failover to 192.168.1.21(192.168.1.21:3306) completed successfully.
Thu Jul 31 01:59:03 2014 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.1.20(192.168.1.20:3306) to 192.168.1.21(192.168.1.21:3306) succeeded
Master 192.168.1.20(192.168.1.20:3306) is down!
Check MHA Manager logs at mysql1.com:/var/log/masterha/app1/app1.log for details.
Started automated(non-interactive) failover.
The latest slave 192.168.1.21(192.168.1.21:3306) has all relay logs for recovery.
Selected 192.168.1.21(192.168.1.21:3306) as a new master.
192.168.1.21(192.168.1.21:3306): OK: Applying all logs succeeded.
192.168.1.22(192.168.1.22:3306): This host has the latest relay log events.
192.168.1.25(192.168.1.25:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.1.22(192.168.1.22:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.1.21(192.168.1.21:3306)
192.168.1.25(192.168.1.25:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.1.21(192.168.1.21:3306)
192.168.1.21(192.168.1.21:3306): Resetting slave info succeeded.
Master failover to 192.168.1.21(192.168.1.21:3306) completed successfully.

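Bring MySQL back up on the old master (192.168.1.20), then run CHANGE MASTER on it so that it rejoins the cluster as a slave of the new master:

CHANGE MASTER TO MASTER_HOST='192.168.1.21', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000011', MASTER_LOG_POS=569, MASTER_USER='slave', MASTER_PASSWORD='slave'; (this statement can be found in the failover log above)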

start slave;
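
The statement does not have to be retyped by hand; it can be pulled out of the manager log. The following is a minimal sketch, assuming the log path from app1.cnf (/var/log/masterha/app1/app1.log) and that the manager writes the whole statement on one line. The function name `extract_change_master` is made up for illustration. The password in the logged statement may be masked, so check it before executing.

```shell
#!/bin/sh
# Print the last "CHANGE MASTER TO ...;" statement found in an MHA manager log.
# $1: path to the manager log (assumed: /var/log/masterha/app1/app1.log)
extract_change_master() {
    # -o prints only the matching part; lines such as
    # "Executed CHANGE MASTER." do not match because they lack " TO ".
    grep -o "CHANGE MASTER TO [^;]*;" "$1" | tail -n 1
}
```

Usage: `extract_change_master /var/log/masterha/app1/app1.log`, then review the output and run it in a mysql session on 192.168.1.20 before `start slave;`.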

 

After the environment has been restored, delete app1.failover.complete under /var/log/masterha/app1 and restart the MHA manager process; otherwise the next failover will not run.
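
The cleanup step can be sketched as two small shell functions. This is only a sketch under assumptions: the work directory and config path are the ones used in this setup (/var/log/masterha/app1 and /etc/app1.cnf), and the function names `clear_failover_marker` and `restart_manager` are hypothetical helpers, not part of MHA.

```shell
#!/bin/sh
# Remove the marker file left behind by a completed failover.
# $1: manager_workdir, $2: application name (e.g. app1)
clear_failover_marker() {
    marker="$1/$2.failover.complete"
    if [ -f "$marker" ]; then
        rm -f "$marker" && echo "removed $marker"
    else
        echo "no marker at $marker"
    fi
}

# Start the manager again in the background.
# $1: application config file, $2: file that receives stdout/stderr
restart_manager() {
    nohup masterha_manager --conf="$1" > "$2" 2>&1 &
}
```

Usage: `clear_failover_marker /var/log/masterha/app1 app1`, then `restart_manager /etc/app1.cnf /var/log/masterha/app1/manager.out` (the output file name is an assumption), and confirm with `masterha_check_status --conf=/etc/app1.cnf`.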

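To have the IP address (VIP) taken over as well, these two scripts need to be configured and adapted:

master_ip_failover_script=''        a template is in samples/scripts of the installation package

master_ip_online_change_script=""        required for manual switchover; without it only MySQL is switched and the VIP stays where it was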
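
The shipped templates are Perl scripts; the core idea can be shown in a few lines of shell. The sketch below is not the MHA template. It assumes the manager calls the script with `--command=stop|stopssh|start|status` plus `--orig_master_host=` and `--new_master_host=` arguments, and the VIP 192.168.1.100/24, the interface eth0 and the function names are hypothetical. It only prints the command that would move the VIP; a real script would execute it and exit 0 on success.

```shell
#!/bin/sh
# Hypothetical values: replace with the real VIP and network interface.
VIP="192.168.1.100/24"
DEV="eth0"

# Print the value of --NAME=VALUE from the remaining arguments.
get_opt() {
    name="$1"; shift
    for arg in "$@"; do
        case "$arg" in
            --"$name"=*) echo "${arg#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Print the command that would move the VIP for the given MHA --command.
vip_action() {
    cmd=$(get_opt command "$@") || return 1
    case "$cmd" in
        stop|stopssh)
            # Take the VIP away from the old master.
            host=$(get_opt orig_master_host "$@") || return 1
            echo "ssh root@$host /sbin/ip addr del $VIP dev $DEV" ;;
        start)
            # Bring the VIP up on the new master.
            host=$(get_opt new_master_host "$@") || return 1
            echo "ssh root@$host /sbin/ip addr add $VIP dev $DEV" ;;
        status)
            echo "true" ;;
        *)
            return 1 ;;
    esac
}
```

For example, `vip_action --command=start --new_master_host=192.168.1.21` prints the `ip addr add` command for the new master. Printing instead of executing keeps the sketch safe to try on a test box before wiring it into app1.cnf.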

For reference, see Mr. Wu's helper scripts: https://github.com/wubx/mha-helper

A fairly comprehensive blog post: http://blog.itpub.net/14594028/viewspace-1073516/

MYSQL + MHA + keepalive + VIP installation and configuration (part 1)

http://www.cnblogs.com/yuanermen/p/3726572.html

http://www.cnblogs.com/yuanermen/p/3726961.html

http://www.cnblogs.com/yuanermen/p/3735263.html

Time: 2024-09-30 07:06:37
