一、安装MHA基本环境

1. 安装MHA node

(1) 基本环境说明，本文参考互联网文章学习，搭建MHA与测试如下。

参考文档：http://www.cnblogs.com/xuanzhi201111/p/4231412.html

角色                IP地址            主机名
=============================================
Master              192.168.1.121     node1
Slave               192.168.1.122     node2
Slave               192.168.1.123     node3
Monitor host        192.168.1.125     node5

(2) 在node1,node2,node3,node5操作：

# vi /etc/hosts

192.168.1.121 node1
192.168.1.122 node2
192.168.1.123 node3
192.168.1.125 node5

安装MHA node节点软件包：

# rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
# yum install perl-DBD-MySQL perl-CPAN -y
# tar xf mha4mysql-node-0.56.tar.gz
# cd mha4mysql-node-0.56
# perl Makefile.PL
# make && make install

2. 安装MHA Manager

在node5管理节点上操作：注：MHA Manager主机也是需要安装MHA Node,MHA Manger

# yum install perl-DBD-MySQL perl-CPAN perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
# tar xf mha4mysql-manager-0.56.tar.gz
# cd mha4mysql-manager-0.56
# perl Makefile.PL
# make && make install

#说明：安装的脚本程序都在/usr/local/bin/目录下。

3. 节点间配置SSH登录无密码验证（MHA主机之间使用key登录）

在node5（Monitor）：

# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]

在node1（Master）：

# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]

在node2 (slave)：

# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]

在node3 (slave)：

# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]

二、搭建主从复制环境

1. 主从复制环境配置过程

(1) mysql安装过程略,但是三节点要创建如下链接

node1(主)，node2(主备从）,node3(从)

注意：创建如下链接：
ln -s /usr/local/mysql/bin/* /usr/local/bin/

node1 my.cnf

server-id              = 1
binlog-format          = ROW
log-bin                = master-bin
log-bin-index          = master-bin.index
log-slave-updates      = true
relay_log_purge        = 0

node2 my.cnf

server-id              = 2
binlog-format          = ROW
log-bin                = master-bin
log-bin-index          = master-bin.index
log-slave-updates      = true
relay_log_purge        = 0

node3 my.cnf

binlog-format          = ROW
log-bin                = mysql-bin
relay-log              = slave-relay-bin
relay-log-index        = slave-relay-bin.index
log-slave-updates      = true
server-id              = 11
skip-name-resolve
relay_log_purge        = 0

(2) 在node1 (Master)上备份一份完整的数据：

# mysqldump -uroot -p123456 --master-data=2 --single-transaction -R --triggers -A > all.sql

其中--master-data=2代表备份时刻记录master的Binlog位置和Position。

(3) 在node1 (Master)上创建复制用户:

mysql> grant replication slave on *.* to ‘repl‘@‘192.168.1.%‘ identified by ‘123456‘;
mysql> flush privileges;

(4) 查看主库备份时的binlog名称和位置，MASTER_LOG_FILE和MASTER_LOG_POS：

# head -n 30 all.sql | grep ‘CHANGE MASTER TO‘

-- CHANGE MASTER TO MASTER_LOG_FILE=‘master-bin.000004‘, MASTER_LOG_POS=120;

(5) 把备份复制到192.168.1.122和192.168.1.123

# scp all.sql 192.168.1.122:/root/
# scp all.sql 192.168.1.123:/root/

(6) 分别在两台服务器上导入备份，执行复制相关命令

在node2、node3 主机上操作：

# mysql -uroot -p123456 < all.sql

mysql> stop slave;

CHANGE MASTER TO
MASTER_HOST=‘192.168.1.121‘,
MASTER_USER=‘repl‘,
MASTER_PASSWORD=‘123456‘,
MASTER_LOG_FILE=‘master-bin.000004‘,
MASTER_LOG_POS=120;

mysql> start slave;
mysql> show slave status\G

2. 创建MHA管理用户,在master上创建。

mysql> grant all privileges on *.* to ‘root‘@‘192.168.1.%‘ identified by ‘123456‘;
mysql> flush privileges;

三、配置Keepalived VIP

vip配置可以采用两种方式，一种通过keepalived的方式管理虚拟ip的浮动；一人是通过脚本方式，本文通过keepalived方式实现

1. 在node1(Master)与node2（备选主节点）安装keepalived。

# wget http://www.keepalived.org/software/keepalived-1.2.12.tar.gz
# tar xf keepalived-1.2.12.tar.gz
# cd keepalived-1.2.12
# ./configure --prefix=/usr/local/keepalived
# make && make install
# cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/
# cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
# mkdir /etc/keepalived
# cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
# cp /usr/local/keepalived/sbin/keepalived /usr/sbin/

2. 配置keepalived的配置文件，在node1(master)上配置操作如下:

注：keepalived配置成backup->backup，即IP地址切换后，主起来后IP地址不切换，本文监控脚本由MHA提供，keepalived不提供对mysqld的监控。

# vi /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
     notification_email {
     [email protected]
   }
   notification_email_from [email protected]
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id MySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    nopreempt

authentication {
    auth_type PASS
    auth_pass 1111
    }

virtual_ipaddress {
192.168.1.130
}
}

3. 配置keepalived的配置文件，在node2(备用节点)上配置操作如下:

# vi /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
     notification_email {
     [email protected]
   }
   notification_email_from [email protected]
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id MySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 120
    advert_int 1
    nopreempt

authentication {
    auth_type PASS
    auth_pass 1111
    }

virtual_ipaddress {
192.168.1.130
}
}

4. node1,node2启动keepalived服务

# service keepalived start
# chkconfig keepalived on

5. node1查看VIP启动情况

[[email protected] ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:4e:53:71 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.121/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.130/32 scope global eth0
    inet6 fe80::20c:29ff:fe4e:5371/64 scope link
       valid_lft forever preferred_lft forever

四、配置MHA

1. monitor创建MHA的工作目录，并且创建相关配置文件（在软件包解压后的目录里面有样例配置文件）。

# mkdir -p /etc/masterha
# mkdir -p /var/log/masterha/app1
# cp mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha/

修改app1.cnf配置文件，修改后的文件内容如下:

# cat /etc/masterha/app1.cnf

[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
master_binlog_dir=/usr/local/mysql/data/
master_ip_failover_script= /usr/local/bin/master_ip_failover
master_ip_online_change_script= /usr/local/bin/master_ip_online_change
password=123456
user=root
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=repl
report_script=/usr/local/bin/send_report
ssh_user=root

[server1]
hostname=192.168.1.121
port=3306

[server2]
hostname=192.168.1.122
port=3306
candidate_master=1
check_repl_delay=0

[server3]
hostname=192.168.1.123
port=3306

说明:
master_ip_failover_script= /usr/local/bin/master_ip_failover #MHA自动切换执行的脚本，需要修改
master_ip_online_change_script= /usr/local/bin/master_ip_online_change #手动切换需要执行的脚本，需要修改
report_script=/usr/local/bin/send_report #切换时发送邮件进行报告，需要修改

2. 设置relay log的清除方式（在每个slave节点上）：

(1)在node2,node3 从节点上操作：

将relay_log_purge=0加入my.cnf配置文件，前面已经配置。

(2) 设置定期清理relay脚本（node2,node3上操作）：

# cat purge_relay_log.sh
#!/bin/bash
user=root
passwd=123456
port=3306
log_dir=‘/data/masterha/log‘
work_dir=‘/data‘
purge=‘/usr/local/bin/purge_relay_logs‘

if [ ! -d $log_dir ]
then
mkdir $log_dir -p
fi

$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1

配置定时计划任务

# crontab -e
0 4 * * * /bin/bash /root/purge_relay_log.sh

3. 要想把keepalived服务引入MHA，我们只需要修改切换是触发的脚本文件master_ip_failover即可，在该脚本中添加在master发生宕机时对keepalived的处理。

(1) 编辑脚本/usr/local/bin/master_ip_failover，修改后如下:

# vi /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => ‘all‘;

use Getopt::Long;

my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);

my $vip = ‘192.168.1.130‘;
my $ssh_start_vip = "/etc/init.d/keepalived start";
my $ssh_stop_vip = "/etc/init.d/keepalived stop";

GetOptions(
    ‘command=s‘          => \$command,
    ‘ssh_user=s‘         => \$ssh_user,
    ‘orig_master_host=s‘ => \$orig_master_host,
    ‘orig_master_ip=s‘   => \$orig_master_ip,
    ‘orig_master_port=i‘ => \$orig_master_port,
    ‘new_master_host=s‘ => \$new_master_host,
    ‘new_master_ip=s‘    => \$new_master_ip,
    ‘new_master_port=i‘ => \$new_master_port,
);

exit &main();
sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ([email protected]) {
            warn "Got Error: [email protected]\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ([email protected]) {
            warn [email protected];
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

(2) 编辑脚本master_ip_online_change，修改后如下:

#!/usr/bin/env perl

# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => ‘all‘;

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
my (
$command,              $orig_master_is_new_slave, $orig_master_host,
$orig_master_ip,       $orig_master_port,         $orig_master_user,
$orig_master_password, $orig_master_ssh_user,     $new_master_host,
$new_master_ip,        $new_master_port,          $new_master_user,
$new_master_password, $new_master_ssh_user
);
my $vip = ‘192.168.1.130/24‘;
my $key = ‘1‘;
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $orig_master_ssh_port = 22;
my $new_master_ssh_port = 22;
GetOptions(
‘command=s‘                => \$command,
‘orig_master_is_new_slave‘ => \$orig_master_is_new_slave,
‘orig_master_host=s‘       => \$orig_master_host,
‘orig_master_ip=s‘         => \$orig_master_ip,
‘orig_master_port=i‘       => \$orig_master_port,
‘orig_master_user=s‘       => \$orig_master_user,
‘orig_master_password=s‘   => \$orig_master_password,
‘orig_master_ssh_user=s‘   => \$orig_master_ssh_user,
‘new_master_host=s‘        => \$new_master_host,
‘new_master_ip=s‘          => \$new_master_ip,
‘new_master_port=i‘        => \$new_master_port,
‘new_master_user=s‘        => \$new_master_user,
‘new_master_password=s‘    => \$new_master_password,
‘new_master_ssh_user=s‘    => \$new_master_ssh_user,
‘orig_master_ssh_port=i‘    => \$orig_master_ssh_port,
‘new_master_ssh_port=i‘    => \$new_master_ssh_port,
);

exit &main();

sub current_time_us {
my ( $sec, $microsec ) = gettimeofday();
my $curdate = localtime($sec);
return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
my $elapsed = tv_interval($_tstart);
if ( $_running_interval > $elapsed ) {
sleep( $_running_interval - $elapsed );
}
}

sub get_threads_util {
my $dbh                    = shift;
my $my_connection_id       = shift;
my $running_time_threshold = shift;
my $type                   = shift;
$running_time_threshold = 0 unless ($running_time_threshold);
$type                   = 0 unless ($type);
my @threads;

my $sth = $dbh->prepare("SHOW PROCESSLIST");
$sth->execute();

while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

push @threads, $ref;
}
return @threads;
}

sub main {
if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

# args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

# Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      $orig_master_handler->disable_log_bin_local();
      print current_time_us() . " Drpping app user on the orig master..\n";
      #FIXME_xxx_drop_app_user($orig_master_handler);

## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      $orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      eval {
      `ssh -p$orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
        };
        if ([email protected]) {
            warn [email protected];
        }
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: [email protected]\n";
      exit $exit_code;
    }
    exit $exit_code;
}
elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master‘s ip to the catalog database

# We don‘t return error even though activating updatable accounts/ip failed so that we don‘t interrupt slaves‘ recovery.
# If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

# args: hostname, port, user, password, raise_error_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );

## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

## Creating an app user on the new master
      print current_time_us() . " Creating app user on the new master..\n";
      #FIXME_xxx_create_app_user($new_master_handler);
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

## Update master ip on the catalog database, etc
      `ssh -p$new_master_ssh_port $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
      $exit_code = 0;
    };
    if ([email protected]) {
      warn "Got Error: [email protected]\n";
      exit $exit_code;
    }
    exit $exit_code;
}
elsif ( $command eq "status" ) {

# do nothing
    exit 0;
}
else {
    &usage();
    exit 1;
}
}

sub usage {
print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
die;
}

(3) 编辑脚本send_report，修改后如下:

#!/usr/bin/perl

use strict;
use warnings FATAL => ‘all‘;
use Mail::Sender;
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp=‘smtp.163.com‘;
my $mail_from=‘xxxx‘;
my $mail_user=‘xxxxx‘;
my $mail_pass=‘xxxxx‘;
my $mail_to=[‘xxxx‘,‘xxxx‘];
GetOptions(
‘orig_master_host=s‘ => \$dead_master_host,
‘new_master_host=s‘ => \$new_master_host,
‘new_slave_hosts=s‘ => \$new_slave_hosts,
‘subject=s‘ => \$subject,
‘body=s‘ => \$body,
);

mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {
    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
    open my $DEBUG, "> /tmp/monitormail.log"
        or die "Can‘t open the debug      file:$!\n";
    my $sender = new Mail::Sender {
        ctype       => ‘text/plain; charset=utf-8‘,
        encoding    => ‘utf-8‘,
        smtp        => $smtp,
        from        => $mail_from,
        auth        => ‘LOGIN‘,
        TLS_allowed => ‘0‘,
        authid      => $user,
        authpwd     => $passwd,
        to          => $mail_to,
        subject     => $subject,
        debug       => $DEBUG
    };

$sender->MailMsg(
        {   msg   => $msg,
            debug => $DEBUG
        }
    ) or print $Mail::Sender::Error;
    return 1;
}

# Do whatever you want here

exit 0;

五、MHA的日常管理

1. 检查SSH配置(node5 Monitor 监控节点上操作)，如下：

# masterha_check_ssh --conf=/etc/masterha/app1.cnf

Sun May 1 22:05:12 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun May 1 22:05:12 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun May 1 22:05:12 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun May 1 22:05:12 2016 - [info] Starting SSH connection tests..
Sun May 1 22:05:14 2016 - [debug]
Sun May 1 22:05:12 2016 - [debug] Connecting via SSH from [email protected](192.168.1.121:22) to [email protected](192.168.1.122:22)..
Sun May 1 22:05:13 2016 - [debug]   ok.
Sun May 1 22:05:13 2016 - [debug] Connecting via SSH from [email protected](192.168.1.121:22) to [email protected](192.168.1.123:22)..
Sun May 1 22:05:13 2016 - [debug]   ok.
Sun May 1 22:05:14 2016 - [debug]
Sun May 1 22:05:13 2016 - [debug] Connecting via SSH from [email protected](192.168.1.122:22) to [email protected](192.168.1.121:22)..
Sun May 1 22:05:13 2016 - [debug]   ok.
Sun May 1 22:05:13 2016 - [debug] Connecting via SSH from [email protected](192.168.1.122:22) to [email protected](192.168.1.123:22)..
Sun May 1 22:05:14 2016 - [debug]   ok.
Sun May 1 22:05:14 2016 - [debug]
Sun May 1 22:05:13 2016 - [debug] Connecting via SSH from [email protected](192.168.1.123:22) to [email protected](192.168.1.121:22)..
Sun May 1 22:05:14 2016 - [debug]   ok.
Sun May 1 22:05:14 2016 - [debug] Connecting via SSH from [email protected](192.168.1.123:22) to [email protected](192.168.1.122:22)..
Sun May 1 22:05:14 2016 - [debug]   ok.
Sun May 1 22:05:14 2016 - [info] All SSH connection tests passed successfully.

2. 检查整个复制环境状况(node5 监控节点上操作)，如下：

# masterha_check_repl --conf=/etc/masterha/app1.cnf

Sun May 1 22:46:44 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun May 1 22:46:44 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun May 1 22:46:44 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun May 1 22:46:44 2016 - [info] MHA::MasterMonitor version 0.56.
Sun May 1 22:46:45 2016 - [info] GTID failover mode = 0
Sun May 1 22:46:45 2016 - [info] Dead Servers:
Sun May 1 22:46:45 2016 - [info] Alive Servers:
Sun May 1 22:46:45 2016 - [info]   192.168.1.121(192.168.1.121:3306)
Sun May 1 22:46:45 2016 - [info]   192.168.1.122(192.168.1.122:3306)
Sun May 1 22:46:45 2016 - [info]   192.168.1.123(192.168.1.123:3306)
Sun May 1 22:46:45 2016 - [info] Alive Slaves:
Sun May 1 22:46:45 2016 - [info]   192.168.1.122(192.168.1.122:3306) Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Sun May 1 22:46:45 2016 - [info]     Replicating from 192.168.1.121(192.168.1.121:3306)
Sun May 1 22:46:45 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun May 1 22:46:45 2016 - [info]   192.168.1.123(192.168.1.123:3306) Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Sun May 1 22:46:45 2016 - [info]     Replicating from 192.168.1.121(192.168.1.121:3306)
Sun May 1 22:46:45 2016 - [info] Current Alive Master: 192.168.1.121(192.168.1.121:3306)
Sun May 1 22:46:45 2016 - [info] Checking slave configurations..
Sun May 1 22:46:45 2016 - [info] read_only=1 is not set on slave 192.168.1.122(192.168.1.122:3306).
Sun May 1 22:46:45 2016 - [warning] relay_log_purge=0 is not set on slave 192.168.1.122(192.168.1.122:3306).
Sun May 1 22:46:45 2016 - [info] read_only=1 is not set on slave 192.168.1.123(192.168.1.123:3306).
Sun May 1 22:46:45 2016 - [warning] relay_log_purge=0 is not set on slave 192.168.1.123(192.168.1.123:3306).
Sun May 1 22:46:45 2016 - [info] Checking replication filtering settings..
Sun May 1 22:46:45 2016 - [info] binlog_do_db= , binlog_ignore_db=
Sun May 1 22:46:45 2016 - [info] Replication filtering check ok.
Sun May 1 22:46:45 2016 - [info] GTID (with auto-pos) is not supported
Sun May 1 22:46:45 2016 - [info] Starting SSH connection tests..
Sun May 1 22:46:46 2016 - [info] All SSH connection tests passed successfully.
Sun May 1 22:46:46 2016 - [info] Checking MHA Node version..
Sun May 1 22:46:47 2016 - [info] Version check ok.
Sun May 1 22:46:47 2016 - [info] Checking SSH publickey authentication settings on the current master..
Sun May 1 22:46:47 2016 - [info] HealthCheck: SSH to 192.168.1.121 is reachable.
Sun May 1 22:46:47 2016 - [info] Master MHA Node version is 0.56.
Sun May 1 22:46:47 2016 - [info] Checking recovery script configurations on 192.168.1.121(192.168.1.121:3306)..
Sun May 1 22:46:47 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data/ --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000008
Sun May 1 22:46:47 2016 - [info]   Connecting to [email protected](192.168.1.121:22)..
Creating /tmp if not exists..    ok.
Checking output directory is accessible or not..
   ok.
Binlog found at /usr/local/mysql/data/, up to master-bin.000008
Sun May 1 22:46:48 2016 - [info] Binlog setting check done.
Sun May 1 22:46:48 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun May 1 22:46:48 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=‘root‘ --slave_host=192.168.1.122 --slave_ip=192.168.1.122 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info --relay_dir=/usr/local/mysql/data/ --slave_pass=xxx
Sun May 1 22:46:48 2016 - [info]   Connecting to [email protected](192.168.1.122:22)..
Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /usr/local/mysql/data, up to node2-relay-bin.000002
    Temporary relay log file is /usr/local/mysql/data/node2-relay-bin.000002
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun May 1 22:46:48 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=‘root‘ --slave_host=192.168.1.123 --slave_ip=192.168.1.123 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info --relay_dir=/usr/local/mysql/data/ --slave_pass=xxx
Sun May 1 22:46:48 2016 - [info]   Connecting to [email protected](192.168.1.123:22)..
Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /usr/local/mysql/data, up to slave-relay-bin.000012
    Temporary relay log file is /usr/local/mysql/data/slave-relay-bin.000012
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun May 1 22:46:48 2016 - [info] Slaves settings check done.
Sun May 1 22:46:48 2016 - [info]
192.168.1.121(192.168.1.121:3306) (current master)
+--192.168.1.122(192.168.1.122:3306)
+--192.168.1.123(192.168.1.123:3306)

Sun May 1 22:46:48 2016 - [info] Checking replication health on 192.168.1.122..
Sun May 1 22:46:48 2016 - [info] ok.
Sun May 1 22:46:48 2016 - [info] Checking replication health on 192.168.1.123..
Sun May 1 22:46:48 2016 - [info] ok.
Sun May 1 22:46:48 2016 - [info] Checking master_ip_failover_script status:
Sun May 1 22:46:48 2016 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.121 --orig_master_ip=192.168.1.121 --orig_master_port=3306

IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Checking the Status of the script.. OK
Sun May 1 22:46:48 2016 - [info] OK.
Sun May 1 22:46:48 2016 - [warning] shutdown_script is not defined.
Sun May 1 22:46:48 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
[[email protected] masterha]#

3. 开启MHA Manager监控(node5操作)如下：

# mkdir -p /var/log/masterha/app1/
# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

参数说明：
--remove_dead_master_conf     #该参数代表当发生主从切换后，老的主库的ip将会从配置文件中移除。
--manger_log                  #日志存放位置
--ignore_last_failover        #在缺省情况下，如果MHA检测到连续发生宕机，会生成app1.failover.complete文件，会造成MHA管理进程无法启动。

4. 查看MHA Manager监控是否正常：

# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:2480) is running(0:PING_OK), master:192.168.1.121

5. 查看启动日志(node5操作)如下：

# tail -n20 /var/log/masterha/app1/manager.log

6. 关闭MHA Manage监控：

(1) 关闭

# masterha_stop --conf=/etc/masterha/app1.cnf

(2) 启动

# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1

六、MHA Failover切换

1. 自动Failover切换

(1) 模拟master mysql关闭
(2) VIP将会切换到node2
(3) /etc/masterha/app1.cnf中将原主服务器配置文件清掉。
(4) masterha_manager监控进程会自动退出关闭，并在/var/log/masterha/app1下生成app1.failover.complete文件，manager.log会记录全过程，从服务器会自动从新的主服务器复制。
(5) 原主服务器mysqld启动的，需要清掉/var/log/masterha/app1下生成app1.failover.complete文件，添加node1配置文件到/etc/masterha/app1.cnf，通过manager.log中的记录的故障点，重新同步主服务器，成为从节点。

2. 手动Failover切换

(1) 先停MHA Manager进程。
masterha_stop --conf=/etc/masterha/app1.cnf
(2) 停掉master mysqld
(3) 手动切换，在Manager主机上操作如下：
# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.1.122 --dead_master_port=3306 --new_master_host=192.168.1.121 --new_master_port=3306 --ignore_last_failover

通过观察日志可以观察切换全过程。
(4) 如上节方式恢复节点为从服务器。

3. 正常运行情况下切换(Master正在运行)

等补充。

4. 小结

通过对MMM，MHA的环境搭建测试，MHA由于采用复制架构，原理简单，在一些对数据要求比较高的环境，为了保证可靠性，最好与半同步结合使用。

时间： 2024-10-10 08:04:09

MySQL MHA高可用环境搭建

一、安装MHA基本环境

1. 安装MHA node

(1) 基本环境说明，本文参考互联网文章学习，搭建MHA与测试如下。

(2) 在node1,node2,node3,node5操作：

安装MHA node节点软件包：

2. 安装MHA Manager

3. 节点间配置SSH登录无密码验证（MHA主机之间使用key登录）

二、搭建主从复制环境

1. 主从复制环境配置过程

(1) mysql安装过程略,但是三节点要创建如下链接

node1 my.cnf

node2 my.cnf

node3 my.cnf

(2) 在node1 (Master)上备份一份完整的数据：

(3) 在node1 (Master)上创建复制用户:

(4) 查看主库备份时的binlog名称和位置，MASTER_LOG_FILE和MASTER_LOG_POS：

(5) 把备份复制到192.168.1.122和192.168.1.123

(6) 分别在两台服务器上导入备份，执行复制相关命令

2. 创建MHA管理用户,在master上创建。

三、配置Keepalived VIP

1. 在node1(Master)与node2（备选主节点）安装keepalived。

2. 配置keepalived的配置文件，在node1(master)上配置操作如下:

3. 配置keepalived的配置文件，在node2(备用节点)上配置操作如下:

4. node1,node2启动keepalived服务

5. node1查看VIP启动情况

四、配置MHA

1. monitor创建MHA的工作目录，并且创建相关配置文件（在软件包解压后的目录里面有样例配置文件）。

2. 设置relay log的清除方式（在每个slave节点上）：

(1)在node2,node3 从节点上操作：

(2) 设置定期清理relay脚本（node2,node3上操作）：

3. 要想把keepalived服务引入MHA，我们只需要修改切换是触发的脚本文件master_ip_failover即可，在该脚本中添加在master发生宕机时对keepalived的处理。

(1) 编辑脚本/usr/local/bin/master_ip_failover，修改后如下:

(2) 编辑脚本master_ip_online_change，修改后如下:

(3) 编辑脚本send_report，修改后如下:

五、MHA的日常管理

1. 检查SSH配置(node5 Monitor 监控节点上操作)，如下：

2. 检查整个复制环境状况(node5 监控节点上操作)，如下：

3. 开启MHA Manager监控(node5操作)如下：

4. 查看MHA Manager监控是否正常：

5. 查看启动日志(node5操作)如下：

6. 关闭MHA Manage监控：

(1) 关闭

(2) 启动

六、MHA Failover切换

1. 自动Failover切换

2. 手动Failover切换

3. 正常运行情况下切换(Master正在运行)

4. 小结

MySQL MHA高可用环境搭建的相关文章