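Why
Drawbacks of failover with traditional replication and with GTIDs
When replication fails, the biggest headache is reworking the replication topology.
Once the master goes down, one of the slaves has to be configured as the new master.
Binary logging must be enabled on that slave, and write traffic must be redirected to it.
If the topology is M-S-S (a master, a relay slave, and slaves behind it) and the relay is promoted to master, every slave behind it needs CHANGE MASTER TO with the new host, binlog file, and position. Data consistency has to be guaranteed as well, so the whole procedure takes a long time.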
With GTIDs, only CHANGE MASTER TO new_host is needed, but it still has to be run on every machine.
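To make the difference concrete, here is a minimal sketch that assembles the CHANGE MASTER TO statement for both cases. The helper name build_change_master is hypothetical, and the binlog file name and position are made-up values, not taken from this setup.

```python
def build_change_master(host, user, password, binlog_file=None, binlog_pos=None):
    """Build a CHANGE MASTER TO statement as text (hypothetical helper).

    The statement is only assembled for display; it is not executed and
    the values are not escaped.
    """
    if (binlog_file is None) != (binlog_pos is None):
        raise ValueError("binlog_file and binlog_pos must be given together")

    parts = [
        "MASTER_HOST='{}'".format(host),
        "MASTER_USER='{}'".format(user),
        "MASTER_PASSWORD='{}'".format(password),
    ]
    if binlog_file is not None:
        # Traditional replication: the coordinates must be looked up per slave.
        parts.append("MASTER_LOG_FILE='{}'".format(binlog_file))
        parts.append("MASTER_LOG_POS={}".format(int(binlog_pos)))
    else:
        # GTID replication: the slave works out its own position.
        parts.append("MASTER_AUTO_POSITION=1")
    return "CHANGE MASTER TO " + ", ".join(parts) + ";"


# Made-up coordinates, for illustration only.
print(build_change_master("192.168.88.123", "rep", "redhat", "mysql-bin.000003", 120))
print(build_change_master("192.168.88.123", "rep", "redhat"))
```

The GTID form is the same on every slave, which is what makes it easy to automate.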
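So we use the failover tool that ships with MySQL (mysqlfailover, part of MySQL Utilities). It also provides a Python API, so it can later be integrated into an automated operations platform.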
Download and install the software
MySQL Utilities https://dev.mysql.com/downloads/utilities/
##maintaining and administering MySQL servers
Connector/Python https://dev.mysql.com/downloads/connector/python/
## a standardized database driver for Python platforms and development
yum install mysql-connector-python-2.1.3-1.el6.x86_64.rpm mysql-utilities-1.5.6-1.el6.noarch.rpm -y
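Service setup:
Environment
monitor    server1    192.168.88.121    ## the monitor should ideally have its own server
master     server2    192.168.88.122
slave      server3    192.168.88.123
slave      server4    192.168.88.124
The monitor needs to connect to the master and the slaves to read their running status.
Grants: the basic ones are SUPER, REPLICATION SLAVE, and RELOAD.
Sometimes several mysqlfailover instances run at the same time    ## so that the monitor is not a single point of failure
In that case CREATE, INSERT, DROP, and SELECT are needed as well (start the extra instance with --force, otherwise it fails; once an error occurs, replication stops).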
grant create,insert,drop,select,super,replication slave,reload on *.* to [email protected] identified by '123' with grant option;
Check that the grant succeeded
show grants for 'repmon'@'192.168.88.121'
Test from the monitor: mysql -urepmon -predhat -h 192.168.88.122
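The privilege check can also be scripted. The sketch below takes the lines returned by SHOW GRANTS and reports which of the privileges listed above are missing. The function name missing_privileges and the sample grant line are assumptions for illustration; only global (*.*) grants are looked at.

```python
import re

# Privileges named in this article for the monitoring user.
REQUIRED = {"CREATE", "INSERT", "DROP", "SELECT", "SUPER", "REPLICATION SLAVE", "RELOAD"}

_GLOBAL_GRANT = re.compile(r"GRANT\s+(.*?)\s+ON\s+\*\.\*\s+TO\s", re.IGNORECASE)


def missing_privileges(grant_lines, required=REQUIRED):
    """Return the required privileges not found in the SHOW GRANTS lines."""
    granted = set()
    for line in grant_lines:
        match = _GLOBAL_GRANT.match(line.strip())
        if not match:
            continue  # not a global grant, ignore it
        granted |= {p.strip().upper() for p in match.group(1).split(",")}
    if "ALL PRIVILEGES" in granted:
        return set()
    return set(required) - granted


# Illustrative input, shaped like a SHOW GRANTS row (not copied from a real server).
sample = ["GRANT SELECT, INSERT, CREATE, DROP, RELOAD ON *.* TO 'repmon'@'192.168.88.121'"]
print(sorted(missing_privileges(sample)))
```

An empty result means the account has everything the article asks for.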
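Configuration file
## remove skip-slave-start
+++ add the following settings
#add fail-over
report-host=<this server's own IP>    ## report this server's IP to the monitor
master-info-repository=table    ## store the master information in a table
relay-log-info-repository=table    ## store the relay log information in a table
+++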
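Before restarting, the settings can be sanity-checked with a few lines of Python. This is only a sketch: check_failover_config is a hypothetical helper, it reads the configuration as text with the standard configparser module, and it assumes the options live in a [mysqld] section.

```python
import configparser

# Expected values; None means "any non-empty value".
EXPECTED = {
    "report-host": None,
    "master-info-repository": "table",
    "relay-log-info-repository": "table",
}


def check_failover_config(cnf_text):
    """Return a list of problems found in the [mysqld] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # options such as skip-slave-start have no value
        strict=False,         # tolerate repeated options
        interpolation=None,   # treat % literally
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section found"]

    # MySQL accepts both '-' and '_' in option names, so normalize them.
    options = {key.replace("_", "-"): value for key, value in parser.items("mysqld")}

    problems = []
    if "skip-slave-start" in options:
        problems.append("skip-slave-start is still set")
    for name, expected in EXPECTED.items():
        value = options.get(name)
        if not value:
            problems.append(name + " is missing")
        elif expected is not None and value.lower() != expected:
            problems.append("{} is {!r}, expected {!r}".format(name, value, expected))
    return problems


sample = """
[mysqld]
report-host=192.168.88.123
master-info-repository=table
relay-log-info-repository=table
"""
print(check_failover_config(sample))
```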
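These settings move the replication state that is normally cached in files into tables. By default a slave keeps the information about its master and the current replication position in master.info and relay-log.info. Their purpose: when mysqld restarts, MySQL starts the slave automatically and reads the master and replication information from these two files.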
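To automate replication monitoring and failover, a new master, binlog, and position have to be set whenever a failure occurs. If that information lived in files, the monitor might need permissions to operate on those files. Keeping it in MySQL tables avoids that, and changes take effect immediately.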
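Note: when the master and relay log information is stored in MySQL tables, the tables MySQL creates may be MyISAM tables (this was the case in early 5.6 releases). The official recommendation is InnoDB, which later 5.6 releases use by default; this avoids relying on MyISAM's automatic repair.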
After making the changes, restart mysqld.
Take note of the two tables slave_master_info and slave_relay_log_info in the mysql database.
Start the monitor:
mysqlfailover --master=repmon:[email protected] --discover-slaves-login=repmon:redhat
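--master specifies the master, followed by "user:password@host"
--discover-slaves-login discovers the slaves automatically; followed by the user name and password used to connect to the slaves
--log=file.log    ## log file
--failover-mode    ## auto (default; exits if no slave can be promoted), elect (choose only among the designated candidate slaves), fail (monitoring only, no failover)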
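If the monitor is going to be launched from an operations platform, the command line can be assembled in Python first. The sketch below only builds the argument list from the options described above; build_mysqlfailover_argv is a hypothetical helper, and the connection strings in the example are assumed from the environment table and the test login earlier in this article.

```python
VALID_MODES = ("auto", "elect", "fail")


def build_mysqlfailover_argv(master, slaves_login, log_file=None, failover_mode=None):
    """Return the mysqlfailover command as an argument list (nothing is executed)."""
    argv = [
        "mysqlfailover",
        "--master=" + master,
        "--discover-slaves-login=" + slaves_login,
    ]
    if log_file:
        argv.append("--log=" + log_file)
    if failover_mode:
        if failover_mode not in VALID_MODES:
            raise ValueError("failover_mode must be one of " + ", ".join(VALID_MODES))
        argv.append("--failover-mode=" + failover_mode)
    return argv


# Assumed credentials and host, for illustration only.
argv = build_mysqlfailover_argv(
    "repmon:redhat@192.168.88.122",
    "repmon:redhat",
    log_file="file.log",
    failover_mode="auto",
)
print(" ".join(argv))
```

The resulting list can be handed to subprocess.Popen as is, which avoids shell quoting problems with the password.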
#####
GTID Executed Set
c09756b8-a7e7-11e5-9468-000c29df5442:1-24
WARNING: Errant transaction(s) found on slave(s).
Replication Health Status
+-----------------+-------+---------+--------+------------+---------+
| host | port | role | state | gtid_mode | health |
+-----------------+-------+---------+--------+------------+---------+
| 192.168.88.122 | 3306 | MASTER | UP | ON | OK |
| 192.168.88.123 | 3306 | SLAVE | UP | ON | OK |
| 192.168.88.124 | 3306 | SLAVE | UP | ON | OK |
+-----------------+-------+---------+--------+------------+---------+
#####
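The health table shown above is easy to turn into data, for example to raise an alert from a script. This is a minimal sketch that parses the text as printed by the console; parse_health_table is a hypothetical helper, not part of MySQL Utilities.

```python
def parse_health_table(text):
    """Parse an ASCII table ('| a | b |' rows) into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and any other console text
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


console_output = """
+-----------------+-------+---------+--------+------------+---------+
| host | port | role | state | gtid_mode | health |
+-----------------+-------+---------+--------+------------+---------+
| 192.168.88.122 | 3306 | MASTER | UP | ON | OK |
| 192.168.88.123 | 3306 | SLAVE | UP | ON | OK |
| 192.168.88.124 | 3306 | SLAVE | UP | ON | OK |
+-----------------+-------+---------+--------+------------+---------+
"""

servers = parse_health_table(console_output)
unhealthy = [s["host"] for s in servers if s["state"] != "UP" or s["health"] != "OK"]
print(len(servers), "servers,", len(unhealthy), "unhealthy")
```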
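Now test it
Stop the master and see whether a slave takes over as master and the topology is adjusted
/etc/init.d/mysqld stop
Below is what the monitor reports while it adjusts the topology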
Failed to reconnect to the master after 3 attemps.
Failover starting in 'auto' mode...
# Candidate slave 192.168.88.123:3306 will become the new master.
# Checking slaves status (before failover).
# Preparing candidate for failover.
# Creating replication user if it does not exist.
# ERROR: ERROR: Cannot grant replication slave to replication user.
# Stopping slaves.
# Performing STOP on all slaves.
# Switching slaves to new master.
# Disconnecting new master as slave.
# Starting slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Failover complete.
# Discovering slaves for master at 192.168.88.123:3306
###### new topology
b89f9be8-a8af-11e5-9980-000c29ccacd8:1-2 [...]
Transactions executed on the servers:
+-----------------+-------+---------+--------+------------+---------+
| host | port | role | state | gtid_mode | health |
+-----------------+-------+---------+--------+------------+---------+
| 192.168.88.123 | 3306 | MASTER | UP | ON | OK |
| 192.168.88.124 | 3306 | SLAVE | UP | ON | OK |
+-----------------+-------+---------+--------+------------+---------+
####
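When the console log is collected by a script, the interesting facts are which slave was promoted, whether failover finished, and whether any step reported an error. The sketch below pulls these out of log text shaped like the output above. summarize_failover is a hypothetical helper, and the patterns are based only on the lines shown in this article.

```python
import re

_CANDIDATE = re.compile(r"Candidate slave (\S+):(\d+) will become the new master")


def summarize_failover(log_text):
    """Extract the new master, error lines, and completion flag from a failover log."""
    summary = {"new_master": None, "errors": [], "complete": False}
    for line in log_text.splitlines():
        match = _CANDIDATE.search(line)
        if match:
            summary["new_master"] = (match.group(1), int(match.group(2)))
        elif "ERROR" in line:
            summary["errors"].append(line.lstrip("# ").strip())
        elif "Failover complete" in line:
            summary["complete"] = True
    return summary


log = """\
# Candidate slave 192.168.88.123:3306 will become the new master.
# Creating replication user if it does not exist.
# ERROR: ERROR: Cannot grant replication slave to replication user.
# Switching slaves to new master.
# Failover complete.
"""
print(summarize_failover(log))
```

In the run shown above, this would surface the "Cannot grant replication slave" error even though failover itself completed, which is worth following up on.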
On the new master (server3), insert some data and check whether it is replicated.
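However, when the original master (server2) comes back up, mysqlfailover cannot discover it automatically and restore the original topology.
So the old master has to be added back to the cluster by hand, as a slave of the new master: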
> change master to
> master_host='192.168.88.123',
> master_user='rep',
> master_password='redhat',
> master_auto_position=1;
> start slave;
Now the monitor can detect server2 again.
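Since the goal is to fold this into an automation platform later, the manual rejoin step can be wrapped in a small function. This is a sketch under assumptions: rejoin_as_slave is a hypothetical helper that takes any DB-API cursor connected to the recovered server (for example one from Connector/Python), and it relies on the driver substituting the %s parameters on the client side.

```python
def rejoin_as_slave(cursor, master_host, repl_user, repl_password):
    """Point a recovered server at the current master using GTID auto-positioning."""
    cursor.execute(
        "CHANGE MASTER TO MASTER_HOST=%s, MASTER_USER=%s, "
        "MASTER_PASSWORD=%s, MASTER_AUTO_POSITION=1",
        (master_host, repl_user, repl_password),
    )
    # CHANGE MASTER TO does not start the replication threads by itself.
    cursor.execute("START SLAVE")


# Usage sketch (needs Connector/Python and a reachable server; the admin
# account below is a placeholder, not taken from this article):
#   import mysql.connector
#   conn = mysql.connector.connect(host="192.168.88.122", user="admin", password="...")
#   rejoin_as_slave(conn.cursor(), "192.168.88.123", "rep", "redhat")
#   conn.close()
```

Passing the cursor in keeps the function easy to test without a live database.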