Redis High Availability (Usage)

Redis replication solves the single-point problem, but if the master fails, a human operator has to step in to perform the failover. Consider a one-master, two-slave setup (master, slave-1, and slave-2) and walk through a manual failover:

1. The master fails: clients can no longer connect to it, and both slaves lose their connections to it, interrupting replication.

2. Pick one slave (slave-1) and run the slaveof no one command on it, making it the new master (new-master).

3. Once slave-1 has become the new master, update the application's master address and restart the application.

4. Instruct the other slave (slave-2) to replicate from the new master (new-master).

5. When the original master recovers, have it replicate from the new master as well.
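Using the ports from the deployment later in this post (6479/6480/6481), the five steps boil down to a handful of redis-cli invocations. The helper below is a hypothetical sketch that just assembles those command strings; it is not part of Redis itself:

```python
def manual_failover_commands(new_master_port, other_slave_ports, old_master_port,
                             host="127.0.0.1"):
    """Assemble the redis-cli commands for the manual failover steps above."""
    cmds = [
        # step 2: promote the chosen slave
        f"redis-cli -h {host} -p {new_master_port} slaveof no one",
    ]
    # step 4: point the remaining slaves at the new master
    for port in other_slave_ports:
        cmds.append(f"redis-cli -h {host} -p {port} slaveof {host} {new_master_port}")
    # step 5: once the old master is back, demote it to a replica too
    cmds.append(f"redis-cli -h {host} -p {old_master_port} slaveof {host} {new_master_port}")
    return cmds

for cmd in manual_failover_commands(6480, [6481], 6479):
    print(cmd)
```

Step 3 (updating the application's master address) is the part that resists scripting, which is why the whole procedure is fragile.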

A manual process like this is hard to perform accurately and promptly; that is exactly the problem Redis Sentinel solves.

Redis Sentinel is a distributed architecture consisting of several Sentinel nodes and several Redis data nodes. Each Sentinel node monitors the data nodes and the other Sentinel nodes; when it finds a node unreachable, it marks that node as down. If the flagged node is the master, the Sentinel also negotiates with the other Sentinels, and once a majority agree that the master is unreachable, they elect one Sentinel node to carry out an automatic failover and notify the Redis clients of the change in real time. The whole process needs no human intervention, which effectively solves Redis's high-availability problem.
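Note that two different thresholds are at work in that description: the configured quorum decides when the master is considered objectively down, while choosing the Sentinel that performs the failover requires a majority of all Sentinel nodes. A minimal sketch of the two checks (function names are mine, not a Redis API):

```python
def reaches_odown(down_reports: int, quorum: int) -> bool:
    # a subjective down (sdown) escalates to objective down (odown) once
    # at least `quorum` sentinels report the master as unreachable
    return down_reports >= quorum

def wins_leader_election(votes: int, total_sentinels: int) -> bool:
    # the sentinel that runs the failover needs a strict majority of ALL
    # sentinels, independent of the configured quorum
    return votes >= total_sentinels // 2 + 1

# with 3 sentinels and quorum 2, as in the deployment below:
print(reaches_odown(2, 2), wins_leader_election(2, 3))  # True True
```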

Deploying the Redis Sentinel High-Availability Architecture

1. Set up three Redis data nodes. Initial state: the master node on port 6479, slave-1 on port 6480, and slave-2 on port 6481.

127.0.0.1:6479> info replication

# Replication

role:master

connected_slaves:2

slave0:ip=127.0.0.1,port=6480,state=online,offset=845,lag=0

slave1:ip=127.0.0.1,port=6481,state=online,offset=845,lag=0

2. Set up three Sentinel nodes. The initial configuration file is shown below (the three nodes use ports 26479, 26480, and 26481 respectively):

port 26479

daemonize yes

loglevel notice

dir "/home/redis/stayfoolish/26479/data"

logfile "/home/redis/stayfoolish/26479/log/sentinel.log"

pidfile "/home/redis/stayfoolish/26479/log/sentinel.pid"

unixsocket "/home/redis/stayfoolish/26479/log/sentinel.sock"

# sfmaster

sentinel monitor sfmaster 127.0.0.1 6479 2

sentinel auth-pass sfmaster abcdefg

sentinel down-after-milliseconds sfmaster 30000

sentinel parallel-syncs sfmaster 1

sentinel failover-timeout sfmaster 180000
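Of these settings, down-after-milliseconds drives failure detection: a Sentinel flags a node as subjectively down once that many milliseconds pass without a valid reply. A toy illustration of the check (not actual Redis code), assuming timestamps in milliseconds:

```python
def is_sdown(now_ms: int, last_valid_reply_ms: int, down_after_ms: int = 30000) -> bool:
    # no valid PING reply for longer than down-after-milliseconds
    # => this sentinel marks the node subjectively down (+sdown)
    return now_ms - last_valid_reply_ms > down_after_ms
```

The 30000 default here matches the value configured above; lowering it speeds up detection at the cost of more false positives under load.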

Start the Sentinel nodes and check their info: each has found the master node, discovered the two slaves, and discovered that there are three Sentinel nodes in total.

127.0.0.1:26479> info sentinel

# Sentinel

sentinel_masters:1

sentinel_tilt:0

sentinel_running_scripts:0

sentinel_scripts_queue_length:0

sentinel_simulate_failure_flags:0

master0:name=sfmaster,status=ok,address=127.0.0.1:6479,slaves=2,sentinels=3

At this point the Redis Sentinel setup is complete; with the Redis replication groundwork in place, the process is fairly straightforward.

Next, kill the master on port 6479 with kill -9 to simulate a failure, and follow the failover through the logs.

1. Kill the master on port 6479.

$ ps -ef | egrep 'redis-server.*6479' | egrep -v 'egrep' | awk '{print $2}' | xargs kill -9

127.0.0.1:6479> info replication

Could not connect to Redis at 127.0.0.1:6479: Connection refused

not connected>

2. The log of the Redis node on port 6480 shows it failing to connect to port 6479, being promoted to the new master by a Sentinel node, and then serving the replication request from port 6481.

~/stayfoolish/6480/log $ tail -f redis.log

20047:S 22 Jul 03:03:22.946 # Error condition on socket for SYNC: Connection refused

20047:S 22 Jul 03:03:23.954 * Connecting to MASTER 127.0.0.1:6479

20047:S 22 Jul 03:03:23.955 * MASTER <-> SLAVE sync started

20047:S 22 Jul 03:03:23.955 # Error condition on socket for SYNC: Connection refused

...

20047:S 22 Jul 03:03:38.061 * MASTER <-> SLAVE sync started

20047:S 22 Jul 03:03:38.061 # Error condition on socket for SYNC: Connection refused

20047:M 22 Jul 03:03:38.963 * Discarding previously cached master state.

20047:M 22 Jul 03:03:38.963 * MASTER MODE enabled (user request from 'id=27 addr=127.0.0.1:37972 fd=10 name=sentinel-68102904-cmd age=882 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')

20047:M 22 Jul 03:03:38.963 # CONFIG REWRITE executed with success.

20047:M 22 Jul 03:03:40.075 * Slave 127.0.0.1:6481 asks for synchronization

20047:M 22 Jul 03:03:40.076 * Full resync requested by slave 127.0.0.1:6481

20047:M 22 Jul 03:03:40.077 * Starting BGSAVE for SYNC with target: disk

20047:M 22 Jul 03:03:40.077 * Background saving started by pid 20452

20452:C 22 Jul 03:03:40.086 * DB saved on disk

20452:C 22 Jul 03:03:40.086 * RDB: 0 MB of memory used by copy-on-write

20047:M 22 Jul 03:03:40.175 * Background saving terminated with success

20047:M 22 Jul 03:03:40.176 * Synchronization with slave 127.0.0.1:6481 succeeded

The log of the node on port 6481 shows it failing to connect to port 6479, then receiving the Sentinel's command and replicating from the new master.

~/stayfoolish/6481/log $ tail -f redis.log

20051:S 22 Jul 03:03:08.590 # Connection with master lost.

20051:S 22 Jul 03:03:08.590 * Caching the disconnected master state.

20051:S 22 Jul 03:03:08.844 * Connecting to MASTER 127.0.0.1:6479

20051:S 22 Jul 03:03:08.844 * MASTER <-> SLAVE sync started

20051:S 22 Jul 03:03:08.844 # Error condition on socket for SYNC: Connection refused

...

20051:S 22 Jul 03:03:39.067 # Error condition on socket for SYNC: Connection refused

20051:S 22 Jul 03:03:39.342 * Discarding previously cached master state.

20051:S 22 Jul 03:03:39.342 * SLAVE OF 127.0.0.1:6480 enabled (user request from 'id=27 addr=127.0.0.1:38660 fd=10 name=sentinel-68102904-cmd age=883 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=133 qbuf-free=32635 obl=36 oll=0 omem=0 events=r cmd=exec')

20051:S 22 Jul 03:03:39.343 # CONFIG REWRITE executed with success.

20051:S 22 Jul 03:03:40.074 * Connecting to MASTER 127.0.0.1:6480

20051:S 22 Jul 03:03:40.074 * MASTER <-> SLAVE sync started

20051:S 22 Jul 03:03:40.074 * Non blocking connect for SYNC fired the event.

20051:S 22 Jul 03:03:40.074 * Master replied to PING, replication can continue...

20051:S 22 Jul 03:03:40.075 * Partial resynchronization not possible (no cached master)

20051:S 22 Jul 03:03:40.084 * Full resync from master: 84b623afc0824be14bb9187245ff00cab43427c1:1

20051:S 22 Jul 03:03:40.176 * MASTER <-> SLAVE sync: receiving 77 bytes from master

20051:S 22 Jul 03:03:40.176 * MASTER <-> SLAVE sync: Flushing old data

20051:S 22 Jul 03:03:40.176 * MASTER <-> SLAVE sync: Loading DB in memory

20051:S 22 Jul 03:03:40.176 * MASTER <-> SLAVE sync: Finished with success

3. The logs of the Sentinel nodes on ports 26479, 26480, and 26481 show how the Sentinels cooperated to complete the failover (the underlying mechanics are left for the next post).

~/stayfoolish/26479/log $ tail -f sentinel.log

20169:X 22 Jul 03:03:38.720 # +sdown master sfmaster 127.0.0.1 6479

20169:X 22 Jul 03:03:38.742 # +new-epoch 1

20169:X 22 Jul 03:03:38.743 # +vote-for-leader 68102904daa4df70bf945677f62498bbdffee1d4 1

20169:X 22 Jul 03:03:38.778 # +odown master sfmaster 127.0.0.1 6479 #quorum 3/2

20169:X 22 Jul 03:03:38.779 # Next failover delay: I will not start a failover before Sun Jul 22 03:09:39 2018

20169:X 22 Jul 03:03:39.346 # +config-update-from sentinel 68102904daa4df70bf945677f62498bbdffee1d4 127.0.0.1 26481 @ sfmaster 127.0.0.1 6479

20169:X 22 Jul 03:03:39.346 # +switch-master sfmaster 127.0.0.1 6479 127.0.0.1 6480

20169:X 22 Jul 03:03:39.346 * +slave slave 127.0.0.1:6481 127.0.0.1 6481 @ sfmaster 127.0.0.1 6480

20169:X 22 Jul 03:03:39.346 * +slave slave 127.0.0.1:6479 127.0.0.1 6479 @ sfmaster 127.0.0.1 6480

20169:X 22 Jul 03:04:09.393 # +sdown slave 127.0.0.1:6479 127.0.0.1 6479 @ sfmaster 127.0.0.1 6480

~/stayfoolish/26480/log $ tail -f sentinel.log

20171:X 22 Jul 03:03:38.665 # +sdown master sfmaster 127.0.0.1 6479

20171:X 22 Jul 03:03:38.741 # +new-epoch 1

20171:X 22 Jul 03:03:38.742 # +vote-for-leader 68102904daa4df70bf945677f62498bbdffee1d4 1

20171:X 22 Jul 03:03:39.343 # +config-update-from sentinel 68102904daa4df70bf945677f62498bbdffee1d4 127.0.0.1 26481 @ sfmaster 127.0.0.1 6479

20171:X 22 Jul 03:03:39.344 # +switch-master sfmaster 127.0.0.1 6479 127.0.0.1 6480

20171:X 22 Jul 03:03:39.344 * +slave slave 127.0.0.1:6481 127.0.0.1 6481 @ sfmaster 127.0.0.1 6480

20171:X 22 Jul 03:03:39.344 * +slave slave 127.0.0.1:6479 127.0.0.1 6479 @ sfmaster 127.0.0.1 6480

20171:X 22 Jul 03:04:09.379 # +sdown slave 127.0.0.1:6479 127.0.0.1 6479 @ sfmaster 127.0.0.1 6480

~/stayfoolish/26481/log $ tail -f sentinel.log

20177:X 22 Jul 03:03:38.671 # +sdown master sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:38.730 # +odown master sfmaster 127.0.0.1 6479 #quorum 2/2

20177:X 22 Jul 03:03:38.730 # +new-epoch 1

20177:X 22 Jul 03:03:38.730 # +try-failover master sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:38.731 # +vote-for-leader 68102904daa4df70bf945677f62498bbdffee1d4 1

20177:X 22 Jul 03:03:38.742 # 88fc1c8a5cdb41f3f92ed8e83e92e11b244b6e1a voted for 68102904daa4df70bf945677f62498bbdffee1d4 1

20177:X 22 Jul 03:03:38.744 # fc2182cf6c2cc8ae88dbe4bec35f1cdd9e9b8d65 voted for 68102904daa4df70bf945677f62498bbdffee1d4 1

20177:X 22 Jul 03:03:38.815 # +elected-leader master sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:38.815 # +failover-state-select-slave master sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:38.871 # +selected-slave slave 127.0.0.1:6480 127.0.0.1 6480 @ sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:38.871 * +failover-state-send-slaveof-noone slave 127.0.0.1:6480 127.0.0.1 6480 @ sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:38.962 * +failover-state-wait-promotion slave 127.0.0.1:6480 127.0.0.1 6480 @ sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:39.269 # +promoted-slave slave 127.0.0.1:6480 127.0.0.1 6480 @ sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:39.269 # +failover-state-reconf-slaves master sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:39.342 * +slave-reconf-sent slave 127.0.0.1:6481 127.0.0.1 6481 @ sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:39.859 # -odown master sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:40.335 * +slave-reconf-inprog slave 127.0.0.1:6481 127.0.0.1 6481 @ sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:40.335 * +slave-reconf-done slave 127.0.0.1:6481 127.0.0.1 6481 @ sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:40.410 # +failover-end master sfmaster 127.0.0.1 6479

20177:X 22 Jul 03:03:40.410 # +switch-master sfmaster 127.0.0.1 6479 127.0.0.1 6480

20177:X 22 Jul 03:03:40.411 * +slave slave 127.0.0.1:6481 127.0.0.1 6481 @ sfmaster 127.0.0.1 6480

20177:X 22 Jul 03:03:40.411 * +slave slave 127.0.0.1:6479 127.0.0.1 6479 @ sfmaster 127.0.0.1 6480

20177:X 22 Jul 03:04:10.501 # +sdown slave 127.0.0.1:6479 127.0.0.1 6479 @ sfmaster 127.0.0.1 6480
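The leader Sentinel's log above (port 26481) steps through the failover state machine in order. A simplified sketch that mirrors those states, with names taken from the log events (the real implementation carries more detail):

```python
FAILOVER_STATES = [
    "select-slave",        # +selected-slave: pick the best replica to promote
    "send-slaveof-noone",  # promote it with SLAVEOF NO ONE
    "wait-promotion",      # wait until it reports itself as a master
    "reconf-slaves",       # repoint the remaining replicas (+slave-reconf-*)
    "failover-end",        # publish +switch-master so all sentinels reconfigure
]

def next_state(current: str) -> str:
    """Advance the (simplified) failover state machine one step."""
    i = FAILOVER_STATES.index(current)
    return FAILOVER_STATES[min(i + 1, len(FAILOVER_STATES) - 1)]
```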

4. Restart the node on port 6479. The log of the Sentinel node on port 26481 shows the recovered node being reconfigured to replicate from port 6480.

~/stayfoolish/26481/log $ tail -f sentinel.log

20177:X 22 Jul 03:33:36.960 # -sdown slave 127.0.0.1:6479 127.0.0.1 6479 @ sfmaster 127.0.0.1 6480

20177:X 22 Jul 03:33:46.959 * +convert-to-slave slave 127.0.0.1:6479 127.0.0.1 6479 @ sfmaster 127.0.0.1 6480

5. Check the new replication topology.

127.0.0.1:6480> info replication

# Replication

role:master

connected_slaves:2

slave0:ip=127.0.0.1,port=6481,state=online,offset=405522,lag=0

slave1:ip=127.0.0.1,port=6479,state=online,offset=405389,lag=0

127.0.0.1:26479> info sentinel

# Sentinel

sentinel_masters:1

...

master0:name=sfmaster,status=ok,address=127.0.0.1:6480,slaves=2,sentinels=3

Getting Familiar with the Sentinel API

A Sentinel node is a special kind of Redis node: it supports only a small number of commands and has its own dedicated API. A few of the most important ones are covered below.

1. sentinel get-master-addr-by-name <master name> returns the IP address and port of the master named <master name>.

127.0.0.1:26479> sentinel get-master-addr-by-name sfmaster

1) "127.0.0.1"

2) "6480"

2. sentinel failover <master name> forces a failover of the specified master (without negotiating with the other Sentinel nodes); once the failover completes, the other Sentinels update their own configuration to match the result.

127.0.0.1:26479> sentinel failover sfmaster

OK

127.0.0.1:26479> info sentinel

# Sentinel

sentinel_masters:1

...

master0:name=sfmaster,status=ok,address=127.0.0.1:6481,slaves=2,sentinels=3

3. sentinel remove <master name> stops the current Sentinel node from monitoring the specified master. Note that the command only takes effect on the Sentinel node it is run against.

127.0.0.1:26479> sentinel remove sfmaster

OK

127.0.0.1:26479> info sentinel

# Sentinel

sentinel_masters:0

sentinel_tilt:0

sentinel_running_scripts:0

sentinel_scripts_queue_length:0

sentinel_simulate_failure_flags:0

4. sentinel monitor <master name> <ip> <port> <quorum> adds a master to be monitored. Run the command below to add the master on port 6481, then inspect it with info sentinel.

127.0.0.1:26479> sentinel monitor sfmaster 127.0.0.1 6481 2

OK

127.0.0.1:26479> info sentinel

# Sentinel

sentinel_masters:1

...

sentinel_simulate_failure_flags:0

master0:name=sfmaster,status=ok,address=127.0.0.1:6481,slaves=0,sentinels=3

Note that it reports slaves=0 when it should be slaves=2. Why?

It turns out that when sentinel remove drops a master, it deletes that master's configuration from the Sentinel node, including the authentication line sentinel auth-pass sfmaster abcdefg; but sentinel monitor does not restore that line when the master is re-added (a minor wart). Add the line back by hand and restart the Sentinel node on port 26479, and everything is back to normal.
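The line to put back by hand is the auth-pass entry from the original configuration; alternatively, Sentinel should also accept it at runtime via sentinel set (worth verifying on your Redis version), which avoids the restart:

```
# in sentinel.conf, below the sentinel monitor line for sfmaster
sentinel auth-pass sfmaster abcdefg

# or at runtime on the sentinel node (rewrites its config file itself)
sentinel set sfmaster auth-pass abcdefg
```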

127.0.0.1:26479> info sentinel

# Sentinel

sentinel_masters:1

...

master0:name=sfmaster,status=ok,address=127.0.0.1:6481,slaves=2,sentinels=3

If you are interested, follow the WeChat subscription account "数据库最佳实践" (DBBestPractice).

Original post: http://blog.51cto.com/coveringindex/2148790
