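Ever since Redis added the Sentinel clustering tool, I had never actually tried it. I have recently been surveying the mainstream Redis cluster deployment options, so I read through the official Sentinel documentation carefully and deployed three Redis instances plus three Sentinel instances on my desktop. This post is a brief summary.
First, download and install Redis. The Sentinel that ships with Redis 2.8 is what antirez calls Sentinel 2, a rewrite of Sentinel 1. Because Sentinel 1 is deprecated and has too many bugs, antirez strongly recommends upgrading both Redis and Sentinel to 2.8. I installed 2.8.17, the latest release at the time of writing.
Next, configure and start the Redis instances. I started three Redis instances on local ports 6379, 6380 and 6381, with 6379 as the Master and the other two as Slaves. I won't go over Redis master-slave configuration here, but one detail is worth pointing out: the two Slaves differ in slave-priority. The 6380 instance sets it to 50 and the 6381 instance sets it to 100, so when the Master goes down Sentinel prefers the Slave with the lower slave-priority value as the new Master.
Finally, configure and start the Sentinel instances. I started three Sentinel instances on local ports 26379, 26380 and 26381 to monitor the three Redis instances started above. Below is the configuration file of the Sentinel instance on 26379. Following the official documentation, it sets only a few key parameters. The configuration files of the other two instances differ only in port number and data directory.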
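To make the selection rule concrete, here is a minimal sketch in Java. It is not Redis source code: the class and field names are hypothetical, and the tie-breakers (larger replication offset, then lexicographically smaller run ID) and the "priority 0 is never promoted" rule follow my reading of the Sentinel documentation rather than anything observed in this experiment.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative sketch only: names are hypothetical, this is not Redis source code.
final class ReplicaSelection {

    static final class Replica {
        final int port;
        final int priority;     // slave-priority from the replica's redis.conf
        final long replOffset;  // how much of the master's stream it has processed
        final String runId;

        Replica(int port, int priority, long replOffset, String runId) {
            this.port = port;
            this.priority = priority;
            this.replOffset = replOffset;
            this.runId = runId;
        }
    }

    // Lower priority wins, then the larger replication offset, then the smaller run ID.
    private static final Comparator<Replica> PROMOTION_ORDER =
        Comparator.comparingInt((Replica r) -> r.priority)
            .thenComparing(Comparator.comparingLong((Replica r) -> r.replOffset).reversed())
            .thenComparing((Replica r) -> r.runId);

    static Optional<Replica> pick(List<Replica> candidates) {
        return candidates.stream()
            .filter(r -> r.priority > 0)   // assumption: priority 0 means "never promote"
            .min(PROMOTION_ORDER);
    }

    public static void main(String[] args) {
        List<Replica> candidates = Arrays.asList(
            new Replica(6380, 50, 1000L, "run-id-b"),    // offsets and run IDs are made up
            new Replica(6381, 100, 1000L, "run-id-a"));
        System.out.println("Promote: " + pick(candidates).get().port); // prints "Promote: 6380"
    }
}
```

With the two priorities used in this post (50 and 100), the first criterion alone already decides in favour of 6380.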
port 26379
dir /home/liangzhichao/data/redis/sentinels/26379
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
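Since the three files differ only in port and data directory, they can be generated from one template. The sketch below is one way to do that in Java. The class name is hypothetical, and the per-port directory layout for 26380 and 26381 is my assumption, extrapolated from the 26379 path. Note that every directive has to use the same master name as the `sentinel monitor` line.

```java
// Illustrative sketch only: renders the minimal Sentinel configuration for one port.
final class SentinelConfGenerator {

    static String render(int port, String baseDir) {
        return String.join("\n",
            "port " + port,
            "dir " + baseDir + "/" + port,                      // assumed layout: one directory per port
            "sentinel monitor mymaster 127.0.0.1 6379 2",        // quorum of 2 Sentinels
            "sentinel down-after-milliseconds mymaster 30000",   // 30 s without a valid reply => sdown
            "sentinel parallel-syncs mymaster 1",                // reconfigure one Slave at a time
            "sentinel failover-timeout mymaster 180000") + "\n"; // 3 minutes
    }

    public static void main(String[] args) {
        for (int port = 26379; port <= 26381; port++) {
            System.out.println("# sentinel." + port + ".conf");
            System.out.print(render(port, "/home/liangzhichao/data/redis/sentinels"));
        }
    }
}
```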
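I wanted the Sentinel logs written to a file but could not find a way to set a log file in the configuration file, so I simply started each instance like this: ./redis-sentinel /home/liangzhichao/data/redis/confs/sentinel.26379.conf >> /home/liangzhichao/data/redis/logs/26379.log 2>&1 &
Once the Sentinel instances are up, it is worth looking at the logs to see what a Sentinel actually does after starting. Below is the log of the 26379 instance. It shows that after startup Sentinel does at least four things: 1) it generates a runid that uniquely identifies this instance; 2) it starts monitoring the Master Redis instance on port 6379; 3) it fetches information about all Slave Redis instances of that Master, so that a new Master can be chosen from among them if the Master goes down; 4) it announces its own presence to the other Sentinel instances monitoring the same Master, so that all Sentinels discover and remember one another.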
[8229] 18 Nov 11:18:46.677 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
[8229] 18 Nov 11:18:46.677 # Redis can't set maximum open files to 10032 because of OS error: Operation not permitted.
[8229] 18 Nov 11:18:46.677 # Current maximum open files is 1024. maxclients has been reduced to 992 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
[8229] 18 Nov 11:18:46.679 # Sentinel runid is 2262ed911e9414208af4b1c48ad2b449fd4e0b89
[8229] 18 Nov 11:18:46.679 # +monitor master mymaster 127.0.0.1 6379 quorum 2
[8229] 18 Nov 11:18:46.679 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
[8229] 18 Nov 11:18:46.679 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
[8229] 18 Nov 11:19:27.260 * +sentinel sentinel 127.0.0.1:26380 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
[8229] 18 Nov 11:19:36.069 * +sentinel sentinel 127.0.0.1:26381 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
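When following several Sentinel logs at once it helps to pull the event name out of each line. The following Java sketch parses the line layout seen above (pid in brackets, timestamp, a one-character level marker, then the message). The class name and the regular expression are my own and only cover the format shown in this post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: parses the Redis 2.8 style log lines shown in this post.
final class SentinelLogLine {
    // [pid] day month time level message
    private static final Pattern LINE = Pattern.compile(
        "^\\[(\\d+)\\] (\\d{1,2} \\w{3} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}) (\\S) (.*)$");

    final int pid;
    final String timestamp;
    final char level;
    final String event;    // e.g. "+monitor", "+slave", "-odown"; empty for free-text messages
    final String details;

    private SentinelLogLine(int pid, String timestamp, char level, String event, String details) {
        this.pid = pid;
        this.timestamp = timestamp;
        this.level = level;
        this.event = event;
        this.details = details;
    }

    static SentinelLogLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized log line: " + line);
        }
        String message = m.group(4);
        String event = "";
        String details = message;
        // Sentinel events start with '+' or '-' followed by the event name.
        if (message.startsWith("+") || message.startsWith("-")) {
            int space = message.indexOf(' ');
            event = space < 0 ? message : message.substring(0, space);
            details = space < 0 ? "" : message.substring(space + 1);
        }
        return new SentinelLogLine(Integer.parseInt(m.group(1)), m.group(2),
            m.group(3).charAt(0), event, details);
    }

    public static void main(String[] args) {
        SentinelLogLine l = parse("[8229] 18 Nov 11:19:27.260 * +sentinel sentinel "
            + "127.0.0.1:26380 127.0.0.1 26380 @ mymaster 127.0.0.1 6379");
        System.out.println(l.timestamp + " " + l.event + " -> " + l.details);
    }
}
```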
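Note that by the time these log lines appear, each Sentinel instance has also rewritten its own configuration file to record the latest state, so each file now looks quite different from what it was before startup. The main contents of the 26379 instance's configuration file are listed below: the information about all Slave Redis instances and about the other Sentinel instances has been added. Also, the current configuration epoch is 0.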
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel known-slave mymaster 127.0.0.1 6380
sentinel known-slave mymaster 127.0.0.1 6381
sentinel known-sentinel mymaster 127.0.0.1 26381 22b65a4796e6ece6b76284558a071cc83df71098
sentinel known-sentinel mymaster 127.0.0.1 26380 59616326f3c539ff3301098e1bf708350e6dd45d
sentinel current-epoch 0
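At this point the Redis cluster (one Master, two Slaves) and the three-instance Sentinel cluster are all up and working. Next, let's use the Jedis client to verify that the Sentinel cluster behaves correctly. The test code below is simple: it connects to the Sentinel cluster, obtains the current Master Redis instance from it, then writes one key to the Master and reads it back to confirm that the write succeeded.
package redis.clients.mytest;

import java.util.HashSet;
import java.util.Set;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class MyJedisSentinelTest {
    public static void main(String[] args) {
        // Addresses of the three Sentinel instances, as "host:port" strings.
        Set<String> sentinels = new HashSet<String>();
        sentinels.add(new HostAndPort("localhost", 26379).toString());
        sentinels.add(new HostAndPort("localhost", 26380).toString());
        sentinels.add(new HostAndPort("localhost", 26381).toString());

        // The pool asks the Sentinels for the current master of "mymaster".
        JedisSentinelPool sentinelPool = new JedisSentinelPool("mymaster", sentinels);
        System.out.println("Current master: " + sentinelPool.getCurrentHostMaster().toString());

        // Write a key through one pooled connection ...
        Jedis master = sentinelPool.getResource();
        master.set("username", "liangzhichao");
        master.close(); // hands the connection back to the pool

        // ... and read it back through another.
        Jedis master2 = sentinelPool.getResource();
        String value = master2.get("username");
        System.out.println("username: " + value);
        master2.close();

        sentinelPool.destroy();
    }
}
Running the code above produces the following output, which shows that the Master's address was successfully obtained through the Sentinel cluster and that read and write requests to the Master are handled normally.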
2014-11-20 16:39:00 redis.clients.jedis.JedisSentinelPool initSentinels
信息: Trying to find master from available Sentinels...
2014-11-20 16:39:00 redis.clients.jedis.JedisSentinelPool initSentinels
信息: Redis master running at 127.0.0.1:6379, starting Sentinel listeners...
2014-11-20 16:39:00 redis.clients.jedis.JedisSentinelPool initPool
信息: Created JedisPool to master at 127.0.0.1:6379
Current master: 127.0.0.1:6379
username: liangzhichao
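As far as I understand, the lookup the pool performs boils down to sending SENTINEL get-master-addr-by-name to a Sentinel. The sketch below shows that exchange over a plain socket, without Jedis, by encoding the command and decoding the reply in the Redis wire protocol (RESP). The class and method names are hypothetical, and the parser handles only the reply shape this command returns (an array of bulk strings, or a nil array). The main method works on a canned reply so it runs without a live Sentinel.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a tiny RESP client for one Sentinel command.
final class SentinelMasterLookup {

    // Encodes a command as a RESP array of bulk strings.
    static byte[] encode(String... args) {
        StringBuilder sb = new StringBuilder("*").append(args.length).append("\r\n");
        for (String a : args) {
            int len = a.getBytes(StandardCharsets.UTF_8).length;
            sb.append('$').append(len).append("\r\n").append(a).append("\r\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Reads one CRLF-terminated line and returns it without the CRLF.
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                in.read(); // consume the '\n' that follows
                return buf.toString("UTF-8");
            }
            buf.write(b);
        }
        throw new IOException("Connection closed before end of line");
    }

    // Reads an array of bulk strings; returns null for a nil array ("*-1").
    static List<String> readBulkArray(InputStream in) throws IOException {
        String header = readLine(in);
        if (header.startsWith("-")) {
            throw new IOException("Error reply: " + header.substring(1));
        }
        if (!header.startsWith("*")) {
            throw new IOException("Unexpected reply: " + header);
        }
        int count = Integer.parseInt(header.substring(1));
        if (count < 0) {
            return null;
        }
        List<String> items = new ArrayList<String>();
        for (int i = 0; i < count; i++) {
            int len = Integer.parseInt(readLine(in).substring(1)); // "$<len>"
            byte[] data = new byte[len];
            int off = 0;
            while (off < len) {
                int n = in.read(data, off, len - off);
                if (n < 0) {
                    throw new IOException("Connection closed mid-reply");
                }
                off += n;
            }
            readLine(in); // consume the CRLF after the payload
            items.add(new String(data, StandardCharsets.UTF_8));
        }
        return items;
    }

    // Asks one Sentinel for the address of the named master: [host, port] or null.
    static List<String> lookup(String host, int port, String masterName) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(encode("SENTINEL", "get-master-addr-by-name", masterName));
            out.flush();
            return readBulkArray(socket.getInputStream());
        }
    }

    public static void main(String[] args) throws IOException {
        // Canned reply in place of a live Sentinel; use lookup(...) against a real one.
        byte[] reply = "*2\r\n$9\r\n127.0.0.1\r\n$4\r\n6379\r\n".getBytes(StandardCharsets.UTF_8);
        List<String> addr = readBulkArray(new ByteArrayInputStream(reply));
        System.out.println("Current master: " + addr.get(0) + ":" + addr.get(1));
    }
}
```

Against the setup in this post, calling lookup("127.0.0.1", 26379, "mymaster") should return the same address that JedisSentinelPool reports.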
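So far the Sentinel cluster has been working normally. Now let's see how it handles a Master Redis instance going down. We trigger this by killing the Redis process running on port 6379 while watching the logs of each Sentinel instance. The logs below show how each instance handled the failure.
Instance 26379: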
[8229] 19 Nov 14:41:32.033 # +sdown master mymaster 127.0.0.1 6379
[8229] 19 Nov 14:41:32.116 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
[8229] 19 Nov 14:41:32.116 # +new-epoch 1
[8229] 19 Nov 14:41:32.116 # +try-failover master mymaster 127.0.0.1 6379
[8229] 19 Nov 14:41:32.286 # +vote-for-leader 2262ed911e9414208af4b1c48ad2b449fd4e0b89 1
[8229] 19 Nov 14:41:32.286 # 127.0.0.1:26381 voted for 22b65a4796e6ece6b76284558a071cc83df71098 1
[8229] 19 Nov 14:41:32.387 # 127.0.0.1:26380 voted for 22b65a4796e6ece6b76284558a071cc83df71098 1
[8229] 19 Nov 14:41:33.326 # +config-update-from sentinel 127.0.0.1:26381 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
[8229] 19 Nov 14:41:33.326 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
[8229] 19 Nov 14:41:33.326 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
[8229] 19 Nov 14:41:33.430 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
[8229] 19 Nov 14:42:03.507 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
Instance 26380:
[8243] 19 Nov 14:41:32.023 # +sdown master mymaster 127.0.0.1 6379
[8243] 19 Nov 14:41:32.336 # +new-epoch 1
[8243] 19 Nov 14:41:32.386 # +vote-for-leader 22b65a4796e6ece6b76284558a071cc83df71098 1
[8243] 19 Nov 14:41:33.151 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
[8243] 19 Nov 14:41:33.151 # Next failover delay: I will not start a failover before Wed Nov 19 14:47:32 2014
[8243] 19 Nov 14:41:33.327 # +config-update-from sentinel 127.0.0.1:26381 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
[8243] 19 Nov 14:41:33.328 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
[8243] 19 Nov 14:41:33.328 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
[8243] 19 Nov 14:41:33.558 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
[8243] 19 Nov 14:42:03.616 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
Instance 26381:
[8247] 19 Nov 14:41:32.042 # +sdown master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:32.094 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
[8247] 19 Nov 14:41:32.094 # +new-epoch 1
[8247] 19 Nov 14:41:32.094 # +try-failover master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:32.194 # +vote-for-leader 22b65a4796e6ece6b76284558a071cc83df71098 1
[8247] 19 Nov 14:41:32.286 # 127.0.0.1:26379 voted for 2262ed911e9414208af4b1c48ad2b449fd4e0b89 1
[8247] 19 Nov 14:41:32.387 # 127.0.0.1:26380 voted for 22b65a4796e6ece6b76284558a071cc83df71098 1
[8247] 19 Nov 14:41:32.396 # +elected-leader master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:32.396 # +failover-state-select-slave master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:32.459 # +selected-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:32.459 * +failover-state-send-slaveof-noone slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:32.522 * +failover-state-wait-promotion slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:33.307 # +promoted-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:33.307 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:33.326 * +slave-reconf-sent slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:33.851 # -odown master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:34.356 * +slave-reconf-inprog slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:34.356 * +slave-reconf-done slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:34.426 # +failover-end master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:34.426 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
[8247] 19 Nov 14:41:34.427 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
[8247] 19 Nov 14:41:34.479 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
[8247] 19 Nov 14:42:04.531 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
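From the logs above we can see roughly how the Sentinel cluster handles a failed Master: 1) each Sentinel instance detects through monitoring that the Master on port 6379 is not responding and marks it as sdown (subjectively down); 2) the Sentinels exchange their views, and once enough of them agree that the Master is down (at least the configured quorum, 2 in this setup) they mark it as odown (objectively down); 3) a failover of the Master is prepared, which first requires electing one Sentinel instance to carry it out; 4) the elected Sentinel picks one of the Slave Redis instances to become the new Master; 5) once the switch is complete, the new configuration is propagated to all Sentinel instances; 6) the Slaves that were not chosen are reconfigured to replicate from the new Master and start syncing data.
In our environment, the Sentinel on port 26381 won the right to perform this failover, so it chose the Slave on port 6380 as the new Master (because 6380 has a lower slave-priority than 6381). After the switch, the 6381 instance that was not chosen began replicating from 6380 instead. Let's look at the Sentinel configuration file again to confirm that it was indeed updated. Below are, again, the main contents of the 26379 instance's configuration file. Compared with the earlier version, the Master has indeed changed and the configuration epoch is now 1.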
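Steps 2 and 3 can be sketched in a few lines of Java. This is only a model of the decision rules as I understand them from the Sentinel documentation, not the actual implementation: the class and method names are hypothetical, and the rule that a leader needs both a majority of all Sentinels and at least quorum votes is my assumption about the election. The votes in main are the ones recorded in the logs above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: models the odown check and the leader vote count.
final class FailoverDecision {

    // The master becomes objectively down once at least `quorum` Sentinels report it down.
    static boolean isObjectivelyDown(int sentinelsReportingDown, int quorum) {
        return sentinelsReportingDown >= quorum;
    }

    // votes maps each voting Sentinel to the run ID it voted for.
    // Returns the winning run ID, or null if nobody reached the required number of votes.
    static String electLeader(Map<String, String> votes, int totalSentinels, int quorum) {
        Map<String, Integer> tally = new HashMap<String, Integer>();
        for (String candidate : votes.values()) {
            tally.merge(candidate, 1, Integer::sum);
        }
        // Assumption: a leader needs a majority of all Sentinels and at least `quorum` votes.
        int needed = Math.max(totalSentinels / 2 + 1, quorum);
        for (Map.Entry<String, Integer> e : tally.entrySet()) {
            if (e.getValue() >= needed) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> votes = new HashMap<String, String>();
        votes.put("127.0.0.1:26379", "2262ed911e9414208af4b1c48ad2b449fd4e0b89");
        votes.put("127.0.0.1:26380", "22b65a4796e6ece6b76284558a071cc83df71098");
        votes.put("127.0.0.1:26381", "22b65a4796e6ece6b76284558a071cc83df71098");
        System.out.println("odown: " + isObjectivelyDown(2, 2));
        System.out.println("leader: " + electLeader(votes, 3, 2));
    }
}
```

The run ID that collects two of the three votes, 22b65a47..., is the one listed for 127.0.0.1:26381 in the known-sentinel lines of the 26379 configuration file, which matches the +elected-leader entry in the 26381 log.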
sentinel monitor mymaster 127.0.0.1 6380 2
sentinel known-slave mymaster 127.0.0.1 6381
sentinel known-slave mymaster 127.0.0.1 6379
sentinel known-sentinel mymaster 127.0.0.1 26381 22b65a4796e6ece6b76284558a071cc83df71098
sentinel known-sentinel mymaster 127.0.0.1 26380 59616326f3c539ff3301098e1bf708350e6dd45d
sentinel current-epoch 1
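The two snapshots of the 26379 configuration file can also be compared mechanically. The Java sketch below extracts the monitored master, the known Slaves and Sentinels, and the current epoch from the directives shown in this post. The class name is hypothetical and the parser ignores every directive it does not recognize.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: reads the handful of Sentinel directives shown in this post.
final class SentinelConfSnapshot {
    String master = "";   // "host:port" from `sentinel monitor`
    long epoch = -1;      // from `sentinel current-epoch`
    final List<String> replicas = new ArrayList<String>();
    final List<String> sentinels = new ArrayList<String>();

    static SentinelConfSnapshot parse(List<String> lines) {
        SentinelConfSnapshot s = new SentinelConfSnapshot();
        for (String line : lines) {
            String[] t = line.trim().split("\\s+");
            if (t.length < 3 || !t[0].equals("sentinel")) {
                continue; // not a directive this sketch understands
            }
            if (t[1].equals("monitor") && t.length >= 5) {
                s.master = t[3] + ":" + t[4];
            } else if (t[1].equals("known-slave") && t.length >= 5) {
                s.replicas.add(t[3] + ":" + t[4]);
            } else if (t[1].equals("known-sentinel") && t.length >= 5) {
                s.sentinels.add(t[3] + ":" + t[4]);
            } else if (t[1].equals("current-epoch")) {
                s.epoch = Long.parseLong(t[2]);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        SentinelConfSnapshot before = parse(Arrays.asList(
            "sentinel monitor mymaster 127.0.0.1 6379 2",
            "sentinel known-slave mymaster 127.0.0.1 6380",
            "sentinel known-slave mymaster 127.0.0.1 6381",
            "sentinel current-epoch 0"));
        SentinelConfSnapshot after = parse(Arrays.asList(
            "sentinel monitor mymaster 127.0.0.1 6380 2",
            "sentinel known-slave mymaster 127.0.0.1 6381",
            "sentinel known-slave mymaster 127.0.0.1 6379",
            "sentinel current-epoch 1"));
        System.out.println("master: " + before.master + " -> " + after.master);
        System.out.println("epoch:  " + before.epoch + " -> " + after.epoch);
    }
}
```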
Running the Jedis test program above once more gives the following output: the Master obtained from the Sentinel cluster is indeed the new one!
2014-11-20 16:39:00 redis.clients.jedis.JedisSentinelPool initSentinels
信息: Trying to find master from available Sentinels...
2014-11-20 16:39:00 redis.clients.jedis.JedisSentinelPool initSentinels
信息: Redis master running at 127.0.0.1:6380, starting Sentinel listeners...
2014-11-20 16:39:00 redis.clients.jedis.JedisSentinelPool initPool
信息: Created JedisPool to master at 127.0.0.1:6380
Current master: 127.0.0.1:6380
username: liangzhichao
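A long-running client does not have to be restarted to notice the change. Sentinel also publishes its events over Pub/Sub, and the "starting Sentinel listeners..." line in the Jedis output suggests the pool subscribes to them. As a sketch of what handling such a notification involves, the Java code below parses the payload of a +switch-master event, assuming it has the same layout as the log entries above (master name, old address, new address). The class name is hypothetical.

```java
// Illustrative sketch only: parses "<name> <old-ip> <old-port> <new-ip> <new-port>".
final class SwitchMasterEvent {
    final String masterName;
    final String oldHost;
    final int oldPort;
    final String newHost;
    final int newPort;

    private SwitchMasterEvent(String masterName, String oldHost, int oldPort,
                              String newHost, int newPort) {
        this.masterName = masterName;
        this.oldHost = oldHost;
        this.oldPort = oldPort;
        this.newHost = newHost;
        this.newPort = newPort;
    }

    static SwitchMasterEvent parse(String payload) {
        String[] t = payload.trim().split("\\s+");
        if (t.length != 5) {
            throw new IllegalArgumentException("Unexpected +switch-master payload: " + payload);
        }
        return new SwitchMasterEvent(t[0], t[1], Integer.parseInt(t[2]),
            t[3], Integer.parseInt(t[4]));
    }

    public static void main(String[] args) {
        // Payload copied from the +switch-master log entries in this post.
        SwitchMasterEvent e = parse("mymaster 127.0.0.1 6379 127.0.0.1 6380");
        System.out.println(e.masterName + ": " + e.oldHost + ":" + e.oldPort
            + " -> " + e.newHost + ":" + e.newPort);
    }
}
```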