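The Sentinel process monitors the state of the Master server in a Redis deployment. When the Master fails, Sentinel switches the Master and Slave roles to keep the system highly available. Sentinel has shipped with Redis since version 2.6 and became stable in 2.8, so Redis 2.8 or later is recommended for production.
Sentinel is a distributed system: several Sentinel processes can run in one architecture. They use a gossip protocol to exchange information about whether the Master is offline, and an agreement (voting) protocol to decide whether to perform an automatic failover and which Slave to promote to the new Master.
Each Sentinel process periodically sends messages to the other Sentinels, the Master, and the Slaves to confirm that they are alive. If a peer does not reply within the configured time, the Sentinel provisionally treats it as down. This is the "subjective" down state, Subjectively Down (SDOWN): it is the private opinion of one Sentinel, and other Sentinels may or may not share it.
The counterpart is the "objective" down state. When enough Sentinels (at least the configured quorum) have judged the Master SDOWN and have confirmed this with each other through the SENTINEL is-master-down-by-addr command, the Master is marked Objectively Down (ODOWN). The Sentinels then vote to elect a leader, which picks one of the remaining Slave nodes, promotes it to Master, rewrites the relevant configuration automatically, and carries out the failover.
The Sentinel mechanism thus solves the problem of switching the master and slave roles.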
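The SDOWN and ODOWN decisions described above can be illustrated with a small Python sketch. It is a toy model and not Redis code: the SentinelView class and the three function names are hypothetical, and the timings are made-up sample values. It assumes the rule that a failover leader needs votes from both a majority of Sentinels and at least the quorum.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentinelView:
    """One Sentinel's local view of the master (hypothetical model)."""
    name: str
    ms_since_last_reply: int  # time since the master last answered this Sentinel


def is_sdown(view: SentinelView, down_after_ms: int) -> bool:
    # SDOWN is a local decision: no valid reply within the configured window.
    return view.ms_since_last_reply > down_after_ms


def is_odown(views: List[SentinelView], down_after_ms: int, quorum: int) -> bool:
    # ODOWN is reached once at least `quorum` Sentinels report SDOWN.
    sdown_count = sum(is_sdown(v, down_after_ms) for v in views)
    return sdown_count >= quorum


def wins_leader_election(votes: int, total_sentinels: int, quorum: int) -> bool:
    # The failover leader needs a majority of all Sentinels and at least `quorum` votes.
    majority = total_sentinels // 2 + 1
    return votes >= max(majority, quorum)


# Sample values only: two of three Sentinels have not heard from the master.
views = [
    SentinelView("sentinel-1", 12000),
    SentinelView("sentinel-2", 11500),
    SentinelView("sentinel-3", 300),
]
print(is_odown(views, down_after_ms=10000, quorum=2))              # True
print(wins_leader_election(votes=2, total_sentinels=3, quorum=2))  # True
```

With three Sentinels and a quorum of 2, losing one Sentinel still allows a failover, while a single isolated Sentinel can reach SDOWN but never ODOWN.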
Environment
Hostname | IP address |
---|---|
Master | 192.168.36.110 |
Slave-1 | 192.168.36.111 |
Slave-2 | 192.168.36.112 |
Before setting up the environment, make sure the Redis service is running on every host
[[email protected] ~]#ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:6379 *:*
[[email protected] ~]#ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:6379 *:*
[[email protected] ~]#ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:6379 *:*
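Manually configure the Master
A Redis server is a master by default. After choosing the master, configure each of the other servers as a slave of it with the slaveof directive. Sentinel assumes that a working Redis master-slave setup has already been built by hand.
Configure Slave-1 as a slave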
[[email protected] ~]#vim /apps/redis/etc/redis.conf
....
slaveof 192.168.36.110 6379 # slaveof points at the master
masterauth 123456
....
[[email protected] ~]#ps -ef | grep redis
root 7397 1 0 10:45 ? 00:00:01 redis-server 0.0.0.0:6379
root 7484 7349 0 10:54 pts/0 00:00:00 grep --color=auto redis
[[email protected] ~]#kill -9 7397 # terminate the process
[[email protected] ~]#redis-server /apps/redis/etc/redis.conf # start again with the updated configuration file
Configure Slave-2 as a slave
[[email protected] ~]#vim /apps/redis/etc/redis.conf
....
slaveof 192.168.36.110 6379
masterauth 123456
....
[[email protected] ~]#ps -ef | grep redis
root 8017 1 0 10:44 ? 00:00:01 redis-server 0.0.0.0:6379
root 8173 7926 0 10:56 pts/0 00:00:00 grep --color=auto redis
[[email protected] ~]#kill 8017
[[email protected] ~]#redis-server /apps/redis/etc/redis.conf # start again with the updated configuration file
Check status
# Slave-1 status
[[email protected] ~]#redis-cli
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> INFO replication
# Replication
role:slave # now a slave
master_host:192.168.36.110
master_port:6379
master_link_status:up # replication link to the master is up
master_last_io_seconds_ago:8
master_sync_in_progress:0
slave_repl_offset:84
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:99a1dcabb930a97bbdea90450b2f891778c83e37
master_replid2:0000000000000000000000000000000000000000 # secondary replication ID: all zeros until a failover, after which it holds the previous master_replid
master_repl_offset:84
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:84
# Slave-2 status
[[email protected] ~]#redis-cli
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> INFO replication
# Replication
role:slave # now a slave
master_host:192.168.36.110
master_port:6379
master_link_status:up # replication link to the master is up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:224
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:99a1dcabb930a97bbdea90450b2f891778c83e37
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:224
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:155
repl_backlog_histlen:70
# Master status
[[email protected] ~]#redis-cli
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:2 # two slaves: Slave-1 and Slave-2 have now joined
slave0:ip=192.168.36.111,port=6379,state=online,offset=336,lag=1
slave1:ip=192.168.36.112,port=6379,state=online,offset=336,lag=1
master_replid:99a1dcabb930a97bbdea90450b2f891778c83e37
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:336
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:336
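Checking these INFO replication fields by eye on every node gets tedious, so the same check can be scripted. The following is a minimal Python sketch under stated assumptions: parse_info and replica_is_healthy are hypothetical helper names, and the sample text is a shortened copy of the output above without the explanatory annotations.

```python
def parse_info(text: str) -> dict:
    """Parse the key:value lines of a Redis INFO section into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Replication"
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def replica_is_healthy(info: dict, expected_master: str) -> bool:
    """True if the node is a slave of the expected master and its link is up."""
    return (
        info.get("role") == "slave"
        and info.get("master_host") == expected_master
        and info.get("master_link_status") == "up"
    )


# Shortened sample taken from the Slave-1 output shown earlier.
sample = """# Replication
role:slave
master_host:192.168.36.110
master_port:6379
master_link_status:up
slave_read_only:1
"""

info = parse_info(sample)
print(replica_is_healthy(info, "192.168.36.110"))  # True
```

In practice the text would come from running INFO replication against each node; here it is hard-coded so the sketch runs without a Redis server.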
# Both slaves now replicate the master's data; on a slave, keys can be read but not written
127.0.0.1:6379> KEYS *
1) "key3"
2) "key2"
3) "key1"
127.0.0.1:6379> SET key5 value5
(error) READONLY You can't write against a read only slave.
127.0.0.1:6379> GET key3
"value4"
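Edit the Sentinel configuration file on all three servers
# Redis was built from source here, so the sentinel configuration file has to be copied from the source tree
# With a yum install, the sentinel configuration file already exists and no copy is needed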
[[email protected] ~]#cp /root/redis-4.0.14/sentinel.conf /apps/redis/etc/
Master configuration
[[email protected] ~]#vim /apps/redis/etc/sentinel.conf
[[email protected] ~]#grep "^[a-zA-Z]" /apps/redis/etc/sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
#pidfile "redis-sentinel.pid"
logfile "sentinel_26379.log"
dir "/apps/redis/"
sentinel deny-scripts-reconfig yes
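sentinel monitor mymaster 192.168.36.110 6379 2 # quorum: the number of Sentinels that must agree the master is down before it is marked ODOWN and a failover can start
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 10000 # time without a valid reply, in milliseconds, before the master is considered subjectively down (SDOWN)
sentinel parallel-syncs mymaster 1 # number of slaves that resync with the new master at the same time after a failover; the smaller the number, the longer the total sync takes
sentinel failover-timeout mymaster 180000 # timeout, in milliseconds, for pointing all slaves at the new master
sentinel deny-scripts-reconfig yes # forbid changing the scripts at runtime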
# Copy the configuration file to the two slave nodes with scp
[[email protected] redis-4.0.14]#scp /apps/redis/sentinel.conf 192.168.36.111:/apps/redis/
[email protected]'s password:
sentinel.conf 100% 282 214.2KB/s 00:00
[[email protected] redis-4.0.14]#scp /apps/redis/sentinel.conf 192.168.36.112:/apps/redis/
[email protected]'s password:
sentinel.conf 100% 282 267.0KB/s 00:00
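Since all three nodes need the same monitoring directives, the fragment can also be generated instead of edited by hand. The sketch below is one possible way to do that in Python. The function name render_sentinel_conf and its parameters are hypothetical, and the quorum range check is a sanity check added here, not something Redis itself enforces.

```python
def render_sentinel_conf(name, master_ip, master_port, quorum, sentinels,
                         auth_pass=None, down_after_ms=10000,
                         parallel_syncs=1, failover_timeout_ms=180000):
    """Return the 'sentinel ...' directives for one monitored master as text."""
    # A quorum larger than the number of Sentinels could never be reached.
    if not 1 <= quorum <= sentinels:
        raise ValueError("quorum must be between 1 and the number of Sentinels")
    lines = [f"sentinel monitor {name} {master_ip} {master_port} {quorum}"]
    if auth_pass is not None:
        lines.append(f"sentinel auth-pass {name} {auth_pass}")
    lines.append(f"sentinel down-after-milliseconds {name} {down_after_ms}")
    lines.append(f"sentinel parallel-syncs {name} {parallel_syncs}")
    lines.append(f"sentinel failover-timeout {name} {failover_timeout_ms}")
    return "\n".join(lines) + "\n"


# Values taken from this walkthrough: three Sentinels watching 192.168.36.110.
conf_text = render_sentinel_conf(
    "mymaster", "192.168.36.110", 6379, quorum=2, sentinels=3, auth_pass="123456"
)
print(conf_text)
```

The output contains no trailing annotations, which keeps each directive on a line of its own when it is written into sentinel.conf.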
Start Sentinel
[[email protected] ~]#redis-sentinel /apps/redis/etc/sentinel.conf
[[email protected] ~]#ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 511 *:26379 *:*
[[email protected] ~]#redis-sentinel /apps/redis/etc/sentinel.conf
[[email protected] ~]#redis-sentinel /apps/redis/etc/sentinel.conf
Sentinel log
[[email protected] ~]#tail -f /apps/redis/logs/sentinel_26379.log
14129:X 14 Jun 16:23:34.697 # Sentinel is now ready to exit, bye bye...
14134:X 14 Jun 16:23:40.985 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
14134:X 14 Jun 16:23:40.985 # Redis version=4.0.14, bits=64, commit=00000000, modified=0, pid=14134, just started
14134:X 14 Jun 16:23:40.985 # Configuration loaded
14134:X 14 Jun 16:23:40.986 * Increased maximum number of open files to 10032 (it was originally set to 1024).
14134:X 14 Jun 16:23:40.987 * Running mode=sentinel, port=26379.
14134:X 14 Jun 16:23:40.987 # Sentinel ID is 69d6647e2c6236b5b72d8e943b5d5707db47b9a4
14134:X 14 Jun 16:23:40.987 # +monitor master mymaster 192.168.36.110 6379 quorum 2
14134:X 14 Jun 16:23:43.015 * +sentinel sentinel abeb0c89a25c690b5cbe09491de6ab822deee15e 192.168.36.112 26379 @ mymaster 192.168.36.110 6379
14134:X 14 Jun 16:23:43.050 * +sentinel sentinel 4d3b7eb172aaef1a58b35c1a567534c67f3977ef 192.168.36.111 26379 @ mymaster 192.168.36.110 6379
Check status
[[email protected] ~]#redis-cli -h 192.168.36.110 -p 26379 # query through the Sentinel port 26379
192.168.36.110:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.36.110:6379,slaves=2,sentinels=3
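The master0 line packs the interesting facts (current master address, number of slaves, number of Sentinels) into one comma-separated string. A small Python sketch for pulling them apart follows. parse_master_line is a hypothetical helper name, and the sample line is copied from the output above.

```python
def parse_master_line(line: str) -> dict:
    """Parse a 'masterN:name=...,status=...,address=ip:port,...' line from INFO sentinel."""
    # Everything after the first colon is a comma-separated list of key=value pairs.
    _, _, payload = line.strip().partition(":")
    fields = dict(item.split("=", 1) for item in payload.split(","))
    host, _, port = fields["address"].rpartition(":")
    return {
        "name": fields["name"],
        "status": fields["status"],
        "host": host,
        "port": int(port),
        "slaves": int(fields["slaves"]),
        "sentinels": int(fields["sentinels"]),
    }


sample_line = "master0:name=mymaster,status=ok,address=192.168.36.110:6379,slaves=2,sentinels=3"
master = parse_master_line(sample_line)
print(master["host"], master["port"], master["sentinels"])
```

Comparing the host value before and after a failover is a quick way to confirm which node the Sentinels currently consider the master.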
Sentinel log changes when the Redis service on the master node is stopped
[[email protected] ~]#tail -f /apps/redis/logs/sentinel_26379.log
14232:X 14 Jun 16:29:58.189 # +sdown master mymaster 192.168.36.110 6379
14232:X 14 Jun 16:29:58.218 # +new-epoch 1
14232:X 14 Jun 16:29:58.219 # +vote-for-leader 4d3b7eb172aaef1a58b35c1a567534c67f3977ef 1
14232:X 14 Jun 16:29:58.266 # +odown master mymaster 192.168.36.110 6379 #quorum 3/2
14232:X 14 Jun 16:29:58.266 # Next failover delay: I will not start a failover before Fri Jun 14 16:35:58 2019
14232:X 14 Jun 16:29:59.468 # +config-update-from sentinel 4d3b7eb172aaef1a58b35c1a567534c67f3977ef 192.168.36.111 26379 @ mymaster 192.168.36.110 6379
14232:X 14 Jun 16:29:59.468 # +switch-master mymaster 192.168.36.110 6379 192.168.36.111 6379
14232:X 14 Jun 16:29:59.469 * +slave slave 192.168.36.112:6379 192.168.36.112 6379 @ mymaster 192.168.36.111 6379
14232:X 14 Jun 16:29:59.469 * +slave slave 192.168.36.110:6379 192.168.36.110 6379 @ mymaster 192.168.36.111 6379
14232:X 14 Jun 16:30:29.507 # +sdown slave 192.168.36.110:6379 192.168.36.110 6379 @ mymaster 192.168.36.111 6379
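The key event in this log is +switch-master, which names the old and the new master address. As a rough illustration, the Python sketch below extracts that event from log text with a regular expression. find_failovers is a hypothetical helper name, and the sample lines are copied from the log above.

```python
import re

# +switch-master <master name> <old ip> <old port> <new ip> <new port>
SWITCH_RE = re.compile(
    r"\+switch-master (?P<name>\S+) (?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)


def find_failovers(log_text: str) -> list:
    """Return one dict per +switch-master event found in the Sentinel log text."""
    events = []
    for match in SWITCH_RE.finditer(log_text):
        events.append({
            "name": match.group("name"),
            "old": (match.group("old_ip"), int(match.group("old_port"))),
            "new": (match.group("new_ip"), int(match.group("new_port"))),
        })
    return events


sample_log = """14232:X 14 Jun 16:29:58.189 # +sdown master mymaster 192.168.36.110 6379
14232:X 14 Jun 16:29:58.266 # +odown master mymaster 192.168.36.110 6379 #quorum 3/2
14232:X 14 Jun 16:29:59.468 # +switch-master mymaster 192.168.36.110 6379 192.168.36.111 6379
"""

for event in find_failovers(sample_log):
    print(event["name"], event["old"], "->", event["new"])
```

The +sdown and +odown lines are ignored on purpose: they show the detection steps, while only +switch-master confirms that the master address actually changed.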
Check Sentinel information
[[email protected] ~]#redis-cli -h 192.168.36.110 -p 26379
192.168.36.110:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.36.111:6379,slaves=2,sentinels=3
Changes to the Redis configuration files after the failover
# After a failover, the master IP on the slaveof line in redis.conf is rewritten, and so is the IP on the sentinel monitor line in sentinel.conf
[[email protected] ~]#cat /apps/redis/sentinel.conf
bind 0.0.0.0
port 26379
logfile "sentinel_26379.log"
dir "/apps/redis/logs"
sentinel myid 4d3b7eb172aaef1a58b35c1a567534c67f3977ef
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 192.168.36.111 6379 2
sentinel auth-pass mymaster 123456
sentinel config-epoch mymaster 1
# Generated by CONFIG REWRITE
sentinel leader-epoch mymaster 1
sentinel known-slave mymaster 192.168.36.110 6379
sentinel known-slave mymaster 192.168.36.112 6379
sentinel known-sentinel mymaster 192.168.36.110 26379 69d6647e2c6236b5b72d8e943b5d5707db47b9a4
sentinel known-sentinel mymaster 192.168.36.112 26379 abeb0c89a25c690b5cbe09491de6ab822deee15e
sentinel current-epoch 1
Current Redis status
[[email protected] ~]#redis-cli
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> INFO replication
# Replication
role:master # Slave-1 has become the master node
connected_slaves:1
slave0:ip=192.168.36.112,port=6379,state=online,offset=162954,lag=1
master_replid:e95e0241596bd1073ca558fc7cb892a7a6b4dbe6 # current master_replid after the failover
master_replid2:305f29a1bce5172f4c7e263de0d346fd33362d4d # master_replid from before the failover
master_repl_offset:163240
second_repl_offset:72111
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:163240
[[email protected] ~]#redis-cli
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> INFO replication
# Replication
role:slave
master_host:192.168.36.111 # IP address of the new master after the failover
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:187718
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:e95e0241596bd1073ca558fc7cb892a7a6b4dbe6
master_replid2:305f29a1bce5172f4c7e263de0d346fd33362d4d
master_repl_offset:187718
second_repl_offset:72111
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:71
repl_backlog_histlen:187648
Original article: https://blog.51cto.com/12980155/2409171