Redis Walkthrough (7): Failover with Redis Sentinel

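This post continues from <Redis Walkthrough (6): Setting Up Redis Master-Slave Replication>.

That post only set up the Redis master-slave environment, in two topologies:

1. a chained (directed acyclic) topology and 2. a star topology. Both are very simple to configure. One loose end was left open, though: what happens when the master goes down, and how does Redis fail over? This post focuses on exactly that. Main contents:

  1. Manual failover
  2. Automatic failover with Sentinel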

1. Manual failover

# Chained (directed acyclic) topology (see Redis Walkthrough (6): Setting Up Redis Master-Slave Replication)
[[email protected] redis]# ps -ef |grep redis
root      2495     1  2 20:06 ?        00:00:01 bin/redis-server *:6379
root      2503     1  1 20:06 ?        00:00:00 bin/redis-server *:6381
root      2508     1  1 20:06 ?        00:00:00 bin/redis-server *:6380

# Master (has one slave, 6380)
127.0.0.1:6379> info Replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=99,lag=1
master_repl_offset:99
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:98

# Slave 1, connected to master 6379
127.0.0.1:6380> info Replication
# Replication
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:5
master_sync_in_progress:0
slave_repl_offset:197
slave_priority:100
slave_read_only:1
connected_slaves:1
slave0:ip=127.0.0.1,port=6381,state=online,offset=197,lag=0
master_repl_offset:197
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:196

# Slave of 6380
127.0.0.1:6381> info Replication
# Replication
role:slave
master_host:127.0.0.1
master_port:6380
master_link_status:up
master_last_io_seconds_ago:6
master_sync_in_progress:0
slave_repl_offset:573
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

####################################
Simulate a crash of 6379
#####################################
[[email protected] redis]# bin/redis-cli shutdown
[[email protected] redis]# bin/redis-cli -p 6379 shutdown
Could not connect to Redis at 127.0.0.1:6379: Connection refused
# Observe: master_link_status:down shows that 6380 has lost its link to the master (here, because the master is down)
127.0.0.1:6380> info  Replication
# Replication
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:1049
master_link_down_since_seconds:42
slave_priority:100
slave_read_only:1
connected_slaves:1
slave0:ip=127.0.0.1,port=6381,state=online,offset=1105,lag=0
master_repl_offset:1105
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:1104
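# Promote the slave to master (6380 takes over from 6379)
# Two simple commands are enough to make 6380 the new master
# (slaveof no one does the promotion; slave-read-only only matters while an instance is a slave)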
127.0.0.1:6380> slaveof no one
OK
127.0.0.1:6380> config set slave-read-only no
OK
127.0.0.1:6380> set title "sentinel"
OK
# Connect to the slave 6381: the new value has been replicated
127.0.0.1:6381> get title
"sentinel"
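
The check I did by eye above (role is slave, master_link_status is down) can also be scripted. Below is a minimal sketch that parses the text of `info Replication` and decides whether the slave has lost its master. The function names and the 30-second threshold are my own assumptions for illustration, not part of Redis.

```python
def parse_info(text):
    """Parse the 'key:value' lines of an INFO section into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def master_unreachable(info, min_down_seconds=30):
    """True if this instance is a slave whose master link has been down long enough.

    min_down_seconds is an arbitrary threshold chosen for this sketch.
    """
    if info.get("role") != "slave":
        return False
    if info.get("master_link_status") != "down":
        return False
    down_for = int(info.get("master_link_down_since_seconds", "0"))
    return down_for >= min_down_seconds


# A shortened copy of the output of 6380 after 6379 was shut down.
SAMPLE = """\
# Replication
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_link_down_since_seconds:42
"""

info = parse_info(SAMPLE)
print(master_unreachable(info))  # True
```

Only when such a check is positive would an operator go on to run slaveof no one on the chosen slave.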

Log (6379)

2495:M 05 Sep 20:06:23.615 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2495:M 05 Sep 20:06:23.615 # Server started, Redis version 3.2.3
2495:M 05 Sep 20:06:23.617 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
2495:M 05 Sep 20:06:24.815 * DB loaded from append only file: 1.199 seconds
2495:M 05 Sep 20:06:24.816 * The server is now ready to accept connections on port 6379
2495:M 05 Sep 20:06:24.816 - DB 0: 20019 keys (0 volatile) in 32768 slots HT.
2495:M 05 Sep 20:06:24.816 - 0 clients connected (0 slaves), 3764336 bytes in use
2495:M 05 Sep 20:06:29.841 - DB 0: 20019 keys (0 volatile) in 32768 slots HT.
2495:M 05 Sep 20:06:29.841 - 0 clients connected (0 slaves), 3764336 bytes in use
2495:M 05 Sep 20:06:34.867 - DB 0: 20019 keys (0 volatile) in 32768 slots HT.
2495:M 05 Sep 20:06:34.875 - 0 clients connected (0 slaves), 3764336 bytes in use
2495:M 05 Sep 20:06:39.919 - DB 0: 20019 keys (0 volatile) in 32768 slots HT.
2495:M 05 Sep 20:06:39.921 - 0 clients connected (0 slaves), 3764336 bytes in use
2495:M 05 Sep 20:06:44.971 - DB 0: 20019 keys (0 volatile) in 32768 slots HT.
2495:M 05 Sep 20:06:44.971 - 0 clients connected (0 slaves), 3764336 bytes in use
2495:M 05 Sep 20:06:50.022 - DB 0: 20019 keys (0 volatile) in 32768 slots HT.
2495:M 05 Sep 20:06:50.022 - 0 clients connected (0 slaves), 3764336 bytes in use
2495:M 05 Sep 20:06:55.134 - DB 0: 20019 keys (0 volatile) in 32768 slots HT.
2495:M 05 Sep 20:06:55.134 - 0 clients connected (0 slaves), 3764336 bytes in use
2495:M 05 Sep 20:06:58.775 - Accepted 127.0.0.1:44408
2495:M 05 Sep 20:06:58.775 * Slave 127.0.0.1:6380 asks for synchronization
2495:M 05 Sep 20:06:58.775 * Full resync requested by slave 127.0.0.1:6380
2495:M 05 Sep 20:06:58.775 * Starting BGSAVE for SYNC with target: disk
2495:M 05 Sep 20:06:58.776 * Background saving started by pid 2511
2511:C 05 Sep 20:06:58.868 * DB saved on disk
2511:C 05 Sep 20:06:58.870 * RDB: 0 MB of memory used by copy-on-write
2495:M 05 Sep 20:06:58.916 * Background saving terminated with success
2495:M 05 Sep 20:06:58.920 * Synchronization with slave 127.0.0.1:6380 succeeded

....

2495:M 05 Sep 20:19:19.471 # User requested shutdown...
2495:M 05 Sep 20:19:19.471 * Calling fsync() on the AOF file.
2495:M 05 Sep 20:19:19.471 * Removing the pid file.
2495:M 05 Sep 20:19:19.472 # Redis is now ready to exit, bye bye...

Log (6380)

2508:S 05 Sep 20:06:58.714 # Server started, Redis version 3.2.3
2508:S 05 Sep 20:06:58.714 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
2508:S 05 Sep 20:06:58.775 * DB loaded from disk: 0.060 seconds
2508:S 05 Sep 20:06:58.775 * The server is now ready to accept connections on port 6380
2508:S 05 Sep 20:06:58.775 * Connecting to MASTER 127.0.0.1:6379
2508:S 05 Sep 20:06:58.775 * MASTER <-> SLAVE sync started
2508:S 05 Sep 20:06:58.775 * Non blocking connect for SYNC fired the event.
2508:S 05 Sep 20:06:58.775 * Master replied to PING, replication can continue...
2508:S 05 Sep 20:06:58.775 * Partial resynchronization not possible (no cached master)
2508:S 05 Sep 20:06:58.802 * Full resync from master: 8d0d86237c36a8d6ace4eed9b5f6e5871b40da29:1
2508:S 05 Sep 20:06:58.917 * MASTER <-> SLAVE sync: receiving 489615 bytes from master
2508:S 05 Sep 20:06:58.922 * MASTER <-> SLAVE sync: Flushing old data
2508:S 05 Sep 20:06:58.938 * MASTER <-> SLAVE sync: Loading DB in memory
2508:S 05 Sep 20:06:58.969 * MASTER <-> SLAVE sync: Finished with success
2508:S 05 Sep 20:06:59.788 * Slave 127.0.0.1:6381 asks for synchronization
2508:S 05 Sep 20:06:59.788 * Full resync requested by slave 127.0.0.1:6381
2508:S 05 Sep 20:06:59.788 * Starting BGSAVE for SYNC with target: disk
2508:S 05 Sep 20:06:59.788 * Background saving started by pid 2512
2512:C 05 Sep 20:06:59.832 * DB saved on disk
2512:C 05 Sep 20:06:59.832 * RDB: 0 MB of memory used by copy-on-write
2508:S 05 Sep 20:06:59.896 * Background saving terminated with success
2508:S 05 Sep 20:06:59.899 * Synchronization with slave 127.0.0.1:6381 succeeded
2508:S 05 Sep 20:10:46.786 * 10000 changes in 60 seconds. Saving...
2508:S 05 Sep 20:10:46.786 * Background saving started by pid 2595
2595:C 05 Sep 20:10:46.800 * DB saved on disk
2595:C 05 Sep 20:10:46.801 * RDB: 0 MB of memory used by copy-on-write
2508:S 05 Sep 20:10:46.887 * Background saving terminated with success
2508:S 05 Sep 20:19:19.472 # Connection with master lost.
2508:S 05 Sep 20:19:19.472 * Caching the disconnected master state.
2508:S 05 Sep 20:19:19.594 * Connecting to MASTER 127.0.0.1:6379
2508:S 05 Sep 20:19:19.595 * MASTER <-> SLAVE sync started
2508:S 05 Sep 20:19:19.595 # Error condition on socket for SYNC: Connection refused
2508:S 05 Sep 20:19:20.619 * Connecting to MASTER 127.0.0.1:6379
2508:S 05 Sep 20:19:20.619 * MASTER <-> SLAVE sync started

...

2508:S 05 Sep 20:20:49.783 # Error condition on socket for SYNC: Connection refused
2508:M 05 Sep 20:20:50.283 * Discarding previously cached master state.
2508:M 05 Sep 20:20:50.283 * MASTER MODE enabled (user request from 'id=6 addr=127.0.0.1:54717 fd=8 name= age=696 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=slaveof')
2508:M 05 Sep 20:25:47.073 * 1 changes in 900 seconds. Saving...
2508:M 05 Sep 20:25:47.074 * Background saving started by pid 2722
2722:C 05 Sep 20:25:47.087 * DB saved on disk
2722:C 05 Sep 20:25:47.088 * RDB: 0 MB of memory used by copy-on-write
2508:M 05 Sep 20:25:47.176 * Background saving terminated with success
2508:M 05 Sep 20:40:48.064 * 1 changes in 900 seconds. Saving...
2508:M 05 Sep 20:40:48.064 * Background saving started by pid 2813
2813:C 05 Sep 20:40:48.075 * DB saved on disk
2813:C 05 Sep 20:40:48.075 * RDB: 0 MB of memory used by copy-on-write
2508:M 05 Sep 20:40:48.165 * Background saving terminated with success

Log (6381)

2503:S 05 Sep 20:06:54.667 * DB loaded from disk: 0.087 seconds
2503:S 05 Sep 20:06:54.667 * The server is now ready to accept connections on port 6381
2503:S 05 Sep 20:06:54.667 * Connecting to MASTER 127.0.0.1:6380
2503:S 05 Sep 20:06:54.667 * MASTER <-> SLAVE sync started
2503:S 05 Sep 20:06:54.667 # Error condition on socket for SYNC: Connection refused
2503:S 05 Sep 20:06:55.691 * Connecting to MASTER 127.0.0.1:6380
2503:S 05 Sep 20:06:55.692 * MASTER <-> SLAVE sync started
2503:S 05 Sep 20:06:55.692 # Error condition on socket for SYNC: Connection refused
2503:S 05 Sep 20:06:56.716 * Connecting to MASTER 127.0.0.1:6380
2503:S 05 Sep 20:06:56.717 * MASTER <-> SLAVE sync started
2503:S 05 Sep 20:06:56.717 # Error condition on socket for SYNC: Connection refused
2503:S 05 Sep 20:06:57.741 * Connecting to MASTER 127.0.0.1:6380
2503:S 05 Sep 20:06:57.742 * MASTER <-> SLAVE sync started
2503:S 05 Sep 20:06:57.742 # Error condition on socket for SYNC: Connection refused
2503:S 05 Sep 20:06:58.764 * Connecting to MASTER 127.0.0.1:6380
2503:S 05 Sep 20:06:58.764 * MASTER <-> SLAVE sync started
2503:S 05 Sep 20:06:58.764 * Non blocking connect for SYNC fired the event.
2503:S 05 Sep 20:06:58.775 * Master replied to PING, replication can continue...
2503:S 05 Sep 20:06:58.775 * Partial resynchronization not possible (no cached master)
2503:S 05 Sep 20:06:58.776 * Master does not support PSYNC or is in error state (reply: -ERR Can't SYNC while not connected with my master)
2503:S 05 Sep 20:06:58.776 * Retrying with SYNC...
2503:S 05 Sep 20:06:58.803 # MASTER aborted replication with an error: ERR Can't SYNC while not connected with my master
2503:S 05 Sep 20:06:59.786 * Connecting to MASTER 127.0.0.1:6380
2503:S 05 Sep 20:06:59.787 * MASTER <-> SLAVE sync started
2503:S 05 Sep 20:06:59.787 * Non blocking connect for SYNC fired the event.
2503:S 05 Sep 20:06:59.787 * Master replied to PING, replication can continue...
2503:S 05 Sep 20:06:59.787 * Partial resynchronization not possible (no cached master)
2503:S 05 Sep 20:06:59.788 * Full resync from master: e1bfca531c87795977333fca30c7a75eea64a1de:1
2503:S 05 Sep 20:06:59.897 * MASTER <-> SLAVE sync: receiving 489615 bytes from master
2503:S 05 Sep 20:06:59.900 * MASTER <-> SLAVE sync: Flushing old data
2503:S 05 Sep 20:06:59.917 * MASTER <-> SLAVE sync: Loading DB in memory
2503:S 05 Sep 20:06:59.969 * MASTER <-> SLAVE sync: Finished with success

2. Automatic failover with Sentinel

Copy sentinel.conf from the source tree
cp /usr/local/src/redis-3.2.3/sentinel.conf  /usr/local/redis/
# Edit and confirm the following parameters
sentinel monitor mymaster 127.0.0.1 6379 1
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
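
A quick note on what these four lines mean: sentinel monitor names the master, gives its address and sets the quorum (1 here); down-after-milliseconds is how long the master may go without a valid reply before this sentinel considers it down; failover-timeout bounds the failover process; parallel-syncs is how many slaves are reconfigured to the new master at the same time. As a sanity check before starting Sentinel, the sketch below reads these directives into a dict. The parser and its names are hypothetical helpers of mine, and it only understands the four directives shown above.

```python
NUMERIC_DIRECTIVES = ("down-after-milliseconds", "failover-timeout", "parallel-syncs")


def parse_sentinel_conf(text):
    """Collect the per-master 'sentinel <directive> <master-name> ...' settings."""
    masters = {}
    for raw in text.splitlines():
        parts = raw.split()
        # Ignore comments, blank lines and anything that is not a sentinel directive.
        if len(parts) < 4 or parts[0] != "sentinel":
            continue
        directive, name, args = parts[1], parts[2], parts[3:]
        if directive == "monitor" and len(args) == 3:
            # sentinel monitor <master-name> <ip> <port> <quorum>
            masters.setdefault(name, {}).update(
                ip=args[0], port=int(args[1]), quorum=int(args[2])
            )
        elif directive in NUMERIC_DIRECTIVES and len(args) == 1:
            masters.setdefault(name, {})[directive] = int(args[0])
    return masters


CONF = """\
sentinel monitor mymaster 127.0.0.1 6379 1
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
"""

conf = parse_sentinel_conf(CONF)
print(conf["mymaster"]["quorum"])                   # 1
print(conf["mymaster"]["down-after-milliseconds"])  # 5000
```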

Reference:

http://www.redis.cn/topics/sentinel.html

 bin/redis-server redis.conf  
 bin/redis-server redis6380.conf  
 bin/redis-server redis6381.conf 
 bin/redis-server sentinel.conf  --sentinel
Port  Role
6379  Master
6380  Slave
6381  Slave

Monitoring with Sentinel (this is the normal initial state as reported by Sentinel)

2.1 Master status

127.0.0.1:26379> sentinel masters
1)  1) "name"
    2) "mymaster"
    3) "ip"
    4) "127.0.0.1"
    5) "port"
    6) "6379"
    7) "runid"
    8) "4d2b8e087e297f5d6347e1599a37c4998ad056d6"
    9) "flags"
   10) "master"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "410"
   19) "last-ping-reply"
   20) "410"
   21) "down-after-milliseconds"
   22) "5000"
   23) "info-refresh"
   24) "7817"
   25) "role-reported"
   26) "master"
   27) "role-reported-time"
   28) "58045"
   29) "config-epoch"
   30) "0"
   31) "num-slaves"
   32) "2"
   33) "num-other-sentinels"
   34) "0"
   35) "quorum"
   36) "1"
   37) "failover-timeout"
   38) "180000"
   39) "parallel-syncs"
   40) "1"

This output shows the master's address and port, the number of slaves, the quorum, and so on.
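
The reply to sentinel masters is a flat list that alternates field names and values. In a script it is usually handier as a dict. A minimal sketch, with the list shortened to a few of the fields shown above (the helper name is my own):

```python
def pairs_to_dict(flat):
    """Turn ['name', 'mymaster', 'ip', ...] into {'name': 'mymaster', 'ip': ...}."""
    if len(flat) % 2:
        raise ValueError("expected an even number of items (field/value pairs)")
    # Even positions are field names, odd positions are their values.
    return dict(zip(flat[0::2], flat[1::2]))


reply = [
    "name", "mymaster",
    "ip", "127.0.0.1",
    "port", "6379",
    "flags", "master",
    "num-slaves", "2",
    "quorum", "1",
]

master = pairs_to_dict(reply)
print(master["ip"], master["port"], master["num-slaves"])  # 127.0.0.1 6379 2
```

All values stay strings here, exactly as Sentinel returns them; convert with int() where a number is needed.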

2.2 Initial slave information

127.0.0.1:26379> sentinel slaves mymaster
1)  1) "name"
    2) "127.0.0.1:6380"
    3) "ip"
    4) "127.0.0.1"
    5) "port"
    6) "6380"
    7) "runid"
    8) "c344769d6d1cfd814437034b39f04b17851dca66"
    9) "flags"
   10) "slave"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "693"
   19) "last-ping-reply"
   20) "693"
   21) "down-after-milliseconds"
   22) "5000"
   23) "info-refresh"
   24) "6445"
   25) "role-reported"
   26) "slave"
   27) "role-reported-time"
   28) "96788"
   29) "master-link-down-time"
   30) "0"
   31) "master-link-status"
   32) "ok"
   33) "master-host"
   34) "127.0.0.1"
   35) "master-port"
   36) "6379"
   37) "slave-priority"
   38) "100"
   39) "slave-repl-offset"
   40) "6058"
2)  1) "name"
    2) "127.0.0.1:6381"
    3) "ip"
    4) "127.0.0.1"
    5) "port"
    6) "6381"
    7) "runid"
    8) "9f8666ce6e7b30d01449f6fb10d8556030a96186"
    9) "flags"
   10) "slave"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "693"
   19) "last-ping-reply"
   20) "693"
   21) "down-after-milliseconds"
   22) "5000"
   23) "info-refresh"
   24) "6444"
   25) "role-reported"
   26) "slave"
   27) "role-reported-time"
   28) "96788"
   29) "master-link-down-time"
   30) "0"
   31) "master-link-status"
   32) "ok"
   33) "master-host"
   34) "127.0.0.1"
   35) "master-port"
   36) "6379"
   37) "slave-priority"
   38) "100"
   39) "slave-repl-offset"
   40) "6058"

At this point the sentinel log is quiet

2847:X 07 Sep 21:01:27.567 # Sentinel ID is 1b9d1d720b11ecf5568c3dc0194305e86c47ed9a
2847:X 07 Sep 21:01:27.567 # +monitor master mymaster 127.0.0.1 6379 quorum 1

2.3 Simulate a failure of master 6379 (debug sleep blocks the server for 100 seconds, so it stops answering Sentinel)

127.0.0.1:6379> debug sleep 100
OK

2.4 Sentinel performs the failover automatically

Watch the sentinel log (it shows what Sentinel does, step by step)

2847:X 07 Sep 21:01:27.567 # +monitor master mymaster 127.0.0.1 6379 quorum 1
2847:X 07 Sep 21:03:49.117 # +sdown master mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:49.117 # +odown master mymaster 127.0.0.1 6379 #quorum 1/1
2847:X 07 Sep 21:03:49.117 # +new-epoch 4
2847:X 07 Sep 21:03:49.117 # +try-failover master mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:49.128 # +vote-for-leader 1b9d1d720b11ecf5568c3dc0194305e86c47ed9a 4
2847:X 07 Sep 21:03:49.129 # +elected-leader master mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:49.129 # +failover-state-select-slave master mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:49.185 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:49.185 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:49.252 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:50.262 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:50.262 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:50.315 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:51.308 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:51.308 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:51.365 # +failover-end master mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:51.365 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
2847:X 07 Sep 21:03:51.365 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
2847:X 07 Sep 21:03:51.365 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
2847:X 07 Sep 21:03:56.399 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
2847:X 07 Sep 21:05:23.708 # -sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
2847:X 07 Sep 21:05:33.730 * +convert-to-slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
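
Reading the log above from top to bottom: the master is marked +sdown and then +odown (quorum 1/1), the sentinel elects itself leader, selects 6381, sends it slaveof no one, reconfigures 6380, and finally announces +switch-master. To pull that last event out of a log file automatically, something like the following sketch could be used. It assumes the log line layout seen above (pid:role, day, month, time, level mark, event, arguments); the function names are my own.

```python
def parse_events(log_text):
    """Return (time, event, args) tuples for Sentinel event lines such as '+sdown'."""
    events = []
    for line in log_text.splitlines():
        parts = line.split()
        # Layout: pid:role, day, month, time, level mark, event, args...
        if len(parts) < 6 or parts[5][0] not in "+-":
            continue
        events.append((parts[3], parts[5], parts[6:]))
    return events


def master_switches(events):
    """Extract (master name, old address, new address) from +switch-master events."""
    switches = []
    for _, event, args in events:
        if event == "+switch-master" and len(args) == 5:
            name, old_ip, old_port, new_ip, new_port = args
            switches.append((name, f"{old_ip}:{old_port}", f"{new_ip}:{new_port}"))
    return switches


LOG = """\
2847:X 07 Sep 21:01:27.567 # Sentinel ID is 1b9d1d720b11ecf5568c3dc0194305e86c47ed9a
2847:X 07 Sep 21:03:49.117 # +sdown master mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:49.117 # +odown master mymaster 127.0.0.1 6379 #quorum 1/1
2847:X 07 Sep 21:03:51.365 # +failover-end master mymaster 127.0.0.1 6379
2847:X 07 Sep 21:03:51.365 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
"""

events = parse_events(LOG)
print([event for _, event, _ in events])
print(master_switches(events))
```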

2.5 Monitoring information after the failover

127.0.0.1:26379> sentinel masters
1)  1) "name"
    2) "mymaster"
    3) "ip"
    4) "127.0.0.1"
    5) "port"
    6) "6381" (now changed to 6381)
    7) "runid"
    8) "9f8666ce6e7b30d01449f6fb10d8556030a96186"
    9) "flags"
   10) "master"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "190"
   19) "last-ping-reply"
   20) "190"
   21) "down-after-milliseconds"
   22) "5000"
   23) "info-refresh"
   24) "1905"
   25) "role-reported"
   26) "master"
   27) "role-reported-time"
   28) "42128"
   29) "config-epoch"
   30) "4"
   31) "num-slaves"
   32) "2"
   33) "num-other-sentinels"
   34) "0"
   35) "quorum"
   36) "1"
   37) "failover-timeout"
   38) "180000"
   39) "parallel-syncs"
   40) "1"

Slave information

127.0.0.1:26379> sentinel slaves mymaster
1)  1) "name"
    2) "127.0.0.1:6379"
    3) "ip"
    4) "127.0.0.1"
    5) "port"
    6) "6379"
    7) "runid"
    8) ""
    9) "flags"
   10) "s_down,slave" (6379 is still sleeping at this point; the state is updated once the sleep ends)
   11) "link-pending-commands"
   12) "44"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "48488"
   17) "last-ok-ping-reply"
   18) "48488"
   19) "last-ping-reply"
   20) "48488"
   21) "s-down-time"
   22) "43454"
   23) "down-after-milliseconds"
   24) "5000"
   25) "info-refresh"
   26) "1473253479853"
   27) "role-reported"
   28) "slave"
   29) "role-reported-time"
   30) "48488"
   31) "master-link-down-time"
   32) "0"
   33) "master-link-status"
   34) "err"
   35) "master-host"
   36) "?"
   37) "master-port"
   38) "0"
   39) "slave-priority"
   40) "100"
   41) "slave-repl-offset"
   42) "0"
2)  1) "name"
    2) "127.0.0.1:6380"
    3) "ip"
    4) "127.0.0.1"
    5) "port"
    6) "6380"
    7) "runid"
    8) "c344769d6d1cfd814437034b39f04b17851dca66"
    9) "flags"
   10) "slave"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "308"
   19) "last-ping-reply"
   20) "308"
   21) "down-after-milliseconds"
   22) "5000"
   23) "info-refresh"
   24) "8265"
   25) "role-reported"
   26) "slave"
   27) "role-reported-time"
   28) "48488"
   29) "master-link-down-time"
   30) "0"
   31) "master-link-status"
   32) "ok"
   33) "master-host"
   34) "127.0.0.1"
   35) "master-port"
   36) "6381"
   37) "slave-priority"
   38) "100"
   39) "slave-repl-offset"
   40) "11780"

127.0.0.1:26379> sentinel slaves mymaster
1)  1) "name"
    2) "127.0.0.1:6379"
    3) "ip"
    4) "127.0.0.1"
    5) "port"
    6) "6379"
    7) "runid"
    8) "4d2b8e087e297f5d6347e1599a37c4998ad056d6"
    9) "flags"
   10) "slave" (s_down is gone)
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "869"
   19) "last-ping-reply"
   20) "869"
   21) "down-after-milliseconds"
   22) "5000"
   23) "info-refresh"
   24) "3426"
   25) "role-reported"
   26) "slave"
   27) "role-reported-time"
   28) "3426"
   29) "master-link-down-time"
   30) "0"
   31) "master-link-status"
   32) "ok"
   33) "master-host"
   34) "127.0.0.1"
   35) "master-port"
   36) "6381"
   37) "slave-priority"
   38) "100"
   39) "slave-repl-offset"
   40) "16556"
...

Triggering a failover manually

127.0.0.1:26379> SENTINEL failover mymaster
OK
# The master is now 6379 again
127.0.0.1:26379> SENTINEL get-master-addr-by-name mymaster
1) "127.0.0.1"
2) "6379"

2847:X 07 Sep 21:46:46.793 # +switch-master mymaster 127.0.0.1 6381 127.0.0.1 6379
2847:X 07 Sep 21:46:46.794 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
2847:X 07 Sep 21:46:46.794 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
2847:X 07 Sep 21:46:56.910 * +convert-to-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
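
Since the master address can change like this, clients should ask Sentinel for the current master instead of hard-coding 6379. As a rough illustration of what SENTINEL get-master-addr-by-name looks like on the wire, the sketch below encodes the command in the Redis protocol (RESP) and decodes a reply like the one shown above. The helper names are hypothetical, the decoder is deliberately simplified (it only handles an array of bulk strings whose values contain no line breaks), and no network I/O is done.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def decode_bulk_array(payload):
    """Decode a RESP array whose elements are all bulk strings (simplified)."""
    lines = payload.split(b"\r\n")
    if not lines[0].startswith(b"*"):
        raise ValueError("not a RESP array")
    count = int(lines[0][1:])
    items = []
    idx = 1
    for _ in range(count):
        length = int(lines[idx][1:])  # header line has the form "$<length>"
        value = lines[idx + 1]
        if len(value) != length:
            raise ValueError("bulk string length mismatch")
        items.append(value.decode("utf-8"))
        idx += 2
    return items


request = encode_command("SENTINEL", "get-master-addr-by-name", "mymaster")
print(request)

# What Sentinel would send back for the state shown above.
reply = b"*2\r\n$9\r\n127.0.0.1\r\n$4\r\n6379\r\n"
host, port = decode_bulk_array(reply)
print(host, int(port))  # 127.0.0.1 6379
```

In practice a client library with Sentinel support would do this lookup, and repeat it after a disconnect.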

When Sentinel performs a failover, it also rewrites the redis.conf files of the affected master and slaves.

In the config file for 6379, a slaveof parameter was added

[[email protected] redis]# cat redis.conf | grep slaveof
# Master-Slave replication. Use slaveof to make a Redis instance a copy of
# slaveof <masterip> <masterport>
slaveof 127.0.0.1 6381

In the config file for 6380, the slaveof parameter was changed

[[email protected] redis]# cat redis6380.conf | grep slaveof
# Master-Slave replication. Use slaveof to make a Redis instance a copy of
slaveof 127.0.0.1 6381

This concludes the walkthrough.

3. Miscellaneous

3.1 The important quorum parameter

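The walkthrough sets quorum=1 purely for simplicity; do not do this in production. With a single sentinel, the sentinel itself becomes a single point of failure.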

http://redis.io/topics/sentinel discusses 4 scenarios; I will work through them in later posts.
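
To make the quorum point concrete: as I understand the Sentinel documentation, the quorum only decides when a master is flagged objectively down, while actually starting a failover additionally needs the votes of a majority of all sentinels. The arithmetic can be sketched as below; the function names are mine and this is a model of the rule, not Sentinel's code.

```python
def majority(total_sentinels):
    """Smallest number of sentinels that is more than half of them."""
    return total_sentinels // 2 + 1


def can_mark_odown(agreeing, quorum):
    """The master is flagged objectively down once `quorum` sentinels agree."""
    return agreeing >= quorum


def can_authorize_failover(reachable, total_sentinels, quorum):
    """A failover needs votes from a majority, and from at least `quorum` sentinels."""
    return reachable >= max(majority(total_sentinels), quorum)


# This walkthrough: one sentinel, quorum 1. It can fail over entirely on its own.
print(can_authorize_failover(reachable=1, total_sentinels=1, quorum=1))  # True

# Three sentinels, quorum 2: two that can talk to each other are enough.
print(can_authorize_failover(reachable=2, total_sentinels=3, quorum=2))  # True

# Three sentinels, quorum 1: an isolated sentinel can flag odown but not fail over.
print(can_mark_odown(agreeing=1, quorum=1))                              # True
print(can_authorize_failover(reachable=1, total_sentinels=3, quorum=1))  # False
```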

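When copying sentinel.conf to set up another sentinel, remove the lines that Sentinel generated itself, such as

sentinel myid 575cb680ff3d3cbad55cdb978c1d6b5962abe7ac

Otherwise the sentinels end up with the same ID and cannot communicate with each other properly.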
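
A small sketch of that clean-up step: it drops generated lines from the text of an existing sentinel.conf before the text is reused for a new sentinel. Only sentinel myid comes from my own file above; the other prefixes in the list are assumptions about what Sentinel writes back, so check them against your own sentinel.conf before relying on this.

```python
# Only "sentinel myid" is confirmed by the file shown above; the remaining
# prefixes are assumed examples of state that Sentinel rewrites into the file.
GENERATED_PREFIXES = (
    "sentinel myid",
    "sentinel known-slave",
    "sentinel known-sentinel",
    "sentinel config-epoch",
    "sentinel leader-epoch",
    "sentinel current-epoch",
    "# Generated by CONFIG REWRITE",
)


def strip_generated(conf_text):
    """Return conf_text without the lines that Sentinel generated itself."""
    kept = [
        line
        for line in conf_text.splitlines()
        if not line.strip().startswith(GENERATED_PREFIXES)
    ]
    return "\n".join(kept) + "\n"


ORIGINAL = """\
port 26379
sentinel monitor mymaster 127.0.0.1 6379 1
sentinel down-after-milliseconds mymaster 5000
sentinel myid 575cb680ff3d3cbad55cdb978c1d6b5962abe7ac
sentinel known-slave mymaster 127.0.0.1 6380
"""

print(strip_generated(ORIGINAL))
```

The new sentinel then generates its own ID on first start.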

Time: 2024-10-18 14:12:45
