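Installing Redis on the three new machines is omitted here; the steps are the same as above.
The node configuration on the three new machines matches that of the three pre-migration machines; only the IP addresses need to change. Paths and ports stay the same.
Start the Redis node services on the three new machines.
Install Ruby on the new node redis-new01 (installation omitted; same as above).
Then add all the nodes on the three new machines to the existing cluster.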
=====================
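Add the master nodes first.
Command format: "redis-trib.rb add-node <new node> <existing cluster node>"
The first argument is the IP and master port of the new node; the second is the IP and master port of any node already in the cluster.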
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb add-node 172.16.60.202:7000 172.16.60.207:7000
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb add-node 172.16.60.204:7002 172.16.60.207:7000
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb add-node 172.16.60.205:7004 172.16.60.207:7000
=====================
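Then add the slave nodes on the new machines. Without --master-id, redis-trib.rb attaches a new slave to one of the masters that currently has the fewest slaves, so confirm the resulting pairing with check afterwards.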
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.204:7003 172.16.60.202:7000
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.205:7005 172.16.60.204:7002
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.202:7001 172.16.60.205:7004
Check the cluster status at this point.
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000
Check the hash slot distribution across the cluster.
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb info 172.16.60.202:7000
172.16.60.202:7000 (a0169bec...) -> 0 keys | 0 slots | 1 slaves.
172.16.60.209:7004 (47cde5c7...) -> 3 keys | 5461 slots | 1 slaves.
172.16.60.208:7002 (656fc84a...) -> 1 keys | 5462 slots | 1 slaves.
172.16.60.205:7004 (48cbab90...) -> 0 keys | 0 slots | 1 slaves.
172.16.60.207:7000 (a8fe2d6e...) -> 2 keys | 5461 slots | 1 slaves.
172.16.60.204:7002 (c6a78cfb...) -> 0 keys | 0 slots | 1 slaves.
[OK] 6 keys in 6 masters.
0.00 keys per slot on average.
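A newly added master has 0 slots by default, and a master without slots is never chosen for reads or writes.
Data is only written to masters that own slots (slaves just replicate them).
So the new masters must be assigned slots, i.e. a reshard operation.
From the slot information printed after the last new master was added, the existing masters hold the following slots: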
172.16.60.207:7000 --> slots:0-5460 (5461 slots) master
172.16.60.208:7002 --> slots:5461-10922 (5462 slots) master
172.16.60.209:7004 --> slots:10923-16383 (5461 slots) master
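The three ranges above are what an even split of the 16384 hash slots over three masters looks like. The following Python sketch reproduces them under a simple rounding scheme. The function name `split_slots` is made up for illustration, and the code is not claimed to be the exact logic inside redis-trib.rb.

```python
TOTAL_SLOTS = 16384  # fixed number of hash slots in a Redis Cluster


def split_slots(masters: int) -> list:
    """Split slots 0..16383 into `masters` contiguous (first, last) ranges."""
    per_node = TOTAL_SLOTS / masters
    ranges = []
    first = 0
    cursor = 0.0
    for i in range(masters):
        last = round(cursor + per_node - 1)
        if i == masters - 1 or last > TOTAL_SLOTS - 1:
            last = TOTAL_SLOTS - 1  # the final master takes whatever is left
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


for first, last in split_slots(3):
    print(f"slots:{first}-{last} ({last - first + 1} slots)")
```

The middle master ends up with one extra slot (5462) because 16384 is not divisible by 3.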
Now assign slots to the three newly added masters.
a) Move all 5461 slots of 172.16.60.207:7000 to 172.16.60.202:7000
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.202:7000
........
How many slots do you want to move (from 1 to 16384)? 5461 # Number of slots to move (here, all slots held by 172.16.60.207:7000)
What is the receiving node ID? a0169becd97ccca732d905fd762b4d615674f7bd # Node that receives those slots; enter the ID of 172.16.60.202:7000
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:971d05cd7b9bb3634ad024e6aac3dff158c52eee # Node the slots are taken from; enter the ID of 172.16.60.207:7000. Entering all instead draws the requested number of slots from all existing masters.
Source node #2:done # Enter done
.......
Do you want to proceed with the proposed reshard plan (yes/no)? yes # Enter yes to confirm
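The interactive prompts above can also be answered on the command line. The sketch below only builds the argument list. It assumes that this version of redis-trib.rb accepts the `--from`, `--to`, `--slots` and `--yes` options for `reshard` (worth confirming with `redis-trib.rb help` first). The helper name `reshard_argv` is hypothetical.

```python
def reshard_argv(script, entry_node, source_id, target_id, slots):
    """Build the argv for a non-interactive redis-trib.rb reshard run."""
    if not 1 <= slots <= 16384:
        raise ValueError("slots must be between 1 and 16384")
    return [
        script, "reshard",
        "--from", source_id,     # node ID the slots are taken from
        "--to", target_id,       # node ID that receives the slots
        "--slots", str(slots),   # how many slots to move
        "--yes",                 # assumed to skip the final confirmation
        entry_node,              # any reachable node of the cluster
    ]


argv = reshard_argv(
    "/data/redis-4.0.6/src/redis-trib.rb",
    "172.16.60.202:7000",
    "971d05cd7b9bb3634ad024e6aac3dff158c52eee",
    "a0169becd97ccca732d905fd762b4d615674f7bd",
    5461,
)
print(" ".join(argv))
```

To actually run it, the list could be passed to `subprocess.run(argv, check=True)`; printing it first makes it easy to review the node IDs before anything moves.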
==================================================================
The reshard may hit a problem and stop partway, leaving slots on both the source and the target node.
Moving slot 4396 from 172.16.60.207:7000 to 172.16.60.202:7000:
Moving slot 4397 from 172.16.60.207:7000 to 172.16.60.202:7000:
Moving slot 4398 from 172.16.60.207:7000 to 172.16.60.202:7000:
Moving slot 4399 from 172.16.60.207:7000 to 172.16.60.202:7000:
Moving slot 4400 from 172.16.60.207:7000 to 172.16.60.202:7000:
Moving slot 4401 from 172.16.60.207:7000 to 172.16.60.202:7000:
[ERR] Calling MIGRATE: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000
>>> Performing Cluster Check (using node 172.16.60.202:7000)
M: a0169becd97ccca732d905fd762b4d615674f7bd 172.16.60.202:7000
slots:0-4400 (4401 slots) master
1 additional replica(s)
.......
M: 971d05cd7b9bb3634ad024e6aac3dff158c52eee 172.16.60.207:7000
slots:4401-5460 (1060 slots) master
1 additional replica(s)
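The check output shows how far the interrupted reshard got. This small sketch counts the slots in a `slots:` range string, which gives the number still left on the source node. The helper name `count_slots` is made up for illustration, and comma-separated ranges are assumed to be how several ranges would be listed.

```python
def count_slots(spec: str) -> int:
    """Count slots in a range string such as "0-4400" or "0-10,20,30-31"."""
    total = 0
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            total += int(last) - int(first) + 1
        else:
            total += 1  # a single slot number
    return total


moved = count_slots("0-4400")         # now on 172.16.60.202:7000
remaining = count_slots("4401-5460")  # still on 172.16.60.207:7000
print(moved, remaining, moved + remaining)
```

The two counts add up to the original 5461 slots, so nothing was lost; 1060 slots still have to be moved.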
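Root cause analysis:
The error reported during reshard is: Syntax error, try CLIENT (LIST|KILL|GETNAME|SETNAME|PAUSE|REPLY)
Slots that hold no key-value pairs migrate successfully, so the problem depends on whether the slot contains keys.
Tracing the reshard code path shows that the actual migration is done by the move_slot function in redis-trib.rb.
Open move_slot and find the migration calls.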
[[email protected] redis-cluster] # cp /data/redis-4.0.6/src/redis-trib.rb /tmp/
[[email protected] redis-cluster] # cat /data/redis-4.0.6/src/redis-trib.rb|grep source.r.client.call
source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:keys,*keys])
source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])
The source.r.client.call lines found by grep are where redis-trib.rb sends the command that migrates a slot containing key-value pairs.
When invoked, the call is effectively equivalent to
"client migrate target.info[:host] target.info[:port] "" 0 @timeout :replace :keys *keys"
So how does the server handle this command?
It first enters clientCommand(client *c) in networking.c,
which compares the subcommand against each known option in a chain of if statements. This is where the bug shows: clientCommand has no migrate branch,
so it returns Syntax error, try CLIENT (LIST|KILL|GETNAME|SETNAME|PAUSE|REPLY).
The message says that CLIENT only has the LIST|KILL|GETNAME|SETNAME|PAUSE|REPLY branches.
How can the script be changed so that slots with keys really migrate?
The source file cluster.c contains migrateCommand(client *c), so the migration statements in redis-trib.rb only need to be changed to:
source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,"replace",:keys,*keys])
source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])
That is, skip clientCommand and invoke migrateCommand directly.
In other words, change the original lines in redis-trib.rb
source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:keys,*keys])
source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])
to
source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,"replace",:keys,*keys])
source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])
and the problem is solved.
[[email protected] redis-cluster] # cat /data/redis-4.0.6/src/redis-trib.rb |grep source.r.call
source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,"replace",:keys,*keys])
source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])
This bug is caused by differing versions of the Ruby redis gem. Starting with version 5.0, redis-trib.rb is dropped in favor of managing the cluster directly with the redis-cli client.
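The edit to redis-trib.rb can be scripted instead of done by hand. This sketch rewrites only the call path, from `source.r.client.call` to `source.r.call`, on text that has already been read from the file. It does not add the extra "replace" argument that the manual edit above puts on the first line. The function name `patch_migrate_calls` is hypothetical.

```python
import re

# Matches the start of a migrate call that goes through `.client`.
MIGRATE_CALL = re.compile(r'source\.r\.client\.call\(\s*\[\s*"migrate"')


def patch_migrate_calls(text: str) -> tuple:
    """Return (patched_text, number_of_calls_rewritten)."""
    return MIGRATE_CALL.subn('source.r.call(["migrate"', text)


sample = (
    'source.r.client.call(["migrate",target.info[:host],target.info[:port],'
    '"",0,@timeout,:keys,*keys])\n'
    'source.r.client.call(["migrate",target.info[:host],target.info[:port],'
    '"",0,@timeout,:replace,:keys,*keys])\n'
)
patched, count = patch_migrate_calls(sample)
print(count)
print(patched, end="")
```

Applied to the real file, the text would be read from and written back to /data/redis-4.0.6/src/redis-trib.rb, after keeping a backup copy such as the one made in /tmp earlier.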
==================================================================
With redis-trib.rb patched, continue moving the remaining slots of 172.16.60.207:7000 to 172.16.60.202:7000.
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.202:7000
........
>>> Check for open slots...
[WARNING] Node 172.16.60.202:7000 has slots in importing state (4401).
[WARNING] Node 172.16.60.207:7000 has slots in migrating state (4401).
[WARNING] The following slots are open : 4401
>>> Check slots coverage...
[OK] All 16384 slots covered.
*** Please fix your cluster problems before resharding
Fix:
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-cli -h 172.16.60.202 -c -p 7000
172.16.60.202:7000> cluster setslot 4401 stable
OK
172.16.60.202:7000>
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-cli -h 172.16.60.207 -c -p 7000
172.16.60.207:7000> cluster setslot 4401 stable
OK
172.16.60.207:7000>
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb fix 172.16.60.202:7000
.......
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
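When more than one slot is left open, the `cluster setslot <slot> stable` commands can be derived from the [WARNING] lines rather than typed one by one. This sketch assumes the warning format shown above and assumes that several slots would appear comma-separated inside the parentheses. The function name `stable_commands` is made up.

```python
import re

WARNING_RE = re.compile(
    r"\[WARNING\] Node (\S+):(\d+) has slots in "
    r"(importing|migrating) state \(([\d,\s]+)\)"
)


def stable_commands(check_output: str) -> list:
    """Build one `cluster setslot <slot> stable` command per open slot and node."""
    commands = []
    for host, port, _state, slots in WARNING_RE.findall(check_output):
        for slot in slots.split(","):
            commands.append(
                f"redis-cli -h {host} -p {port} cluster setslot {slot.strip()} stable"
            )
    return commands


sample = "\n".join([
    "[WARNING] Node 172.16.60.202:7000 has slots in importing state (4401).",
    "[WARNING] Node 172.16.60.207:7000 has slots in migrating state (4401).",
    "[WARNING] The following slots are open : 4401",
])
for command in stable_commands(sample):
    print(command)
```

The commands are only printed here, so they can be reviewed before being run against the cluster.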
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.202:7000
......
How many slots do you want to move (from 1 to 16384)? 1060
What is the receiving node ID? a0169becd97ccca732d905fd762b4d615674f7bd
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:971d05cd7b9bb3634ad024e6aac3dff158c52eee
Source node #2:done
.......
Do you want to proceed with the proposed reshard plan (yes/no)? yes
Then run check again to inspect the cluster status.
All 5461 slots of 172.16.60.207:7000 have now moved to 172.16.60.202:7000.
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000
>>> Performing Cluster Check (using node 172.16.60.202:7000)
M: a0169becd97ccca732d905fd762b4d615674f7bd 172.16.60.202:7000
slots:0-5460 (5461 slots) master
2 additional replica(s)
........
M: 971d05cd7b9bb3634ad024e6aac3dff158c52eee 172.16.60.207:7000
slots: (0 slots) master
0 additional replica(s)
b) Move all 5462 slots of 172.16.60.208:7002 to 172.16.60.204:7002
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.204:7002
.......
How many slots do you want to move (from 1 to 16384)? 5462
What is the receiving node ID? c6a78cfbb77804c4837963b5f589064b6111457a
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:0060012d749167d3f72833d916e53b3445b66c62
Source node #2:done
.......
Do you want to proceed with the proposed reshard plan (yes/no)? yes
c) Move all 5461 slots of 172.16.60.209:7004 to 172.16.60.205:7004
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.205:7004
.........
How many slots do you want to move (from 1 to 16384)? 5461
What is the receiving node ID? 48cbab906141dd26241ccdbc38bee406586a8d03
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:e936d5b4c95b6cae57f994e95805aef87ea4a7a5
Source node #2:done
.........
Do you want to proceed with the proposed reshard plan (yes/no)? yes
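Once all three new masters have received their hash slots, check the cluster status again.
The three pre-migration masters now hold 0 slots; their slots have moved to the corresponding three masters on the new machines.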
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000
>>> Performing Cluster Check (using node 172.16.60.202:7000)
M: a0169becd97ccca732d905fd762b4d615674f7bd 172.16.60.202:7000
slots:0-5460 (5461 slots) master
2 additional replica(s)
S: d9671ca6b4235931a2a215cc327a400ad4f9a399 172.16.60.205:7005
slots: (0 slots) slave
replicates c6a78cfbb77804c4837963b5f589064b6111457a
M: e936d5b4c95b6cae57f994e95805aef87ea4a7a5 172.16.60.209:7004
slots: (0 slots) master
0 additional replica(s)
S: 213bde6296c36b5f31b958c7730ff1629125a204 172.16.60.207:7001
slots: (0 slots) slave
replicates 48cbab906141dd26241ccdbc38bee406586a8d03
M: 0060012d749167d3f72833d916e53b3445b66c62 172.16.60.208:7002
slots: (0 slots) master
0 additional replica(s)
S: 52b8d27838244657d9b01a233578f24d287979fe 172.16.60.208:7003
slots: (0 slots) slave
replicates a0169becd97ccca732d905fd762b4d615674f7bd
M: 48cbab906141dd26241ccdbc38bee406586a8d03 172.16.60.205:7004
slots:10923-16383 (5461 slots) master
2 additional replica(s)
S: e7592314869c29375599d781721ad76675645c4c 172.16.60.209:7005
slots: (0 slots) slave
replicates c6a78cfbb77804c4837963b5f589064b6111457a
S: 2950f2cb6d960cd48e792f7c82d62d2cd07d20f9 172.16.60.204:7003
slots: (0 slots) slave
replicates a0169becd97ccca732d905fd762b4d615674f7bd
M: 971d05cd7b9bb3634ad024e6aac3dff158c52eee 172.16.60.207:7000
slots: (0 slots) master
0 additional replica(s)
M: c6a78cfbb77804c4837963b5f589064b6111457a 172.16.60.204:7002
slots:5461-10922 (5462 slots) master
2 additional replica(s)
S: 6e663a1bcc3d241ed4d1a9667a0cc92fbe554740 172.16.60.202:7001
slots: (0 slots) slave
replicates 48cbab906141dd26241ccdbc38bee406586a8d03
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Check the slot distribution across the cluster.
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-trib.rb info 172.16.60.202:7000
172.16.60.202:7000 (a0169bec...) -> 2 keys | 5461 slots | 2 slaves.
172.16.60.209:7004 (47cde5c7...) -> 0 keys | 0 slots | 0 slaves.
172.16.60.208:7002 (656fc84a...) -> 0 keys | 0 slots | 0 slaves.
172.16.60.205:7004 (48cbab90...) -> 3 keys | 5461 slots | 2 slaves.
172.16.60.207:7000 (a8fe2d6e...) -> 0 keys | 0 slots | 0 slaves.
172.16.60.204:7002 (c6a78cfb...) -> 1 keys | 5462 slots | 2 slaves.
[OK] 6 keys in 6 masters.
0.00 keys per slot on average.
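Before retiring the old machines, it helps to confirm mechanically that the old masters hold neither slots nor keys. This sketch parses `info` output in the format shown above. The names `parse_info` and `drained` are illustrative, and the sample text is copied from this walkthrough.

```python
import re

INFO_RE = re.compile(
    r"^(\S+:\d+) \((\w+)\.\.\.\) -> (\d+) keys \| (\d+) slots \| (\d+) slaves\.",
    re.M,
)


def parse_info(output: str) -> dict:
    """Map "host:port" to its key, slot and slave counts."""
    return {
        addr: {"keys": int(keys), "slots": int(slots), "slaves": int(slaves)}
        for addr, _node_id, keys, slots, slaves in INFO_RE.findall(output)
    }


def drained(nodes: dict, old_masters: list) -> bool:
    """True when every listed master holds no slots and no keys."""
    return all(
        nodes[addr]["slots"] == 0 and nodes[addr]["keys"] == 0
        for addr in old_masters
    )


sample = "\n".join([
    "172.16.60.202:7000 (a0169bec...) -> 2 keys | 5461 slots | 2 slaves.",
    "172.16.60.209:7004 (47cde5c7...) -> 0 keys | 0 slots | 0 slaves.",
    "172.16.60.208:7002 (656fc84a...) -> 0 keys | 0 slots | 0 slaves.",
    "172.16.60.205:7004 (48cbab90...) -> 3 keys | 5461 slots | 2 slaves.",
    "172.16.60.207:7000 (a8fe2d6e...) -> 0 keys | 0 slots | 0 slaves.",
    "172.16.60.204:7002 (c6a78cfb...) -> 1 keys | 5462 slots | 2 slaves.",
])
nodes = parse_info(sample)
old = ["172.16.60.207:7000", "172.16.60.208:7002", "172.16.60.209:7004"]
print(drained(nodes, old), sum(n["slots"] for n in nodes.values()))
```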
Check the data: the test keys have also moved to the new masters.
[[email protected] redis-cluster] # /data/redis-4.0.6/src/redis-cli -h 172.16.60.202 -c -p 7000
172.16.60.202:7000> get test1
"test-207"
172.16.60.202:7000> get test2
-> Redirected to slot [8899] located at 172.16.60.204:7002
"test-208"
172.16.60.204:7002> get test3
-> Redirected to slot [13026] located at 172.16.60.205:7004
"test-209"
172.16.60.205:7004> get test11
"test-207-207"
172.16.60.205:7004> get test22
-> Redirected to slot [4401] located at 172.16.60.202:7000
"test-208-208"
172.16.60.202:7000> get test33
-> Redirected to slot [12833] located at 172.16.60.205:7004
"test-209-209"
172.16.60.205:7004>
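The slot numbers in the redirects above come from how Redis Cluster maps a key to a slot: CRC16 of the key (XMODEM variant), modulo 16384, with only the part inside `{...}` hashed when the key carries a hash tag. The following is a minimal sketch of that mapping; the function names are illustrative.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the hash slot (0..16383) that a key maps to."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]  # hash only the non-empty tag content
    return crc16_xmodem(data) % 16384


for key in ["test1", "test2", "test3", "test11", "test22", "test33"]:
    print(key, key_slot(key))
```

The printed slots can be cross-checked against the redirects in the session above, and each slot should fall inside the range owned by the master that answered.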