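Bug scenario:
This bug appears in a redis-cluster running with ruby-2.4.6 and redis-4.1.3.gem, when removing a node that still holds data. The scenario is described in detail below.
Removing a node from redis-cluster:
As with adding nodes, a removal can target either a master node or a slave node.
1. Removing a master node
Nodes are removed with the del-node command of redis-trib: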
redis-trib del-node 127.0.0.1:7002 ${node-id}
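Here 127.0.0.1:7002 is any node in the redis-cluster, and node-id is the ID of the master node to delete. Unlike adding a node, removing a node requires the node-id. The following test deletes a master node.
Example: delete the cluster node f7a95238e3de39b616b93949ec7c9f86d3867d63, which corresponds to the instance 192.168.1.39:1986.
This node happens to hold data.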
[[email protected] log]# /data/soft/redis-4.0.12/src/redis-trib.rb del-node 192.168.1.39:1986 f7a95238e3de39b616b93949ec7c9f86d3867d63
>>> Removing node f7a95238e3de39b616b93949ec7c9f86d3867d63 from cluster 192.168.1.39:1986
[ERR] Node 192.168.1.39:1986 is not empty! Reshard data away and try again.
[[email protected] log]#
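The deletion fails: the node still holds data, so it cannot be removed from redis-cluster. Its data must first be moved away, which means resharding, just as when adding a new master node.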
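Before retrying del-node, it can help to check how many hash slots the node still owns. The following is a minimal sketch, assuming the usual CLUSTER NODES line layout in which slot fields start at column 9. The helper name count_node_slots is hypothetical and not part of redis-trib.

```shell
# Hypothetical helper: sum the hash slots owned by one node.
# Reads CLUSTER NODES output on stdin; assumes slot fields start at column 9.
count_node_slots() {
  local node_id="$1"
  awk -v id="$node_id" '
    $1 == id {
      for (i = 9; i <= NF; i++) {
        if ($i ~ /^\[/) continue            # skip open-slot markers such as [8530->-<id>]
        n = split($i, range, "-")           # "6827-10922" is a range, "42" a single slot
        total += (n == 2) ? range[2] - range[1] + 1 : 1
      }
    }
    END { print total + 0 }
  '
}
```

A possible use is to pipe the live output into it, for example: redis-cli -h 192.168.1.39 -p 1986 cluster nodes | count_node_slots f7a95238e3de39b616b93949ec7c9f86d3867d63. A result of 0 suggests the node is empty and del-node should no longer refuse it.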
Reshard (move the 4096 hash slots of 192.168.1.39:1986 to the master node be2a864214a624789748c7f753377638c6f88751, 192.168.1.54:1986):
[[email protected] log]# /data/soft/redis-4.0.12/src/redis-trib.rb reshard 192.168.1.39:1986
>>> Performing Cluster Check (using node 192.168.1.39:1986)
M: f7a95238e3de39b616b93949ec7c9f86d3867d63 192.168.1.39:1986
slots:6827-10922 (4096 slots) master
1 additional replica(s)
M: be2a864214a624789748c7f753377638c6f88751 192.168.1.54:1986
slots:0-1364,5461-6826,10923-12287 (4096 slots) master
2 additional replica(s)
M: 8db27e432c9ef45fd37bba20e8ca7b71556541a0 192.168.1.39:986
slots:1365-5460 (4096 slots) master
1 additional replica(s)
M: 0b3963a0be38ea500013ddcaa5a5524801421dd5 192.168.1.182:1986
slots:12288-16383 (4096 slots) master
1 additional replica(s)
S: 907fedbbf999084893a9dff000701f6a9a92381a 192.168.1.105:986
slots: (0 slots) slave
replicates 0b3963a0be38ea500013ddcaa5a5524801421dd5
S: ae6e9b0139735cb5dbc4ebf3ad6e01b3f5420db5 192.168.1.54:1986
slots: (0 slots) slave
replicates be2a864214a624789748c7f753377638c6f88751
S: 0efd850873fb346fc3c273a9d3aeaac0a9e4d4a8 192.168.1.182:986
slots: (0 slots) slave
replicates f7a95238e3de39b616b93949ec7c9f86d3867d63
S: 7a778a0fb0acaa000d003008f50af430497218f3 192.168.1.105:1986
slots: (0 slots) slave
replicates 8db27e432c9ef45fd37bba20e8ca7b71556541a0
S: 47fe8159fd5be2e74e823e077d9afe0fd77570c4 192.168.1.54:986
slots: (0 slots) slave
replicates be2a864214a624789748c7f753377638c6f88751
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096
What is the receiving node ID? be2a864214a624789748c7f753377638c6f88751
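Next, choose whether the slots are taken from all master nodes or from a single master node.
To move the 4096 slots to 192.168.1.54:1986, enter the node ID of 192.168.1.39:1986 as the source: f7a95238e3de39b616b93949ec7c9f86d3867d63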
How many slots do you want to move (from 1 to 16384)? 4096
What is the receiving node ID? be2a864214a624789748c7f753377638c6f88751
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:f7a95238e3de39b616b93949ec7c9f86d3867d63 ### node ID of 192.168.1.39:1986
Source node #2:done
...................
...................
Moving slot 10920 from f7a95238e3de39b616b93949ec7c9f86d3867d63
Moving slot 10921 from f7a95238e3de39b616b93949ec7c9f86d3867d63
Moving slot 10922 from f7a95238e3de39b616b93949ec7c9f86d3867d63
Do you want to proceed with the proposed reshard plan (yes/no)?yes
After confirmation, the slots of 192.168.1.39:1986 are moved one by one to 192.168.1.54:1986.
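The interactive answers above can also be passed on the command line. This is only a sketch: it assumes the redis-trib.rb in use accepts --from, --to, --slots and --yes placed before the host:port argument, which is worth confirming with redis-trib.rb help. The helper reshard_cmd is a hypothetical name, and it only prints the command so it can be reviewed before being run.

```shell
# Hypothetical helper: print a non-interactive reshard command for review.
# Assumption: redis-trib.rb accepts --from/--to/--slots/--yes before host:port.
reshard_cmd() {
  local trib="$1" entry="$2" from_id="$3" to_id="$4" slots="$5"
  printf '%s reshard --from %s --to %s --slots %s --yes %s\n' \
    "$trib" "$from_id" "$to_id" "$slots" "$entry"
}
```

For the move described here, the arguments would be the redis-trib.rb path, 192.168.1.39:1986 as the entry node, f7a95238e3de39b616b93949ec7c9f86d3867d63 as the source, be2a864214a624789748c7f753377638c6f88751 as the target, and 4096 as the slot count.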
[[email protected] ~]# /data/soft/redis-4.0.12/src/redis-trib.rb reshard 192.168.1.182:1986
>>> Performing Cluster Check (using node 192.168.1.182:1986)
M: 0b3963a0be38ea500013ddcaa5a5524801421dd5 192.168.1.182:1986
slots:12288-16383 (4096 slots) master
1 additional replica(s)
S: 0efd850873fb346fc3c273a9d3aeaac0a9e4d4a8 192.168.1.182:986
slots: (0 slots) slave
replicates f7a95238e3de39b616b93949ec7c9f86d3867d63
S: 7a778a0fb0acaa000d003008f50af430497218f3 192.168.1.105:1986
slots: (0 slots) slave
replicates 8db27e432c9ef45fd37bba20e8ca7b71556541a0
S: ae6e9b0139735cb5dbc4ebf3ad6e01b3f5420db5 192.168.1.54:1986
slots: (0 slots) slave
replicates be2a864214a624789748c7f753377638c6f88751
M: f7a95238e3de39b616b93949ec7c9f86d3867d63 192.168.1.39:1986
slots:8530-10922 (2393 slots) master
1 additional replica(s)
S: 47fe8159fd5be2e74e823e077d9afe0fd77570c4 192.168.1.54:986
slots: (0 slots) slave
replicates be2a864214a624789748c7f753377638c6f88751
S: 907fedbbf999084893a9dff000701f6a9a92381a 192.168.1.105:986
slots: (0 slots) slave
replicates 0b3963a0be38ea500013ddcaa5a5524801421dd5
M: 8db27e432c9ef45fd37bba20e8ca7b71556541a0 192.168.1.39:986
slots:1365-5460 (4096 slots) master
1 additional replica(s)
M: be2a864214a624789748c7f753377638c6f88751 192.168.1.54:1986
slots:0-1364,5461-8529,10923-12287 (5799 slots) master
2 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
[WARNING] Node 192.168.1.39:1986 has slots in migrating state (8530).
[WARNING] Node 192.168.1.54:1986 has slots in importing state (8530).
[WARNING] The following slots are open: 8530
>>> Check slots coverage...
[OK] All 16384 slots covered.
If warnings like the following appear during the migration (as in the check above, where slot 8530 was left open):
[WARNING] Node 192.168.1.39:1986 has slots in migrating state (8530).
[WARNING] Node 192.168.1.54:1986 has slots in importing state (8530).
[WARNING] The following slots are open: 8530
[[email protected] ~]# /usr/local/redis/bin/redis-cli -h 192.168.1.39 -p 1986
192.168.1.39:1986> cluster setslot 8530 stable
OK
[[email protected] ~]# /usr/local/redis/bin/redis-cli -h 192.168.1.54 -p 1986
192.168.1.54:1986> cluster setslot 8530 stable
OK
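You can also try to repair this with the command "redis-trib.rb fix 192.168.0.3:1986" (substitute the address of a node in your own cluster). If the output shows nodes in migrating or importing state, log in to the corresponding nodes and reset the slot with "cluster setslot 8530 stable", as done above; the argument 8530 is the ID of the slot reported in the warning.
Run the reshard command again: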
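When several slots are reported open, extracting them from the check output avoids typing mistakes. The sketch below assumes the warning line has exactly the wording shown above, with multiple slots separated by commas. Both helper names are hypothetical, and stable_cmds only prints the commands instead of running them.

```shell
# Hypothetical helper: list open slot numbers found in redis-trib check output (stdin).
# Assumes the "[WARNING] The following slots are open: ..." wording shown above.
open_slots() {
  sed -n 's/^\[WARNING] The following slots are open: //p' | tr ',' '\n' | tr -d ' '
}

# Hypothetical helper: print (not run) a CLUSTER SETSLOT ... STABLE command
# for every given slot on every given host:port.
stable_cmds() {
  local slots="$1"; shift
  local slot node
  for slot in $slots; do
    for node in "$@"; do
      printf 'redis-cli -h %s -p %s cluster setslot %s stable\n' \
        "${node%:*}" "${node##*:}" "$slot"
    done
  done
}
```

Note that setslot stable only clears the migrating or importing state of the slot. It does not move any keys, so the slot stays with its current owner until reshard is run again.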
[[email protected] ~]# /data/soft/redis-4.0.12/src/redis-trib.rb reshard 192.168.1.39:1986
>>> Performing Cluster Check (using node 192.168.1.39:1986)
M: f7a95238e3de39b616b93949ec7c9f86d3867d63 192.168.1.39:1986
slots:8530-10922 (2393 slots) master
1 additional replica(s)
M: be2a864214a624789748c7f753377638c6f88751 192.168.1.54:1986
slots:0-1364,5461-8529,10923-12287 (5799 slots) master
2 additional replica(s)
M: 8db27e432c9ef45fd37bba20e8ca7b71556541a0 192.168.1.39:986
slots:1365-5460 (4096 slots) master
1 additional replica(s)
M: 0b3963a0be38ea500013ddcaa5a5524801421dd5 192.168.1.182:1986
slots:12288-16383 (4096 slots) master
1 additional replica(s)
S: 907fedbbf999084893a9dff000701f6a9a92381a 192.168.1.105:986
slots: (0 slots) slave
replicates 0b3963a0be38ea500013ddcaa5a5524801421dd5
S: ae6e9b0139735cb5dbc4ebf3ad6e01b3f5420db5 192.168.1.54:1986
slots: (0 slots) slave
replicates be2a864214a624789748c7f753377638c6f88751
S: 0efd850873fb346fc3c273a9d3aeaac0a9e4d4a8 192.168.1.182:986
slots: (0 slots) slave
replicates f7a95238e3de39b616b93949ec7c9f86d3867d63
S: 7a778a0fb0acaa000d003008f50af430497218f3 192.168.1.105:1986
slots: (0 slots) slave
replicates 8db27e432c9ef45fd37bba20e8ca7b71556541a0
S: 47fe8159fd5be2e74e823e077d9afe0fd77570c4 192.168.1.54:986
slots: (0 slots) slave
replicates be2a864214a624789748c7f753377638c6f88751
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096
What is the receiving node ID? be2a864214a624789748c7f753377638c6f88751
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:f7a95238e3de39b616b93949ec7c9f86d3867d63
Source node #2:done
Do you want to proceed with the proposed reshard plan (yes/no)? yes
Moving slot 8530 from 192.168.1.39:1986 to 192.168.1.54:1986:
[ERR] Calling MIGRATE: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)
Error:
[ERR] Calling MIGRATE: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)
Reference found online:
https://blog.csdn.net/m0_37128231/article/details/80755478
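It says this is a bug in redis-trib.rb, the redis-cluster management command.
I followed the method described there, but the error persisted and the problem was not resolved.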
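Since the failing setup in this post used redis-4.1.3.gem, a quick look at which redis gem version is installed is a reasonable first diagnostic. The sketch below assumes gem list prints lines of the form "redis (4.1.3)". The helper name redis_gem_major is hypothetical.

```shell
# Hypothetical helper: print the major version of the first redis gem version
# found in `gem list` output read from stdin (lines look like "redis (4.1.3)").
redis_gem_major() {
  sed -n 's/^redis (\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}
```

It could be used as gem list redis | redis_gem_major. In this post the MIGRATE error appeared with major version 4 and did not appear with redis-3.2.1.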
So I reinstalled Ruby and switched to a different version of the redis gem:
[[email protected] redis]# rvm install 2.6.3
Searching for binary rubies, this might take some time.
No binary rubies available for: centos/7/x86_64/ruby-2.6.3.
Continuing with compilation. Please read 'rvm help mount' to get more information on binary rubies.
Checking requirements for centos.
Requirements installation successful.
Installing Ruby from source to: /usr/local/rvm/rubies/ruby-2.6.3, this may take a while depending on your cpu(s)...
ruby-2.6.3 - #downloading ruby-2.6.3, this may take a while depending on your connection...
  % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
                                 Dload  Upload   Total    Spent    Left  Speed
100 13.8M  100 13.8M    0     0  27236      0  0:08:52  0:08:52 --:--:-- 32881
ruby-2.6.3 - #extracting ruby-2.6.3 to /usr/local/rvm/src/ruby-2.6.3.....
ruby-2.6.3 - #configuring......................................................................
ruby-2.6.3 - #post-configuration..
ruby-2.6.3 - #compiling...............................................................................................
ruby-2.6.3 - #installing................................
ruby-2.6.3 - #making binaries executable..
ruby-2.6.3 - #downloading rubygems-3.0.6
ruby-2.6.3 - #extracting rubygems-3.0.6......
ruby-2.6.3 - #removing old rubygems........
ruby-2.6.3 - #installing rubygems-3.0.6...............................................
ruby-2.6.3 - #gemset created /usr/local/rvm/gems/[email protected]
ruby-2.6.3 - #importing gemset /usr/local/rvm/gemsets/global.gems................................................................
ruby-2.6.3 - #generating global wrappers.......
ruby-2.6.3 - #gemset created /usr/local/rvm/gems/ruby-2.6.3
ruby-2.6.3 - #importing gemsetfile /usr/local/rvm/gemsets/default.gems evaluated to empty gem list
ruby-2.6.3 - #generating default wrappers.......
ruby-2.6.3 - #adjusting #shebangs for (gem irb erb ri rdoc testrb rake).
Install of ruby-2.6.3 - #complete
Ruby was built without documentation, to build it run: rvm docs generate-ri
[[email protected] redis]# ruby --version
ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux]
[[email protected] redis]# gem install redis --version 3.2.1
Fetching redis-3.2.1.gem
Successfully installed redis-3.2.1
Parsing documentation for redis-3.2.1
Installing ri documentation for redis-3.2.1
Done installing documentation for redis after 0 seconds
1 gem installed
Then test deleting a node again.
Notes on deleting a node:
redis-trib.rb del-node host:port node_id
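Before a node is deleted, it must own no slots, so its slots have to be migrated away with redis-trib.rb reshard before running del-node.
Note that once all of a node's slots have been migrated away, its slaves are updated as well and point to the migration target node.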
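To see which slaves currently follow a given master, for example to confirm that they have switched to the target node after the migration, the CLUSTER NODES output can be filtered. This is a minimal sketch, assuming the usual layout in which field 4 holds the master ID of a slave. The helper name replicas_of is hypothetical.

```shell
# Hypothetical helper: print the address of every slave of a given master.
# Reads CLUSTER NODES output on stdin; assumes field 4 is the master ID for slaves.
replicas_of() {
  local master_id="$1"
  awk -v id="$master_id" '$4 == id { sub(/@.*/, "", $2); print $2 }'
}
```

For example: redis-cli -h 192.168.1.54 -p 1986 cluster nodes | replicas_of be2a864214a624789748c7f753377638c6f88751.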
[[email protected] ~]# /data/soft/redis-4.0.12/src/redis-trib.rb del-node 192.168.1.54:1986 a5fc7cdbbde8a09ec88cc5c63f0b4e74c8f2a43b
>>> Removing node a5fc7cdbbde8a09ec88cc5c63f0b4e74c8f2a43b from cluster 192.168.1.54:1986
/usr/local/rvm/gems/ruby-2.6.3/gems/redis-3.2.1/lib/redis/client.rb:443: warning: constant ::Fixnum is deprecated
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
[[email protected] ~]# /data/soft/redis-4.0.12/src/redis-trib.rb check 192.168.1.54:1986
/usr/local/rvm/gems/ruby-2.6.3/gems/redis-3.2.1/lib/redis/client.rb:443: warning: constant ::Fixnum is deprecated
[ERR] Sorry, can't connect to node 192.168.1.54:1986
[[email protected] ~]#
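At this point, the node that originally still held data has been successfully removed from redis-cluster, which resolves the bug encountered at the start.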
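To double-check the removal, the node ID can be searched for in the CLUSTER NODES output of a node that is still running, since del-node shuts the removed instance down and it can no longer answer. The sketch below assumes the node ID is the first field of each line. The helper name node_known is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) if the node ID still appears as the
# first field of any CLUSTER NODES line read from stdin, fail otherwise.
node_known() {
  local node_id="$1"
  awk -v id="$node_id" '$1 == id { found = 1 } END { exit !found }'
}
```

For example, redis-cli -h 192.168.1.39 -p 986 cluster nodes | node_known a5fc7cdbbde8a09ec88cc5c63f0b4e74c8f2a43b should fail once the cluster has forgotten the node.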
Original article: https://blog.51cto.com/wujianwei/2461513