Reroute Unassigned Shards: when primary shards are unassigned, the fix is to reroute them

Red Cluster!

Excerpted from: http://blog.kiyanpro.com/2016/03/06/elasticsearch/reroute-unassigned-shards/

There are 3 cluster health statuses:

  1. green: All primary and replica shards are active
  2. yellow: All primary shards are active, but not all replica shards are active
  3. red: Not all primary shards are active
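The three rules above can be written as a small function. This is a minimal sketch, assuming shard rows in the `(prirep, state)` form that `_cat/shards` prints; the name `cluster_status` is hypothetical, and in practice Elasticsearch reports the real status through `GET /_cluster/health`. It treats STARTED and RELOCATING as active, since a relocating shard keeps serving requests until the move completes.

```python
from typing import Iterable, Tuple


def cluster_status(shards: Iterable[Tuple[str, str]]) -> str:
    """Derive green/yellow/red from (prirep, state) pairs.

    prirep is "p" for a primary or "r" for a replica, as in _cat/shards;
    state is e.g. "STARTED", "RELOCATING", "INITIALIZING" or "UNASSIGNED".
    """
    active_states = {"STARTED", "RELOCATING"}
    primaries_ok = True
    replicas_ok = True
    for prirep, state in shards:
        active = state in active_states
        if prirep == "p" and not active:
            primaries_ok = False
        elif prirep == "r" and not active:
            replicas_ok = False
    if not primaries_ok:
        return "red"      # not all primary shards are active
    if not replicas_ok:
        return "yellow"   # all primaries active, some replicas are not
    return "green"        # every primary and replica is active


print(cluster_status([("p", "STARTED"), ("r", "UNASSIGNED")]))  # yellow
```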

When cluster health is red, at least one primary shard is not active, so the data in that shard is unavailable: searches can return only partial results, and documents routed to that shard cannot be indexed until it recovers, which is very bad indeed. I will share how to deal with one common situation: a red cluster caused by unassigned shards.

Steps

The general idea is simple: find the shards that are unassigned and manually assign them to a node with the reroute API. Let's go through it step by step, then combine the steps into a simple configurable script.

Step 1: Check Unassigned Shards

To get cluster information, we usually use the cat APIs. The GET /_cat/shards endpoint shows a detailed view of which nodes contain which shards[1].

Cat shards


# cat shards verbose
curl "http://your.elasticsearch.host.com:9200/_cat/shards?v"

# cat shards index
curl "http://your.elasticsearch.host.com:9200/_cat/shards/wiki2"

# example return
# wiki2 0 p STARTED 197 3.2mb 192.168.56.10 Stiletto
# wiki2 1 p STARTED 205 5.9mb 192.168.56.30 Frankie Raye
# wiki2 2 p STARTED 275 7.8mb 192.168.56.20 Commander Kraken
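The same listing can be pulled from Python with only the standard library. This is a minimal sketch: the host URL is a placeholder, the helper names `fetch_cat_shards` and `parse_cat_shards` are hypothetical, and it uses the cat API's `h` parameter to request a fixed set of columns so each line splits cleanly (without `v`, no header row is printed).

```python
import urllib.request
from typing import Dict, List

# Columns requested via the cat API "h" parameter, in this order.
COLUMNS = ["index", "shard", "prirep", "state"]


def fetch_cat_shards(host: str = "http://your.elasticsearch.host.com:9200") -> str:
    """Return raw _cat/shards text restricted to the COLUMNS above."""
    url = f"{host}/_cat/shards?h={','.join(COLUMNS)}"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_cat_shards(text: str) -> List[Dict[str, str]]:
    """Turn whitespace-separated cat output (no header row) into dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < len(COLUMNS):
            continue  # skip blank or truncated lines
        rows.append(dict(zip(COLUMNS, parts)))
    return rows


sample = "wiki2 0 p STARTED\nwiki2 1 p STARTED\nwiki1 0 r UNASSIGNED\n"
for row in parse_cat_shards(sample):
    print(row["index"], row["shard"], row["prirep"], row["state"])
```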

By piping cat shards to fgrep, we can get all unassigned shards.

Get unassigned shards


# cat shards with fgrep
curl "http://your.elasticsearch.host.com:9200/_cat/shards" | fgrep UNASSIGNED

# example return
# wiki1 0 r UNASSIGNED ALLOCATION_FAILED
# wiki1 1 r UNASSIGNED ALLOCATION_FAILED
# wiki1 2 r UNASSIGNED ALLOCATION_FAILED
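Unlike `fgrep UNASSIGNED`, which matches the word anywhere on a line, this sketch checks the state column itself and keeps the reason. It assumes the listing was requested with `h=index,shard,prirep,state,unassigned.reason`; the function name `unassigned_shards` is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def unassigned_shards(cat_text: str) -> List[Tuple[str, int, str, str]]:
    """Return (index, shard, prirep, reason) for every UNASSIGNED row.

    Expects columns in the order index, shard, prirep, state, unassigned.reason.
    Rows that are not unassigned usually have an empty reason column.
    """
    result = []
    for line in cat_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[3] == "UNASSIGNED":
            reason = parts[4] if len(parts) > 4 else ""
            result.append((parts[0], int(parts[1]), parts[2], reason))
    return result


sample = """wiki1 0 r UNASSIGNED ALLOCATION_FAILED
wiki1 1 r UNASSIGNED ALLOCATION_FAILED
wiki2 0 p STARTED
wiki1 2 r UNASSIGNED ALLOCATION_FAILED"""

shards = unassigned_shards(sample)
print(len(shards))                    # 3
print(Counter(s[3] for s in shards))  # Counter({'ALLOCATION_FAILED': 3})
```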

If you don't want to deal with shell scripts, you can also find these unassigned shards through another endpoint, POST /_flush/synced[2]. Note that this endpoint does more than return information: it lets an administrator initiate a synced flush manually. This is particularly useful for a planned (rolling) cluster restart, where you can stop indexing and don't want to wait the default 5 minutes for idle indices to be sync-flushed automatically. It returns a JSON response. (Synced flush was deprecated in Elasticsearch 7.6 and removed in 8.0.)

_flush/synced


curl -XPOST "http://your.elasticsearch.host.com:9200/twitter/_flush/synced"

If there are failed shards in the response, we can iterate through each index's failures array to collect the failed (for example, unassigned) shards.

Example response with failed shards


{
  "_shards": {
    "total": 4,
    "successful": 3,
    "failed": 1
  },
  "twitter": {
    "total": 4,
    "successful": 3,
    "failed": 1,
    "failures": [
      {
        "shard": 1,
        "reason": "unexpected error",
        "routing": {
          "state": "STARTED",
          "primary": false,
          "node": "SZNr2J_ORxKTLUCydGX4zA",
          "relocating_node": null,
          "shard": 1,
          "index": "twitter"
        }
      }
    ]
  }
}
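Iterating the failures can be sketched in Python as follows. This is a hypothetical helper (`failed_shards`), assuming the response shape of the example above: every top-level key other than `_shards` is an index name. A reported failure does not necessarily mean the shard is unassigned (the example shard is STARTED), so the sketch keeps the primary flag and reason for you to inspect.

```python
import json
from typing import Any, Dict, List


def failed_shards(flush_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect per-shard failures from a POST /_flush/synced response."""
    failures = []
    for index, summary in flush_response.items():
        if index == "_shards":
            continue  # cluster-wide totals, not an index
        for failure in summary.get("failures", []):
            routing = failure.get("routing", {})
            failures.append({
                "index": index,
                "shard": failure["shard"],
                "primary": routing.get("primary"),
                "reason": failure.get("reason"),
            })
    return failures


response = json.loads('''{
  "_shards": {"total": 4, "successful": 3, "failed": 1},
  "twitter": {"total": 4, "successful": 3, "failed": 1,
    "failures": [{"shard": 1, "reason": "unexpected error",
      "routing": {"state": "STARTED", "primary": false,
                  "node": "SZNr2J_ORxKTLUCydGX4zA", "relocating_node": null,
                  "shard": 1, "index": "twitter"}}]}
}''')
print(failed_shards(response))
# [{'index': 'twitter', 'shard': 1, 'primary': False, 'reason': 'unexpected error'}]
```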

Step 2: Reroute

The reroute command allows you to explicitly execute cluster reroute allocation commands[3]. In particular, an unassigned shard can be explicitly allocated to a specific node.

Reroute example


curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
    "commands" : [ {
        "move" : {
            "index" : "test", "shard" : 0,
            "from_node" : "node1", "to_node" : "node2"
        }
    },
    {
        "allocate" : {
            "index" : "test", "shard" : 1, "node" : "node3"
        }
    }
    ]
}'

There are 3 kinds of commands you can use:

move: Move a started shard from one node to another node. Accepts index and shard for index name and shard number, from_node for the node to move the shard from, and to_node for the node to move the shard to.

cancel: Cancel allocation of a shard (or recovery). Accepts index and shard for index name and shard number, and node for the node to cancel the shard allocation on. It also accepts the allow_primary flag to explicitly permit cancelling allocation of a primary shard. This can be used to force resynchronization of existing replicas from the primary shard by cancelling them and allowing them to be reinitialized through the standard reallocation process.

allocate: Allocate an unassigned shard to a node. Accepts index and shard for index name and shard number, and node for the node to allocate the shard to. It also accepts the allow_primary flag to explicitly permit allocating a primary shard (this might result in data loss). In Elasticsearch 5.0 and later, allocate was replaced by allocate_replica, allocate_stale_primary and allocate_empty_primary.
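Building these command objects by hand is error-prone, so here is a minimal Python sketch that assembles them and serializes the request body for `POST /_cluster/reroute`. The helper names `move`, `cancel`, `allocate` and `reroute_body` are hypothetical; the field names follow the descriptions above, allow_primary defaults to false here, and the `allocate` form applies to pre-5.0 clusters.

```python
import json
from typing import Any, Dict, List


def move(index: str, shard: int, from_node: str, to_node: str) -> Dict[str, Any]:
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def cancel(index: str, shard: int, node: str,
           allow_primary: bool = False) -> Dict[str, Any]:
    return {"cancel": {"index": index, "shard": shard, "node": node,
                       "allow_primary": allow_primary}}


def allocate(index: str, shard: int, node: str,
             allow_primary: bool = False) -> Dict[str, Any]:
    # Setting allow_primary=True for a primary may lose data.
    return {"allocate": {"index": index, "shard": shard, "node": node,
                         "allow_primary": allow_primary}}


def reroute_body(commands: List[Dict[str, Any]]) -> str:
    """Serialize commands into the JSON body for POST /_cluster/reroute."""
    return json.dumps({"commands": commands})


# Same two commands as the curl example above.
body = reroute_body([move("test", 0, "node1", "node2"), allocate("test", 1, "node3")])
print(body)
```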

Combining Step 2 with the list of unassigned shards from Step 1, we can reroute all unassigned shards one by one and bring the cluster out of the red state faster.
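Rerouting one by one can be planned with a short Python sketch that turns a list of unassigned shards into one reroute body per shard. Spreading shards round-robin over a list of target nodes is an illustrative choice of this sketch, not something Elasticsearch or the original post prescribes, and `plan_reroutes` is a hypothetical name. Primaries are skipped unless `allow_primary` is set, because forcing a primary allocation can lose data.

```python
import itertools
from typing import Any, Dict, Iterable, List, Tuple


def plan_reroutes(unassigned: Iterable[Tuple[str, int, str]],
                  nodes: List[str],
                  allow_primary: bool = False) -> List[Dict[str, Any]]:
    """Build one reroute body per unassigned (index, shard, prirep) tuple."""
    if not nodes:
        raise ValueError("need at least one target node")
    targets = itertools.cycle(nodes)  # round-robin over target nodes
    bodies = []
    for index, shard, prirep in unassigned:
        is_primary = prirep == "p"
        if is_primary and not allow_primary:
            continue  # a primary cannot be force-allocated without allow_primary
        bodies.append({"commands": [{"allocate": {
            "index": index,
            "shard": shard,
            "node": next(targets),
            "allow_primary": is_primary,
        }}]})
    return bodies


for body in plan_reroutes([("wiki1", 0, "r"), ("wiki1", 1, "r"), ("wiki1", 2, "r")],
                          ["node1", "node2"]):
    cmd = body["commands"][0]["allocate"]
    print(cmd["index"], cmd["shard"], "->", cmd["node"])
```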

Example Solutions

Python

Below is a Python script I wrote using POST /_flush/synced and POST /_cluster/reroute.
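This is not the author's script; it is a minimal standard-library sketch of the same idea, sending each reroute body to `POST /_cluster/reroute` with a pause in between. The names `post_json` and `reroute_one_by_one` are hypothetical, and `post` is injectable so the loop can be tried without a live cluster. `urllib.request.urlopen` raises `urllib.error.HTTPError` for error status codes, such as a command Elasticsearch rejects.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, List


def post_json(url: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON body and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def reroute_one_by_one(host: str, bodies: List[Dict[str, Any]],
                       post: Callable[[str, Dict[str, Any]], Dict[str, Any]] = post_json,
                       pause: float = 5.0) -> List[Dict[str, Any]]:
    """Send each reroute body separately, pausing between requests."""
    replies = []
    for i, body in enumerate(bodies):
        replies.append(post(f"{host}/_cluster/reroute", body))
        if i < len(bodies) - 1:
            time.sleep(pause)  # give the cluster time to start recovering
    return replies


# Dry run with a stand-in poster instead of a real cluster.
sent = []


def fake_post(url: str, body: Dict[str, Any]) -> Dict[str, Any]:
    sent.append(url)
    return {"acknowledged": True}


reroute_one_by_one("http://localhost:9200", [{"commands": []}], post=fake_post, pause=0)
print(sent)  # ['http://localhost:9200/_cluster/reroute']
```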

Shell Script

Below is a shell script I found elsewhere in a blog post[4]


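INDEX="t37"        # index name
NODE="datanode15"  # node name

for shard in $(curl -s -XGET "http://localhost:9200/_cat/shards/$INDEX" | grep UNASSIGNED | awk '{print $2}'); do
  # Close the single quotes around the shell variables so they expand inside the JSON.
  curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
    "commands" : [ {
      "allocate" : {
        "index" : "'"$INDEX"'",
        "shard" : '"$shard"',
        "node" : "'"$NODE"'",
        "allow_primary" : true
      }
    } ]
  }'
  sleep 5
done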

EDIT: Based on Vincent's comment, I updated the shell script.

Possible Unassigned Shard Reasons

FYI, these are the possible reasons for a shard to be in an unassigned state[1]:

Name                       Comment
INDEX_CREATED              Unassigned as a result of an API creation of an index
CLUSTER_RECOVERED          Unassigned as a result of a full cluster recovery
INDEX_REOPENED             Unassigned as a result of opening a closed index
DANGLING_INDEX_IMPORTED    Unassigned as a result of importing a dangling index
NEW_INDEX_RESTORED         Unassigned as a result of restoring into a new index
EXISTING_INDEX_RESTORED    Unassigned as a result of restoring into a closed index
REPLICA_ADDED              Unassigned as a result of explicit addition of a replica
ALLOCATION_FAILED          Unassigned as a result of a failed allocation of the shard
NODE_LEFT                  Unassigned as a result of the node hosting it leaving the cluster
REROUTE_CANCELLED          Unassigned as a result of explicit cancel reroute command
REINITIALIZED              When a shard moves from started back to initializing, for example, with shadow replicas
REALLOCATED_REPLICA        A better replica location is identified and causes the existing replica allocation to be cancelled

References

  1. Elasticsearch Reference: cat shards
  2. Elasticsearch Reference: Synced Flush
  3. Elasticsearch Reference: Cluster Reroute
  4. How to fix your elasticsearch cluster stuck in initializing shards mode?
Date: 2024-11-03 20:56:27
