How to manage and balance “Huge Data Load” for Big Kafka Clusters---reference

1. Add Partition Tool

Partitions act as the unit of parallelism in Kafka. Messages of a single topic are distributed across multiple partitions, which can be stored and served from different brokers. The number of partitions must be specified when a topic is created; later, more partitions may be needed as the volume of the topic grows. This tool adds partitions to an existing topic and also allows manual replica assignment for the added partitions. You can refer to the previous blog, Quick steps: Have a Kafka Cluster Up & Running in 3 minutes, to set up a Kafka cluster and create topics.



bin/kafka-add-partitions.sh

Option                                     Description
------                                     -----------
--partition <Integer: # of partitions>    REQUIRED: Number of partitions to add
                                             to the topic
--replica-assignment-list                  For manually assigning replicas to
  <broker_id_for_part1_replica1 :            brokers for the new partitions
   broker_id_for_part1_replica2,             (default: )
   broker_id_for_part2_replica1 :
   broker_id_for_part2_replica2, ...>
--topic <topic>                            REQUIRED: The topic for which
                                             partitions need to be added.
--zookeeper <urls>                         REQUIRED: The connection string for
                                             the ZooKeeper connection in the form
                                             host:port. Multiple URLs can be
                                             given to allow fail-over.
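As an illustration of the options above, the following command adds partitions to an existing topic. The topic name, partition count, and ZooKeeper address are placeholders for your own values; this is a minimal sketch, not output from a real cluster:

> bin/kafka-add-partitions.sh --topic my-topic --partition 4 --zookeeper localhost:2181

Keep in mind that partitions can only be added, never removed, and that adding partitions changes how keys map to partitions for keyed messages.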

2. Reassign Partitions Tool

What does the tool do?

The goal of this tool is similar to that of the Preferred Replica Leader Election Tool: to achieve load balance across brokers. But instead of only electing a new leader from the assigned replicas of a partition, this tool allows changing the assigned replicas of partitions. Remember that followers also need to fetch from leaders in order to stay in sync, so sometimes balancing only the leadership load is not enough.

A summary of the steps that the tool performs is shown below (a sketch of how to watch the ZooKeeper path involved follows the list):

1. The tool updates the ZooKeeper path "/admin/reassign_partitions" with the list of topic partitions and (if specified in the JSON file) the list of their new assigned replicas.
2. The controller listens on the path above. When a data change update is triggered, the controller reads the list of topic partitions and their assigned replicas from ZooKeeper.
3. For each topic partition, the controller does the following:
3.1. Start new replicas in RAR - AR (RAR = Reassigned Replicas, AR = original list of Assigned Replicas)
3.2. Wait until the new replicas are in sync with the leader
3.3. If the leader is not in RAR, elect a new leader from RAR
3.4. Stop old replicas AR - RAR
3.5. Write the new AR
3.6. Remove the partition from the /admin/reassign_partitions path
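While a reassignment is in flight, you can watch this ZooKeeper path directly; it is cleared once every partition in the request has been moved. This is a minimal sketch assuming ZooKeeper is reachable at localhost:2181 and that your distribution ships bin/zookeeper-shell.sh (any ZooKeeper client will do):

> bin/zookeeper-shell.sh localhost:2181
get /admin/reassign_partitions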

How to use the tool?



bin/kafka-reassign-partitions.sh

Option                                     Description
------                                     -----------
--broker-list <brokerlist>                 The list of brokers to which the
                                             partitions need to be reassigned in
                                             the form "0,1,2". This is required
                                             for automatic topic reassignment.
--execute [execute]                        This option does the actual
                                             reassignment. By default, the tool
                                             does a dry run.
--manual-assignment-json-file              The JSON file with the list of manual
  <manual assignment json file path>         reassignments. This option or topics-
                                             to-move-json-file needs to be
                                             specified. The format to use is:
                                             {"partitions":
                                               [{"topic": "foo",
                                                 "partition": 1,
                                                 "replicas": [1,2,3] }],
                                              "version":1
                                             }
--topics-to-move-json-file                 The JSON file with the list of topics
  <topics to reassign json file path>        to reassign. This option or manual-
                                             assignment-json-file needs to be
                                             specified. The format to use is:
                                             {"topics":
                                               [{"topic": "foo"},{"topic": "foo1"}],
                                              "version":1
                                             }
--zookeeper <urls>                         REQUIRED: The connection string for
                                             the ZooKeeper connection in the form
                                             host:port. Multiple URLs can be
                                             given to allow fail-over.
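Since the tool only performs a dry run unless --execute is given, it is worth previewing the proposed assignment first. The broker ids, file name, and ZooKeeper address below are placeholders for illustration:

> ./bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file topics-to-move.json --broker-list "5,6,7"

If the printed assignment looks right, re-run the same command with --execute to start the actual data movement.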

3. Add Brokers (Cluster Expansion)

Cluster expansion involves adding brokers with new broker ids to a Kafka 0.8 cluster. Typically, when you add new brokers to a cluster, they will not receive any data from existing topics until this tool is run to assign existing topics/partitions to the new brokers. The tool provides two options to make it easier to move some topics in bulk to the new brokers: a) the topics to move and b) the list of newly added brokers. Using these two options, the tool automatically figures out the placement of partitions for those topics on the new brokers.

The following example moves two topics (foo1, foo2) to the newly added brokers (5, 6, 7).



> ./bin/kafka-reassign-partitions.sh --topics-to-move-json-file topics-to-move.json --broker-list "5,6,7" --execute

> cat topics-to-move.json
{"topics":
  [{"topic": "foo1"},{"topic": "foo2"}],
 "version":1
}

Selectively moving some partitions to a broker

The partition movement tool can also be used to selectively move some replicas of certain partitions over to a particular broker. Typically, if you end up with an unbalanced cluster, you can use the tool in this mode to selectively move partitions around. In this mode, the tool takes a single file which has a list of partitions to move and the replicas that each of those partitions should be assigned to.

The following example moves one partition (foo-1) from replicas 1,2,3 to 1,2,4.



> ./bin/kafka-reassign-partitions.sh --manual-assignment-json-file partitions-to-move.json --execute

> cat partitions-to-move.json
{"partitions":
  [{"topic": "foo",
    "partition": 1,
    "replicas": [1,2,4] }],
 "version":1
}
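After the reassignment completes, it is worth confirming that the replica list now matches what you requested. The sketch below assumes the 0.8 bin/kafka-list-topic.sh script (later releases expose the same information via kafka-topics.sh --describe) and a ZooKeeper at localhost:2181:

> bin/kafka-list-topic.sh --zookeeper localhost:2181 --topic foo

The output lists each partition's leader, replicas, and in-sync replica set, so you can check that partition 1 of foo now shows replicas 1,2,4.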

Note: These tools are available in version 0.8, not in prior versions.


http://xebee.xebia.in/index.php/2014/12/04/how-to-manage-and-balance-huge-data-load-for-big-kafka-clusters/
