[Storm] Understanding parallelism

Tasks & executors relation

Q1. However I'm a bit confused by the concept of "task". Is a task a running instance of the component (spout or bolt)? Does an executor having multiple tasks mean that the same component is executed multiple times by the executor?

A1: Yes, and yes

A task is simply an instance of a component (spout or bolt). While it runs, the executor thread calls that task's nextTuple or execute method.

Q2. Moreover, in a general parallelism sense, Storm will spawn a dedicated thread (executor) for a spout or bolt, but what does an executor (thread) having multiple tasks contribute to the parallelism?

A2: Running more than one task per executor does not increase the level of parallelism -- an executor always has one thread that it uses for all of its tasks, which means that tasks run serially on an executor.

Running multiple tasks does not increase parallelism: an executor is a single thread, so it executes all of its tasks one after another.
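The serial behaviour described in A2 can be illustrated with a minimal plain-Java sketch. It does not use the Storm API: `SketchTask`, `SerialExecutorSketch` and `runAll` are hypothetical names, and routing tuples to tasks round-robin is an assumption made only to keep the example small. Every call is recorded with the name of the thread that made it, so the log shows one thread serving all tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a bolt instance; this is not Storm's bolt interface.
interface SketchTask {
    void execute(String tuple);
}

class SerialExecutorSketch {
    // Creates numTasks task instances and feeds them the tuples on the calling
    // thread, one call at a time. Returns one log line per processed tuple.
    static List<String> runAll(int numTasks, List<String> tuples) {
        List<String> log = new ArrayList<>();
        List<SketchTask> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            final int taskId = i;
            tasks.add(tuple -> log.add("task-" + taskId + " handled " + tuple
                    + " on " + Thread.currentThread().getName()));
        }
        // Assumption: tuples are handed to tasks round-robin. The point is that
        // this single loop is the only place where any task ever runs.
        for (int i = 0; i < tuples.size(); i++) {
            tasks.get(i % numTasks).execute(tuples.get(i));
        }
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        // One "executor" thread owns three tasks.
        Thread executor = new Thread(
                () -> runAll(3, List.of("a", "b", "c", "d")).forEach(System.out::println),
                "executor-0");
        executor.start();
        executor.join();
    }
}
```

Raising `numTasks` in this sketch changes which task instance sees a tuple, but never adds a second thread, which is why extra tasks per executor add no parallelism.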

The number of executor threads can be changed after the topology has been started (see storm rebalance command).
The number of tasks of a topology is static.

And by definition, there is the invariant of #executors <= #tasks.

The number of tasks in a topology is fixed, but the number of executors (threads) can be changed while the topology runs. In every case #executors <= #tasks holds.

So one reason for having 2+ tasks per executor thread is to give you the flexibility to expand/scale up the topology through the storm rebalance command in the future without taking the topology offline. For instance, imagine you start out with a Storm cluster of 15 machines but already know that next week another 10 boxes will be added. Here you could opt for running the topology at the anticipated parallelism level of 25 machines already on the 15 initial boxes (which is, of course, slower than 25 boxes). Once the additional 10 boxes are integrated you can then storm rebalance the topology to make full use of all 25 boxes without any downtime.
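To make the 15-to-25 scenario concrete, here is a small plain-Java sketch of a fixed set of tasks being spread over a changing number of executors. It is only an illustration: `RebalanceSketch` and `assign` are hypothetical names, the round-robin split is an assumption and not Storm's actual scheduler, and one executor per machine is assumed purely to mirror the numbers in the example.

```java
import java.util.ArrayList;
import java.util.List;

class RebalanceSketch {
    // Spreads task ids 0..numTasks-1 over numExecutors executors round-robin.
    // Assumption: a simple even split, not Storm's real assignment logic.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            // Enforces the invariant #executors <= #tasks.
            throw new IllegalArgumentException("need 1 <= #executors <= #tasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        for (int t = 0; t < numTasks; t++) {
            executors.get(t % numExecutors).add(t);
        }
        return executors;
    }

    public static void main(String[] args) {
        int numTasks = 25; // fixed for the lifetime of the topology
        // Before and after the extra 10 boxes arrive.
        for (int numExecutors : new int[] {15, 25}) {
            System.out.println(numExecutors + " executors: " + assign(numTasks, numExecutors));
        }
    }
}
```

With 15 executors some of them carry two tasks; with 25 executors every task gets its own thread. The task count never changes, which is what allows the switch without redeploying the topology.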

Another reason to run 2+ tasks per executor is for (primarily functional) testing. For instance, if your dev machine or CI server is only powerful enough to run, say, 2 executors alongside all the other stuff running on the machine, you can still run 30 tasks (here: 15 per executor) to see whether code such as your custom Storm grouping is working as expected.
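As a rough illustration of that kind of functional check, the sketch below exercises a grouping rule against 30 task indexes without needing 30 threads. Everything in it is hypothetical: `GroupingCheckSketch`, `chooseTask` and `tasksHit` are made-up names, and the hash-modulo rule merely stands in for whatever custom grouping is under test. It does not use Storm's grouping interface.

```java
import java.util.HashSet;
import java.util.Set;

class GroupingCheckSketch {
    // Hypothetical grouping rule: map a key to one of numTasks task indexes.
    static int chooseTask(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Sends numKeys generated keys through the rule, fails on any out-of-range
    // index, and returns how many distinct task indexes were chosen.
    static int tasksHit(int numKeys, int numTasks) {
        Set<Integer> hit = new HashSet<>();
        for (int k = 0; k < numKeys; k++) {
            int task = chooseTask("key-" + k, numTasks);
            if (task < 0 || task >= numTasks) {
                throw new IllegalStateException("task index out of range: " + task);
            }
            hit.add(task);
        }
        return hit.size();
    }

    public static void main(String[] args) {
        // The rule depends on the task count (30), not on how many executors run them.
        System.out.println("distinct tasks chosen: " + tasksHit(1000, 30));
    }
}
```

The grouping logic only ever sees the number of tasks, so a check like this gives the same answer whether those 30 tasks sit on 2 executors or on 30.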

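Typical reasons for running 2+ tasks per executor:

To leave the topology room to grow, so that its parallelism can be increased while it is running
For functional testing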

In practice we normally run 1 task per executor.

PS: Note that Storm will actually spawn a few more threads behind the scenes. For instance, each executor has its own "send thread" that is responsible for handling outgoing tuples. There are also "system-level" background threads for e.g. acking tuples that run alongside "your" threads. IIRC the Storm UI counts those acking threads in addition to "your" threads.

In practice we usually have #executors = #tasks.

Reference

http://stackoverflow.com/questions/17257448/what-is-the-task-in-storm-parallelism

http://www.cnblogs.com/yufengof/p/storm-worker-executor-task.html

http://storm.apache.org/releases/0.9.6/Understanding-the-parallelism-of-a-Storm-topology.html

Time: 2025-01-08 03:55:00
