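Tuning Storm is very important, second only to writing correct code. Fortunately, the Storm website has an introduction to workers, executors, and tasks: http://storm.incubator.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
That article is adapted from this blog post: http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
I quote the relevant passages below and restate each one in my own words to reinforce my understanding: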
What makes a running topology: worker processes, executors and tasks
A worker process executes a subset of a topology, and runs in its own JVM. A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology. A running topology consists of many such processes running on many machines within a Storm cluster.
An executor is a thread that is spawned by a worker process and runs within the worker’s JVM. An executor may run one or more tasks for the same component (spout or bolt). An executor always has one thread that it uses for all of its tasks, which means that tasks run serially on an executor.
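A worker process executes a subset of a topology and has its own JVM. A worker process belongs to one specific topology and runs the executors (threads) of one or more of that topology's components (spouts or bolts). A running topology consists of many such processes, spread across different physical machines in the Storm cluster.
An executor is a thread spawned by a worker process that runs inside the worker's JVM. An executor may run one or more tasks of the same Storm component (spout or bolt). An executor always uses a single thread for all of its tasks, which means the tasks on an executor run serially.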
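The serial execution described above can be pictured with a small toy model. This is only a sketch in plain Java, not Storm's executor implementation: the class name SerialExecutorSketch, the method runTasks, and the thread name executor-1 are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only (hypothetical names): one executor thread runs all of its tasks in turn. */
final class SerialExecutorSketch {

    /** Runs every task on a single thread, one after another, and records who ran what. */
    static List<String> runTasks(List<String> taskNames) {
        List<String> log = new ArrayList<>();
        Thread executor = new Thread(() -> {
            for (String task : taskNames) {
                // The same thread handles every task, so no two tasks ever overlap.
                log.add(task + "@" + Thread.currentThread().getName());
            }
        }, "executor-1");
        executor.start();
        try {
            // join() waits for the executor thread, so reading the log afterwards is safe.
            executor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [task-1@executor-1, task-2@executor-1, task-3@executor-1]
        System.out.println(runTasks(Arrays.asList("task-1", "task-2", "task-3")));
    }
}
```

Because a single thread walks the task list, adding tasks to an executor does not add concurrency; only more executors do.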
A task performs the actual data processing and is run within its parent executor’s thread of execution. Each spout or bolt that you implement in your code executes as many tasks across the cluster. The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means that the following condition holds true: #threads <= #tasks. By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread (which is usually what you want anyways).
Also be aware that:
- The number of executor threads can be changed after the topology has been started (see storm rebalance command below).
- The number of tasks of a topology is static.
See Understanding the Internal Message Buffers of Storm for another view on the various threads that are running within the lifetime of a worker process and its associated executors and tasks.
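A task performs the actual data processing and runs inside its parent executor's thread. Each spout or bolt runs as many tasks across the cluster. The number of tasks for a component stays the same throughout the lifetime of the topology, but the number of executors (threads) can be adjusted dynamically. This means that threads <= tasks. By default tasks = threads: Storm runs one task per executor, which is usually what users want.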
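To make the condition #threads <= #tasks concrete, the following sketch splits a fixed set of task ids across a given number of executors. It is an illustration in plain Java, not Storm's scheduler: the class TaskPartitionSketch, the method partition, and the contiguous, as-even-as-possible split are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only (hypothetical names): how a fixed set of tasks might be spread over executors. */
final class TaskPartitionSketch {

    /** Splits task ids 0..numTasks-1 into numExecutors contiguous groups, as evenly as possible. */
    static List<List<Integer>> partition(int numTasks, int numExecutors) {
        // Every executor needs at least one task, hence #threads <= #tasks.
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= #executors <= #tasks");
        }
        List<List<Integer>> groups = new ArrayList<>();
        int base = numTasks / numExecutors;   // minimum tasks per executor
        int extra = numTasks % numExecutors;  // the first 'extra' executors get one more task
        int next = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> group = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                group.add(next++);
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(partition(4, 4)); // default case, one task per executor: [[0], [1], [2], [3]]
        System.out.println(partition(4, 2)); // two tasks per executor: [[0, 1], [2, 3]]
    }
}
```

With as many executors as tasks, each group holds exactly one task, which matches the default of one task per thread described above.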
Note:
1. The number of executor threads can be changed after the topology has been started (storm rebalance).
2. The number of tasks of a topology is static.
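The two notes can be summed up as a tiny state model: the task count is fixed at construction, while the executor count may be changed later. This is a sketch in plain Java with made-up names (ComponentParallelismSketch, rebalance), and it does not call Storm. Rejecting a request for more executors than tasks is an assumption of the sketch; how Storm itself handles such a request is not modeled here.

```java
/** Toy state model (hypothetical names): tasks are fixed, executors can be changed later. */
final class ComponentParallelismSketch {
    private final int numTasks; // static for the lifetime of the topology
    private int numExecutors;   // adjustable after the topology has started

    ComponentParallelismSketch(int numExecutors, int numTasks) {
        if (numTasks < 1) {
            throw new IllegalArgumentException("need at least one task");
        }
        this.numTasks = numTasks;
        this.numExecutors = checked(numExecutors);
    }

    /** Models a rebalance: only the executor count changes, never the task count. */
    void rebalance(int newExecutors) {
        this.numExecutors = checked(newExecutors);
    }

    /** Enforces 1 <= #executors <= #tasks (rejecting violations is an assumption of this sketch). */
    private int checked(int executors) {
        if (executors < 1 || executors > numTasks) {
            throw new IllegalArgumentException("need 1 <= #executors <= " + numTasks);
        }
        return executors;
    }

    int numTasks() { return numTasks; }

    int numExecutors() { return numExecutors; }

    public static void main(String[] args) {
        ComponentParallelismSketch bolt = new ComponentParallelismSketch(2, 4);
        bolt.rebalance(4); // scale from 2 threads up to 4; the 4 tasks stay as they are
        System.out.println(bolt.numExecutors() + " executors, " + bolt.numTasks() + " tasks");
    }
}
```

One consequence of this model is that the task count chosen at submission time is the ceiling for any later increase in executors.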
The blog post Understanding the Internal Message Buffers of Storm looks at workers, executors, and tasks from another angle; I will translate it later.
To be continued...