Storm在集群上运行一个Topology时,主要通过以下3个实体来完成Topology的执行工作:
1. Worker(进程)
2. Executor(线程)
3. Task
下图简要描述了这3者之间的关系:
上图和下面这张图一样
看看官网的对这部分的讲解
Example of a running topology
The following illustration shows how a simple topology would look like in operation. The topology consists of three components: one spout called BlueSpout
and two
bolts called GreenBolt
and YellowBolt
. The components are linked
such that BlueSpout
sends its output to GreenBolt
, which in turns
sends its own output to YellowBolt
.
The GreenBolt
was configured as per the code snippet above whereas BlueSpout
and YellowBolt
only
set the parallelism hint (number of executors). Here is the relevant code:
Config conf = new Config(); conf.setNumWorkers(2); // use two worker processes topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2 topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2) .setNumTasks(4) .shuffleGrouping("blue-spout"); topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6) .shuffleGrouping("green-bolt"); StormSubmitter.submitTopology( "mytopology", conf, topologyBuilder.createTopology() );
1个worker进程执行的是1个topology的子集(注:不会出现1个worker为多个topology服务)。1个worker进程会启动1个或多个executor线程来执行1个topology的component(spout或bolt)。因此,1个运行中的topology就是由集群中多台物理机上的多个worker进程组成的。
executor是1个被worker进程启动的单独线程。每个executor只会运行1个topology的1个component(spout或bolt)的task(注:task可以是1个或多个,storm默认是1个component只生成1个task,executor线程里会在每次循环里顺序调用所有task实例)。
task是最终运行spout或bolt中代码的单元(注:1个task即为spout或bolt的1个实例,executor线程在执行期间会调用该task的nextTuple或execute方法)。topology启动后,1个component(spout或bolt)的task数目是固定不变的,但该component使用的executor线程数可以动态调整(例如:1个executor线程可以执行该component的1个或多个task实例)。这意味着,对于1个component存在这样的条件:#threads<=#tasks(即:线程数小于等于task数目)。默认情况下task的数目等于executor线程数目,即1个executor线程只运行1个task。
那么一个excutor处理多个task,有什么用?一种理解的是可以方便以后扩容。首先要知道,topology代码一旦提交到nimbus上去之后,task数量随之而定,以后永不再改变,甚至重启topology,都不会再改变task数量,除非改代码,再重新提交。而设置并行度就不一样了,我们不需要重新提交代码,就可以修改topology的并发,可以随时修改。但一个executor必须要处理一个task,如果以前我们默认有4个executor,4个task,即一个executor处理一个task,好了,我现在感觉现在并发不够,处理速度跟不上,想调高一些并发,调为8个,但task数量只有4个,多出来的executor也只是闲着,所以调高并发也没卵用了。如果我设置task数量为8,一开始并发度为4,一个executor必须执行2个task,速度慢了,我把并发度改为8,一个executor只执行一个task,速度比刚刚快了