Storm Topology Parallelism

Understanding the Parallelism of a Storm Topology

What makes a running topology: worker processes, executors and tasks

In a Storm cluster, three main entities are involved in actually running a topology:

Worker processes
Executors (threads)
Tasks

The following sketch illustrates how they relate to each other:

A worker process executes a subset of a topology.

A worker process belongs to one specific topology and may run one or more executors.

A running topology consists of many such processes running on many machines in the cluster.

An executor is a thread spawned by a worker process. It may run one or more tasks.

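A task performs the actual data processing: each spout or bolt that you implement in your code executes as many tasks. The number of tasks for a component stays the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means the following condition always holds: #threads ≤ #tasks. By default, the number of tasks equals the number of executors, i.e. each thread runs one task.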
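The relationship between tasks and executors can be made concrete with a small model. The sketch below is only an illustration and is not Storm's scheduler code: the class name `TaskAssignmentSketch`, the method `assign`, and the round-robin placement are all assumptions chosen for clarity. It shows that a fixed set of tasks can be spread over a varying number of threads as long as #threads ≤ #tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; names and the round-robin rule are hypothetical,
// not taken from Storm's implementation.
class TaskAssignmentSketch {

    // Spread numTasks task ids over numExecutors threads as evenly as possible.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        // Enforce the invariant from the text: #threads <= #tasks.
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= #threads <= #tasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        // Round-robin: task i goes to executor (i mod numExecutors).
        for (int task = 0; task < numTasks; task++) {
            executors.get(task % numExecutors).add(task);
        }
        return executors;
    }

    public static void main(String[] args) {
        // Default case: one task per thread.
        System.out.println(assign(4, 4)); // [[0], [1], [2], [3]]
        // Same 4 tasks, fewer threads: each thread now runs 2 tasks.
        System.out.println(assign(4, 2)); // [[0, 2], [1, 3]]
    }
}
```

In both calls the task count is 4; only the number of threads differs, which mirrors the point that the executor count can change while the task count cannot.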

Configuring the parallelism of a topology

Note that in Storm's terminology, "parallelism" refers specifically to the so-called parallelism hint, which is the initial number of executors (threads) of a component.

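In this document we use "parallelism" in a broader sense, to describe how to configure the number of executors, the number of worker processes, and the number of tasks.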

There are several ways to set configuration options. Their order of precedence, from lowest to highest, is: defaults.yaml < storm.yaml < topology-specific configuration < internal component-specific configuration < external component-specific configuration
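One way to picture this precedence chain is as a series of maps merged from lowest to highest priority, where a later layer overwrites any key set by an earlier one. The following is a minimal sketch of that idea, not Storm's actual configuration loader: the class `ConfigPrecedenceSketch` and the method `resolve` are hypothetical, and the values are made up for illustration. `topology.workers` and `topology.debug` are used as example keys.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that models "later layer wins"; not part of Storm.
class ConfigPrecedenceSketch {

    // Merge layers ordered from lowest to highest precedence.
    static Map<String, Object> resolve(List<Map<String, Object>> layersLowToHigh) {
        Map<String, Object> effective = new LinkedHashMap<>();
        for (Map<String, Object> layer : layersLowToHigh) {
            effective.putAll(layer); // higher-precedence values overwrite lower ones
        }
        return effective;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, Object> defaultsYaml = Map.of("topology.workers", 1, "topology.debug", false);
        Map<String, Object> stormYaml = Map.of("topology.workers", 2);
        Map<String, Object> topologyConf = Map.of("topology.workers", 4);

        Map<String, Object> effective =
                resolve(List.of(defaultsYaml, stormYaml, topologyConf));

        System.out.println(effective.get("topology.workers")); // 4 (topology-specific wins)
        System.out.println(effective.get("topology.debug"));   // false (only set in defaults)
    }
}
```

The component-specific layers would be appended to the same list, after the topology-specific map, and would override it in the same way.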