
Understanding Storm's parallel execution: the relationship between worker, executor, and task, and the scheduling algorithm

2013-12-05 

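The official documentation explains workers, executors, and tasks in Storm very clearly (https://github.com/nathanmarz/storm/wiki/Understanding-the-parallelism-of-a-Storm-topology), so it is reposted here on my personal blog. A picture is worth a thousand words: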


[Figure: the relationship between workers, executors, and tasks in Storm]

Storm distinguishes between the following three main entities that are used to actually run a topology in a Storm cluster:

    Worker processes
    Executors (threads)
    Tasks

Here is a simple illustration of their relationships:

[Figure: relationships between worker processes, executors, and tasks]

A worker process executes a subset of a topology. A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology. A running topology consists of many such processes running on many machines within a Storm cluster.

An executor is a thread that is spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).

A task performs the actual data processing — each spout or bolt that you implement in your code executes as many tasks across the cluster. The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means that the following condition holds true: #threads ≤ #tasks. By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.
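The relationship between executors and tasks described above can be sketched in a few lines of Java. This is an illustrative model, not Storm's scheduler code: the class TaskLayout and both method names are hypothetical, and the even round-robin spread of tasks over executors is an assumption made for the example.

```java
import java.util.Arrays;

final class TaskLayout {

    /** Number of tasks for a component: the explicit setting if given, otherwise one task per executor. */
    public static int resolveNumTasks(Integer configuredTasks, int executors) {
        return configuredTasks != null ? configuredTasks : executors;
    }

    /**
     * Spreads numTasks over numExecutors as evenly as possible.
     * The round-robin spread is an assumption for illustration only.
     */
    public static int[] tasksPerExecutor(int numTasks, int numExecutors) {
        // Enforce the condition from the text: #threads <= #tasks.
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= #threads <= #tasks");
        }
        int[] counts = new int[numExecutors];
        for (int task = 0; task < numTasks; task++) {
            counts[task % numExecutors]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // No explicit task count: defaults to one task per executor.
        System.out.println(resolveNumTasks(null, 3)); // prints 3
        // Four tasks on two executors: each thread runs two tasks.
        System.out.println(Arrays.toString(tasksPerExecutor(4, 2))); // prints [2, 2]
    }
}
```

Because the task count is fixed for the lifetime of the topology, only the second argument of tasksPerExecutor can change later, and it can never exceed the first.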

Configuring the parallelism of a topology

Note that in Storm’s terminology "parallelism" is specifically used to describe the so-called parallelism hint, which means the initial number of executors (threads) of a component. In this document though we use the term "parallelism" in a more general sense to describe how you can configure not only the number of executors but also the number of worker processes and the number of tasks of a Storm topology. We will specifically call out when "parallelism" is used in the normal, narrow definition of Storm.

The following sections give an overview of the various configuration options and how to set them in your code. There is more than one way of setting these options though, and the table lists only some of them. Storm currently has the following order of precedence for configuration settings: defaults.yaml < storm.yaml < topology-specific configuration < internal component-specific configuration < external component-specific configuration.
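The order of precedence can be pictured as a series of maps merged from lowest to highest priority, where a later layer overrides an earlier one. The sketch below only models that idea; the class ConfigPrecedence and its merge method are hypothetical and are not part of Storm, and the values are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ConfigPrecedence {

    /** Merges config layers listed from lowest to highest precedence; later layers win. */
    public static Map<String, Object> merge(List<Map<String, Object>> layersLowToHigh) {
        Map<String, Object> effective = new LinkedHashMap<>();
        for (Map<String, Object> layer : layersLowToHigh) {
            effective.putAll(layer); // a higher-precedence layer overwrites earlier values
        }
        return effective;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, Object> defaultsYaml = Map.of("topology.workers", 1, "topology.debug", false);
        Map<String, Object> stormYaml = Map.of("topology.workers", 2);
        Map<String, Object> topologyConf = Map.of("topology.workers", 4);

        Map<String, Object> effective = merge(List.of(defaultsYaml, stormYaml, topologyConf));
        System.out.println(effective.get("topology.workers")); // prints 4
        System.out.println(effective.get("topology.debug"));   // prints false
    }
}
```

A setting that only appears in a lower layer, such as topology.debug here, survives unchanged because no higher layer overrides it.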

Number of worker processes
    Description: How many worker processes to create for the topology across machines in the cluster.
    Configuration option: TOPOLOGY_WORKERS
    How to set in your code (examples):
        Config#setNumWorkers

Number of executors (threads)
    Description: How many executors to spawn per component.
    Configuration option: ?
    How to set in your code (examples):
        TopologyBuilder#setSpout()
        TopologyBuilder#setBolt()
        Note that as of Storm 0.8 the parallelism_hint parameter now specifies the initial number of executors (not tasks!) for that bolt.

Number of tasks
    Description: How many tasks to create per component.
    Configuration option: TOPOLOGY_TASKS
    How to set in your code (examples):
        ComponentConfigurationDeclarer#setNumTasks()

Here is an example code snippet to show these settings in practice:
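A minimal sketch of such a configuration follows. It is a fragment rather than a complete program: it assumes storm-core is on the classpath, an existing TopologyBuilder named topologyBuilder, a user-defined bolt class GreenBolt, and a spout already registered as "blue-spout". The component id "green-bolt" and the numbers 2 and 4 are illustrative assumptions.

```java
// Fragment: topologyBuilder, GreenBolt and the "blue-spout" component are assumed to exist.
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2) // parallelism hint: 2 executors initially
               .setNumTasks(4)                            // 4 tasks in total, more than the 2 executors
               .shuffleGrouping("blue-spout");            // subscribe to the spout's output stream
```

With these assumed values the bolt starts with two executors that share four tasks, which leaves room to raise the number of executors later without changing the task count.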

The GreenBolt was configured as per the code snippet above whereas BlueSpout and YellowBolt only set the parallelism hint (number of executors). Here is the relevant code:
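A sketch of what the full topology setup might look like is shown below. It assumes storm-core is on the classpath and that BlueSpout, GreenBolt, and YellowBolt are your own spout and bolt implementations. The class name MyTopology, the id "green-bolt", and all numeric values are illustrative assumptions. The package prefix backtype.storm matches the Storm releases of this article's era; later Apache releases use org.apache.storm instead.

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class MyTopology {
    public static void main(String[] args) throws Exception {
        Config conf = new Config();
        conf.setNumWorkers(2); // TOPOLOGY_WORKERS: use two worker processes

        TopologyBuilder topologyBuilder = new TopologyBuilder();

        // Only the parallelism hint is set: 2 executors, hence 2 tasks by default.
        topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2);

        // Parallelism hint and task count are both set: 2 executors, 4 tasks.
        topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
                       .setNumTasks(4)
                       .shuffleGrouping("blue-spout");

        // Only the parallelism hint is set: 6 executors, hence 6 tasks by default.
        topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
                       .shuffleGrouping("green-bolt");

        StormSubmitter.submitTopology("mytopology", conf, topologyBuilder.createTopology());
    }
}
```

Under these assumed numbers the topology has a combined 2 + 2 + 6 = 10 executors, which an even spread over the two worker processes would place as 5 threads per worker.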

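To change the number of worker processes and executors of a running topology, use the storm rebalance command. The following example reconfigures the topology "mytopology" to use 5 worker processes, the spout "blue-spout" to use 3 executors, and the bolt "yellow-bolt" to use 10 executors:

$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10

References for this article

    Concepts
    Configuration
    Running topologies on a production cluster
    Local mode
    Tutorial
    Storm API documentation, most notably the class Config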
