MapReduce 图解流程

Anatomy of a MapReduce Job

In MapReduce, a YARN application is called a Job. The implementation of the Application Master provided by the MapReduce
framework is called MRAppMaster.

Timeline of a
MapReduce Job

This
is the timeline of a MapReduce Job execution:

  • Map Phase: several Map Tasks are executed
  • Reduce Phase: several Reduce Tasks are executed

Notice that the Reduce Phase may start before the end of Map Phase. Hence, an interleaving between them is possible.

Map Phase

We now focus our discussion on the Map Phase. A key decision is how many MapTasks the Application Master needs to start for the current job.

What does the user
give us?

Let’s take a step back. When a client submits an application, several kinds of information are provided to the YARN infrastucture. In particular:

  • a configuration: this may be partial (some parameters are not specified by the user) and in this case the default values are used for the job. Notice that these default values may be the ones chosen by a Hadoop provider
    like Amanzon.
  • a JAR containing:
    • map() implementation
    • a combiner implementation
    • reduce() implementation
  • input and output information:
    • input directory: is the input directory on HDFS? On S3? How many files?
    • output directory: where will we store the output? On HDFS? On S3?

The number of files inside the input directory is used for deciding the number of Map Tasks of a job.

How many Map Tasks?

The Application Master will launch one MapTask for each map split. Typically, there is a map split for each input file. If the input file is too big (bigger than the HDFS block size) then
we have two or more map splits associated to the same input file. This is the pseudocode used inside the method getSplits() of
the FileInputFormat class:

num_splits = 0
for each input file f:
   remaining = f.length
   while remaining / split_size > split_slope:
      num_splits += 1
      remaining -= split_size

where:

split_slope = 1.1
split_size =~ dfs.blocksize

Notice that the configuration parameter mapreduce.job.maps is
ignored in MRv2 (in the past it was just an hint).

MapTask Launch

The MapReduce Application Master asks to the Resource Manager for Containers needed by the Job: one MapTask container request for each MapTask (map split).

A container request for a MapTask tries to exploit data locality of the map split. The Application Master asks for:

  • a container located on the same Node Manager where the map split is stored (a map split may be stored on multiple nodes due to the HDFS replication factor);
  • otherwise, a container located on a Node Manager in the same rack where the the map split is stored;
  • otherwise, a container on any other Node Manager of the cluster

This is just an hint to the Resource Scheduler. The Resource Scheduler is free to ignore data locality if the suggested assignment is in conflict with the Resouce Scheduler’s goal.

When a Container is assigned to the Application Master, the MapTask is launched.

Map
Phase: example of an execution scenario

This is a possible execution scenario of the Map Phase:

  • there are two Node Managers: each Node Manager has 2GB of RAM (NM capacity) and each MapTask requires 1GB, we can run in parallel 2 containers on each Node Manager (this is the best scenario, the Resource Scheduler may decide
    differently)
  • there are no other YARN applications running in the cluster
  • our job has 8 map splits (e.g., there are 7 files inside the input directory, but only one of them is bigger than the HDFS block size so we split it into 2 map splits): we need to run 8 Map Tasks.

Map Task Execution
Timeline

Let’s
now focus on a single Map Task. This is the Map Task execution timeline:

  • INIT phase: we setup the Map Task
  • EXECUTION phase: for each (key, value) tuple inside the map split we run the map() function
  • SPILLING phase: the map output is stored in an in-memory buffer; when this buffer is almost full then we start
    (in parallel) the spilling phase in order to remove data from it
  • SHUFFLE phase: at the end of the spilling phase, we merge all the map outputs and package them for the reduce phase

MapTask: INIT

During the INIT phase, we:

  1. create a context (TaskAttemptContext.class)
  2. create an instance of the user Mapper.class
  3. setup the input (e.g., InputFormat.classInputSplit.classRecordReader.class)
  4. setup the output (NewOutputCollector.class)
  5. create a mapper context (MapContext.classMapper.Context.class)
  6. initialize the input, e.g.:
  7. create a SplitLineReader.class object
  8. create a HdfsDataInputStream.class object

MapTask: EXECUTION

The EXECUTION phase is performed by the run method
of the Mapper class. The user can override it, but by default it will start by calling the setup method:
this function by default does not do anything useful but can be override by the user in order to setup the Task (e.g., initialize class variables). After the setup, for each <key, value> tuple contained in the map split, the map() is
invoked. Therefore, map() receives: a key a value, and a mapper context. Using the context, a map stores
its output to a buffer.

Notice that the map split is fetched chuck by chunk (e.g., 64KB) and each chunk is split in several (key, value) tuples (e.g., using SplitLineReader.class).
This is done inside the Mapper.Context.nextKeyValue method.

When the map split has been completely processed, the run function
calls the clean method: by default, no action is performed but the user may decide to override
it.

MapTask: SPILLING

As seen in the EXECUTING phase, the map will
write (using Mapper.Context.write()) its output into a circular in-memory buffer (MapTask.MapOutputBuffer).
The size of this buffer is fixed and determined by the configuration parameter mapreduce.task.io.sort.mb (default:
100MB).

Whenever this circular buffer is almost full (mapreduce.map.
sort.spill.percent
: 80% by default), the SPILLING phase is performed (in parallel using a separate thread). Notice that if the splilling thread is too slow and the buffer is 100% full, then the map() cannot
be executed and thus it has to wait.

The SPILLING thread performs the following actions:

  1. it creates a SpillRecord and FSOutputStream (local
    filesystem)
  2. in-memory sorts the used chunk of the buffer: the output tuples are sorted by (partitionIdx, key) using a quicksort algorithm.
  3. the sorted output is split into partitions: one partition for each ReduceTask of the job (see later).
  4. Partitions are sequentially written into the local file.
How Many Reduce Tasks?

The number of ReduceTasks for the job is decided by the configuration parameter mapreduce.job.reduces.

What
is the partitionIdx associated to an output tuple?

The paritionIdx of an output tuple is the index of a partition. It is decided inside the Mapper.Context.write():

partitionIdx = (key.hashCode() & Integer.MAX_VALUE) % numReducers

It is stored as metadata in the circular buffer alongside the output tuple. The user can customize the partitioner by setting the configuration parameter mapreduce.job.partitioner.class.

When do we apply
the combiner?

If the user specifies a combiner then the SPILLING thread, before writing the tuples to the file (4), executes the combiner on the tuples contained in each partition. Basically, we:

  1. create an instance of the user Reducer.class (the one specified
    for the combiner!)
  2. create a Reducer.Context: the output will be stored on the
    local filesystem
  3. execute Reduce.run(): see Reduce Task description

The combiner typically use the same implementation of the standard reduce() function
and thus can be seen as a local reducer.

MapTask: end of EXECUTION

At the end of the EXECUTION phase, the SPILLING thread is triggered for the last time. In more detail, we:

  1. sort and spill the remaining unspilled tuples
  2. start the SHUFFLE phase

Notice that for each time the buffer was almost full, we get one spill file (SpillReciord +
output file). Each Spill file contains several partitions (segments).

MapTask: SHUFFLE

Reduce Phase

[…]

YARN and MapReduce
interaction

时间: 2024-10-31 05:29:54

MapReduce 图解流程的相关文章

MapReduce 图解流程超详细解答(1)-【map阶段】

转自:http://www.open-open.com/lib/view/open1453097241308.html 在MapReduce中,一个YARN  应用被称作一个job, MapReduce 框架提供的应用,master的一个实现被称作MRAppMaster MapReduce Job的时间线 MapReduce Job  运行的时间线: Map Phase:若干 Map Tasks 被执行 Reduce Phase: 若干Reduce Tasks 被执行 reduce可能会在map

MapReduce&amp;#160;图解流程

Anatomy of a MapReduce Job In MapReduce, a YARN application is called a Job. The implementation of the Application Master provided by the MapReduce framework is called MRAppMaster. Timeline of a MapReduce Job This is the timeline of a MapReduce Job e

HP LoadRunner_12.55下载与安装图解流程

HP LoadRunner_12.55下载与安装图解流程 LoadRunner,是一种预测系统行为和性能的负载测试工具.通过以模拟上千万用户实施并发负载及实时性能监测的方式来确认和查找LoadRunner能够对整个企业架构进行测试.通过使用LoadRunner,企业能最大限度地缩短测试时间,优化性能和加速应用系统的发布周期.LoadRunner是一种适用于各种体系架构的自动负载测试工具,它能预测系统行为并评估系统性能. 工具/原料 LoadRunner是Mercury Interactive C

大数据技术之_05_Hadoop学习_02_MapReduce_MapReduce框架原理+InputFormat数据输入+MapReduce工作流程(面试重点)+Shuffle机制(面试重点)

第3章 MapReduce框架原理3.1 InputFormat数据输入3.1.1 切片与MapTask并行度决定机制3.1.2 Job提交流程源码和切片源码详解3.1.3 FileInputFormat切片机制3.1.4 CombineTextInputFormat切片机制3.1.5 CombineTextInputFormat案例实操3.1.6 FileInputFormat实现类3.1.7 KeyValueTextInputFormat使用案例3.1.8 NLineInputFormat使

Hadoop二次排序及MapReduce处理流程实例详解

一.概述 MapReduce框架对处理结果的输出会根据key值进行默认的排序,这个默认排序可以满足一部分需求,但是也是十分有限的,在我们实际的需求当中,往往有要对reduce输出结果进行二次排序的需求.对于二次排序的实现,网络上已经有很多人分享过了,但是对二次排序的实现原理及整个MapReduce框架的处理流程的分析还是有非常大的出入,而且部分分析是没有经过验证的.本文将通过一个实际的MapReduce二次排序的例子,讲述二次排序的实现和其MapReduce的整个处理流程,并且通过结果和Map.

MapReduce运行流程分析

研究MapReduce已经有一段时间了.起初是从分析WordCount程序开始,后来开始阅读Hadoop源码,自认为已经看清MapReduce的运行流程.现在把自己的理解贴出来,与大家分享,欢迎纠错. 还是以最经典的WordCount程序作为基础,来分析map阶段.reduce阶段和最复杂的shuffle阶段. 文本1:hello world                                      文本2:map reduce hello hadoop            

016_笼统概述MapReduce执行流程结合wordcount程序

一.map任务处理 1 .读取输入文件内容,解析成key.value对.对输入文件的每一行,解析成key.value对.每一个键值对调用一次map函数. 2 .写自己的逻辑,对输入的key.value处理,转换成新的key.value输出.3. 对输出的key.value进行分区.4 .对不同分区的数据,按照key进行排序.分组.相同key的value放到一个集合中.5 .(可选)分组后的数据进行归约. 二.reduce任务处理 1.对多个map任务的输出,按照不同的分区,通过网络copy到不同

MapReduce处理流程

MapReduce是Hadoop2.x的一个计算框架,利用分治的思想,将一个计算量很大的作业分给很多个任务,每个任务完成其中的一小部分,然后再将结果合并到一起.将任务分开处理的过程为map阶段,将每个小任务的结果合并到一起的过程为reduce阶段.下面先从宏观上介绍一下客户端提交一个作业时,Hadoop2.x各个组件之间的联系及处理流程.然后我们再具体看看MapReduce计算框架在执行一个作业时,做了些什么. YARN YARN是Hadoop2.x框架下的资源管理系统,其组成部分为: 1)全局

MapReduce执行流程

角色描述:JobClient:执行任务的客户端JobTracker:任务调度器TaskTracker:任务跟踪器Task:具体的任务(Map OR Reduce) 从生命周期的角度来看,mapreduce流程大概经历这样几个阶段:初始化.分配.执行.反馈.成功与失败的后续处理 每个阶段所做的事情大致如下 任务初始化 1.JobClient对数据源进行切片切片信息由InputSplit对象封装,接口定义如下: [java] view plaincopy public interface Input