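024_The Mapper and Reducer Base Classes in MapReduce

Outline

1) The Mapper base class in MapReduce, the parent class of user-defined Mapper classes.

2) The Reducer base class in MapReduce, the parent class of user-defined Reducer classes.

1. The Mapper class

API documentation

1) InputSplit (input split) and InputFormat (input format)

2) The Mapper output is sorted (Sort) and grouped (Group)

3) The Mapper output is partitioned (Partition) according to the number of Reducers

4) A Combiner can optionally be run on the Mapper output

The official Hadoop documentation describes the Mapper class as follows: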

　　Maps input key/value pairs to a set of intermediate key/value pairs.

　　Maps are the individual tasks which transform input records into a intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.
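
The statement that one input pair may map to zero or many output pairs, of a different type than the input, can be illustrated with a small plain-Java sketch. It does not use the Hadoop API: `MapOutputSketch` and `mapLine` are hypothetical names, and the method returns a list instead of writing to a Context.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a Hadoop Mapper.
class MapOutputSketch {
    // Takes one input record (a line of text) and returns the
    // intermediate records it maps to, one "word<TAB>1" per token.
    static List<String> mapLine(String line) {
        List<String> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token + "\t1");
            }
        }
        return out; // empty for a blank line, several entries otherwise
    }

    public static void main(String[] args) {
        System.out.println(mapLine("").size());      // blank input line
        System.out.println(mapLine("a b a").size()); // three tokens
    }
}
```

A blank line yields no output records, while a line with three tokens yields three.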

　　The Hadoop Map-Reduce framework spawns one map task for each InputSplit generated by the InputFormat for the job. Mapper implementations can access the Configuration for the job via the JobContext.getConfiguration().

　　The framework first calls setup(org.apache.hadoop.mapreduce.Mapper.Context), followed by map(Object, Object, Context) for each key/value pair in the InputSplit. Finally cleanup(Context) is called.

　　All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to a Reducer to determine the final output. Users can control the sorting and grouping by specifying two key RawComparator classes.
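
The effect of sorting and grouping can be pictured with plain Java collections. This is only a sketch under simplifying assumptions: `GroupSketch` and `sortAndGroup` are hypothetical names, everything happens in memory, and keys are ordered by natural String order rather than by a user-supplied RawComparator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not how Hadoop implements the shuffle.
class GroupSketch {
    // Collects all values that share a key into one list.
    // TreeMap keeps the keys in sorted order.
    static Map<String, List<Integer>> sortAndGroup(String[] keys, int[] values) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            grouped.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        String[] keys = {"b", "a", "b"};
        int[] values = {1, 2, 3};
        System.out.println(sortAndGroup(keys, values));
    }
}
```

Each entry of the resulting map corresponds to one key together with all of its values, which is the shape of input a Reducer receives.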

　　The Mapper outputs are partitioned per Reducer. Users can control which keys (and hence records) go to which Reducer by implementing a custom Partitioner.
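
One common way to decide which Reducer a key goes to is hash partitioning. The following is a minimal sketch of that idea in plain Java, assuming keys with a stable hashCode(); `PartitionSketch` and `partitionFor` are hypothetical names and not part of the Hadoop Partitioner API.

```java
// Hypothetical illustration only; not a Hadoop Partitioner.
class PartitionSketch {
    // Clears the sign bit so the result is never negative, then takes
    // the remainder by the number of reducers.
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "banana", "apple"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, 3));
        }
    }
}
```

Because the result depends only on the key and the reducer count, every record with the same key lands on the same Reducer.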

　　Users can optionally specify a combiner, via Job.setCombinerClass(Class), to perform local aggregation of the intermediate outputs, which helps to cut down the amount of data transferred from the Mapper to the Reducer.
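
Why local aggregation reduces the amount of transferred data can be seen in a small word-count style sketch. It is plain Java rather than a real combiner registered through Job.setCombinerClass(Class); `CombinerSketch` and `combine` are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Hadoop combiner mechanism.
class CombinerSketch {
    // Sums the counts emitted by one map task per key, so that one
    // record per distinct key remains instead of one per occurrence.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            combined.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return combined;
    }

    // Builds the (word, 1) pairs a word-count map step would emit.
    static List<Map.Entry<String, Integer>> emitOnes(String text) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : text.split(" ")) {
            out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> out = emitOnes("a b a a b");
        System.out.println(out.size() + " records before, "
                + combine(out).size() + " after");
    }
}
```

This works because addition is associative and commutative, so summing partial counts on the map side does not change the final totals.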

　　Applications can specify if and how the intermediate outputs are to be compressed and which CompressionCodecs are to be used via the Configuration.

　　If the job has zero reduces then the output of the Mapper is directly written to the OutputFormat without sorting by keys.

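Structure of the Mapper class:

Its methods are as follows:

Category 1: protected methods, which users override as needed.

1) setup: called once at the start of each task.

2) map: called once for each key/value pair.

3) cleanup: called once at the end of each task.

Category 2: the method that runs the task

The run() method is the entry point of the Mapper class. Internally it calls setup(), then map() for each key/value pair, and finally cleanup().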
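
The call order of these methods can be sketched without Hadoop on the classpath. The code below is a simplified stand-in, not the Hadoop source: `MiniMapper`, `LoggingMapper`, the list-based input, and the `log` field are hypothetical and exist only for illustration. Placing cleanup() in a finally block is a choice made in this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Mapper lifecycle; not the Hadoop class.
abstract class MiniMapper<K, V> {
    // Records the order of calls so the lifecycle can be inspected.
    final List<String> log = new ArrayList<>();

    // Hook: called once before any record is processed.
    protected void setup() { log.add("setup"); }

    // Hook: called once for each key/value pair.
    protected abstract void map(K key, V value);

    // Hook: called once after the last record.
    protected void cleanup() { log.add("cleanup"); }

    // Entry point: setup, then map for every record, then cleanup.
    public void run(List<K> keys, List<V> values) {
        setup();
        try {
            for (int i = 0; i < keys.size(); i++) {
                map(keys.get(i), values.get(i));
            }
        } finally {
            cleanup(); // runs even if map() throws
        }
    }
}

// A user-defined subclass overrides only the hook it needs.
class LoggingMapper extends MiniMapper<Integer, String> {
    @Override
    protected void map(Integer key, String value) {
        log.add("map(" + key + "," + value + ")");
    }
}

class LifecycleDemo {
    static List<String> runDemo() {
        LoggingMapper mapper = new LoggingMapper();
        mapper.run(Arrays.asList(0, 1), Arrays.asList("a", "b"));
        return mapper.log;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

With two input records, the log shows setup once, map twice, and cleanup once, in that order.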

Time: 2024-10-13 01:08:38
