【重拾】MapReducer[第一篇]

昨天听朋友说了一个题目，具体的题目忘了！有数据是这样的：

<1,0> 
<2,8>
<1,9>
<2,7>
<1,0>
<3,15>
<5,20>  
<3,25>
<4,20>
<3,50>

要得到结果试着样的：

对左侧数据的统计，对右侧数据的去重；当左侧相同时，右侧也相同，之记录一次；当左侧相同，右侧不同，左侧数据次数累加；当左侧不相同，右侧也不相同时候，左侧数据累加统计。

了解过大意以后发现这个就是对数据的去重统计的一个小测试！思路就不写了，跟着代码随意遐想，代码仅限上述情况：

package com.amir.test;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class MapReducer_MulTask {

    public static class MassRemovingMap extends MapReduceBase implements
            Mapper<Object, Text, Text, Text> {

        private Text line = new Text();

        public void map(Object key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            line = value;
            output.collect(line, new Text(""));
        }
    }

    public static class MassRemovingReduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, Text> {

        public void reduce(Text key, Iterator<IntWritable> value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            output.collect(key, new Text(""));
        }
    }

    public static class StatisticsMap extends MapReduceBase implements
            Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {

            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                String[] temp = itr.nextToken().split(",");
                String akey = temp[0].replace("<", "");
                word.set(akey);
                output.collect(word, one);
            }
        }
    }

    public static class StatisticsReduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterator<IntWritable> value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (value.hasNext()) {
                IntWritable val = value.next();
                sum += val.get();
            }
            result.set(sum);
            output.collect(key, result);
        }

    }

    public static void TaskMassRemoving() throws IOException{
        String[] param = { "/test/testw/ss", "/test/testw/woutput" };
        Configuration conf = new Configuration();
        JobConf jobconf = new JobConf(conf, MapReducer_MulTask.class);
        jobconf.setJobName("TaskMassRemoving");
        
        jobconf.setJarByClass(MapReducer_MulTask.class);
        jobconf.setMapperClass(MassRemovingMap.class);
        jobconf.setCombinerClass(MassRemovingReduce.class);
        jobconf.setReducerClass(MassRemovingReduce.class);
        jobconf.setOutputKeyClass(Text.class);
        jobconf.setOutputValueClass(Text.class);
        
        FileInputFormat.addInputPath(jobconf, new Path(param[0]));
        FileOutputFormat.setOutputPath(jobconf, new Path(param[1]));
        JobClient.runJob(jobconf).waitForCompletion();
    }
    
    public static void TaskStatistics() throws IOException{
        String[] param = {"/test/testw/woutput/part-00000","/test/testw/woutput/wordcount"};
        Configuration conf = new Configuration();
        JobConf jobconf = new JobConf(conf, MapReducer_MulTask.class);
        jobconf.setJobName("TaskStatistics");
        
        jobconf.setJarByClass(MapReducer_MulTask.class);
        jobconf.setMapperClass(StatisticsMap.class);
        jobconf.setCombinerClass(StatisticsReduce.class);
        jobconf.setReducerClass(StatisticsReduce.class);
        
        jobconf.setOutputKeyClass(Text.class);
        jobconf.setOutputValueClass(IntWritable.class);
        
        FileInputFormat.addInputPath(jobconf, new Path(param[0]));
        FileOutputFormat.setOutputPath(jobconf, new Path(param[1]));
        JobClient.runJob(jobconf).waitForCompletion();
        
    }
    
    public static void main(String[] args) throws IOException {
        try {
            MapReducer_MulTask.TaskMassRemoving(); // 01
            MapReducer_MulTask.TaskStatistics();  // 02
            System.out.println("OK!");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

主要对MapReducer 基本使用的测试！！！！

时间： 2024-10-02 12:45:41

【重拾】MapReducer[第一篇]的相关文章

（二）重拾单片机第一天 LED灯

由图知道低电平亮,高电平灭控制第一个 LED1 亮灭程序代码,如下 #include<reg52.h> #define uchar8 unsigned char #define uint16 unsigned int sbit LED1=P1^0; /*****************************/ // 函数名称: DelayMS( ) // 函数功能: 延时 // 入口函数: 延时毫秒 // 出口函数: 无 /********* *******************/

重拾单片机第一天 LED灯

由图知道低电平亮,高电平灭控制第一个 LED1 亮灭程序代码,如下 1 #include<reg52.h> 2 #define uchar8 unsigned char 3 #define uint16 unsigned int 4 sbit LED1=P1^0; 5 /*****************************/ 6 // 函数名称: DelayMS( ) 7 // 函数功能: 延时 8 // 入口函数: 延时毫秒 9 // 出口函数: 无 10 /*********

重拾算法(2)——线索二叉树

重拾算法(2)——线索二叉树上一篇我们实现了二叉树的递归和非递归遍历,并为其复用精心设计了遍历方法Traverse(TraverseOrder order, NodeWorker<T> worker);今天就在此基础上实现线索二叉树. 什么是线索二叉树二叉树中容易找到结点的左右孩子信息,但该结点在某一序列中的直接前驱和直接后继只能在某种遍历过程中动态获得. 先依遍历规则把每个结点某一序列中对应的前驱和后继线索预存起来,这叫做"线索化". 意义:从任一结点出发都能快速找到

SaltStack 入门到精通 - 第一篇：安装SaltStack

实际环境的设定: 系统环境: centos6 或centos5 实验机器: 192.168.1.100 软件需求: salt 套件,及其需求环境实验目的: 成功安装salt,并实现salt主从间通讯特殊设置: 其它目的: 安装SaltStack(下面简称为salt) epel安装:salt安装需要epel源支持,所以在安装salt前需要先安装epel包 # centos5 下载下面rpm wget -O epel.rpm https://dl.fedoraproject.org/pu

U-BOOT-2016.07移植 (第一篇) 初步分析

U-BOOT-2016.07移植 (第一篇) 初步分析目录 U-BOOT-201607移植第一篇初步分析目录编译和移植环境更新交叉编译工具 1 下载arm-linux-gcc 443 2 安装arm-linux-gcc 443 安装环境Ubuntu 910 下载u-boot-201607并解压分析顶层Makefile 1 找出目标依赖关系 2 总结初次编译u-boot 1 配置 2 编译分析u-boot启动流程 1 分析startS 2 分析crt0S 3 总结 1. 编译和移

《LDAP服务器的配置》RHEL6——第一篇运维工程师必考

ldap这种原始的服务器搭建起来比较复杂,同时它也是CE必考的(客户端的搭建).过段时间再写客户端的搭建.加密.共享.第一章先搭建服务器端.. 1.安装openldap-servers软件包 2.查看ldap模板文件的存放位置: 3.拷贝ldap模板文件到配置文件目录并修改文件名为slapd.conf. 4.删除/etc/openldap目录下原有的文件,保留下这几个文件,注意:以前学时是要删除schema文件,直留下三个,但是我测试时如果删除schema服务将失败. 5.修改slapd.con

微型 ORM 的第一篇 DapperLambda发布

引言:因为接触过多个ORM,但使用的时候都遇到了各自的一些不够理想的地方,从最早开始开始公司自己分装的,到后面用EF,以及Dapper和DapperExtensions 到现在用的FluentData,就说说我自己的使用体验,在这几个相比之下,Dapper应该是最轻量级,而且性能也是最好的,但是相对比较简单了点.EF的最新版也没去使用,所以现在不是很了解,EF在这几个相比一下,功能是最强大的,但是启动加载慢,以及复杂的功能,后续人优化麻烦.FluentData 怎么说呢,用的都挺好用,而且语法

重拾梦想，做更好的自己

亥时,就寝,忽入空灵,甲申年出师已历一纪,诸多记忆电光石火逐一闪现.时年家贫无靠,生计无着,每日波奔却心系梦想,虽日日身疲体倦,却每以<孟子·告子下>篇中名句“天降大任于斯人也,必先苦其心志,劳其筋骨,饿其体肤,空乏其身,行指乱其所为,所以动心忍性,曾益其所不能也”以慰寸心,类比篇中清史名人,胸中满溢浩然正气,行事尽显峥嵘:历12载,生活稳定,已婚并育一女,四老体健而心宽,内子贤而持家,小女伶俐活泼,此三项尽得,可谓得意了. 然忆及往昔践行之路与现时行走之途,高下立判,原所行皆可日日前行,步步

重拾C/C++基础

1.复制指针时只复制指针中的地址,而不会复制指针指向的对象2.解决护栏柱错误的根本是从思想认知上搞定. 数组的序号为偏移量. 也即: 数组的第一个元素为arrName[0],其偏移量为03.使用strcpy函数时要注意,若是源串的长度大于目标串的长度,将会覆盖缓冲区后面的内容所以尽量使用strncpy来替代strcpy4.数组可以使一维或者是多维.只要数组包含的元素为内置类型或者有默认构造函数的类,就可以初始化5.数学运算符有5个: +(加) -(减) *(乘) /(除) %(求模)6