Hadoop读书笔记(七)MapReduce 0.x版本API使用demo

Hadoop读书笔记(一)Hadoop介绍http://blog.csdn.net/caicongyang/article/details/39898629

Hadoop读书笔记(二)HDFS的shell操作http://blog.csdn.net/caicongyang/article/details/41253927

Hadoop读书笔记(三)Java
API操作HDFS
http://blog.csdn.net/caicongyang/article/details/41290955

Hadoop读书笔记(四)HDFS体系结构 :http://blog.csdn.net/caicongyang/article/details/41322649

Hadoop读书笔记(五)MapReduce统计单词demohttp://blog.csdn.net/caicongyang/article/details/41453579

Hadoop读书笔记(六)MapReduce自定义数据类型demohttp://blog.csdn.net/caicongyang/article/details/41490379

1.说明

功能和上篇一样实现手机流量的统计(ps:可以与前面文章代码做对比)

2.代码:

KpiApp.java

package old;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.HashPartitioner;
/**
 *
 * <p>
 * Title: KpiApp.java
 * Package old
 * </p>
 * <p>
 * Description:  hadoop版本1.x的包一般是mapreduce
 * 		 hadoop版本0.x的包一般是mapred
 * <p>
 * @author Tom.Cai
 * @created 2014-11-25 下午10:23:47
 * @version V1.0
 *
 */
public class KpiApp {
	private static final String INPUT_PATH = "hdfs://192.168.80.100:9000/wlan";
	private static final String OUT_PATH = "hdfs://192.168.80.100:9000/wlan_out";
	/**
	 * 改动:
	 * 1.不再使用Job,而是使用JobConf
	 * 2.类的包名不再使用mapreduce,而是使用mapred
	 * 3.不再使用job.waitForCompletion(true)提交作业,而是使用JobClient.runJob(job);
	 *
	 */
	public static void main(String[] args) throws Exception {
		FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), new Configuration());
		Path outPath = new Path(OUT_PATH);
		if (fileSystem.exists(outPath)) {
			fileSystem.delete(outPath, true);
		}

		JobConf job = new JobConf(new Configuration(), KpiApp.class);
		FileInputFormat.setInputPaths(job, INPUT_PATH);
		job.setInputFormat(TextInputFormat.class);

		job.setMapperClass(KpiMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(KpiWite.class);

		job.setPartitionerClass(HashPartitioner.class);
		job.setNumReduceTasks(1);

		job.setReducerClass(KpiReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(KpiWite.class);

		FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
		job.setOutputFormat(TextOutputFormat.class);

		JobClient.runJob(job);

	}
	/**
	 * 新api:extends Mapper
	 * 老api:extends MapRedcueBase implements Mapper
	 */
	static class KpiMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, KpiWite> {
		@Override
		public void map(LongWritable key, Text value, OutputCollector<Text, KpiWite> out, Reporter arg3) throws IOException {
			String[] splited = value.toString().split("\t");
			String num = splited[1];
			KpiWite kpi = new KpiWite(splited[6], splited[7], splited[8], splited[9]);
			out.collect(new Text(num), kpi);
		}

	}

	static class KpiReducer extends MapReduceBase implements Reducer<Text, KpiWite, Text, KpiWite> {
		@Override
		public void reduce(Text key, Iterator<KpiWite> value, OutputCollector<Text, KpiWite> out, Reporter arg3) throws IOException {
			long upPackNum = 0L;
			long downPackNum = 0L;
			long upPayLoad = 0L;
			long downPayLoad = 0L;
			while (value.hasNext()) {
				upPackNum += value.next().upPackNum;
				downPackNum += value.next().downPackNum;
				upPayLoad += value.next().upPayLoad;
				downPayLoad += value.next().downPayLoad;
			}
			out.collect(key, new KpiWite(String.valueOf(upPackNum), String.valueOf(downPackNum), String.valueOf(upPayLoad), String.valueOf(downPayLoad)));
		}
	}

}

class KpiWite implements Writable {
	long upPackNum;
	long downPackNum;
	long upPayLoad;
	long downPayLoad;

	public KpiWite() {
	}

	public KpiWite(String upPackNum, String downPackNum, String upPayLoad, String downPayLoad) {
		this.upPackNum = Long.parseLong(upPackNum);
		this.downPackNum = Long.parseLong(downPackNum);
		this.upPayLoad = Long.parseLong(upPayLoad);
		this.downPayLoad = Long.parseLong(downPayLoad);
	}

	@Override
	public void readFields(DataInput in) throws IOException {
		this.upPackNum = in.readLong();
		this.downPackNum = in.readLong();
		this.upPayLoad = in.readLong();
		this.downPayLoad = in.readLong();
	}

	@Override
	public void write(DataOutput out) throws IOException {
		out.writeLong(upPackNum);
		out.writeLong(downPackNum);
		out.writeLong(upPayLoad);
		out.writeLong(downPayLoad);
	}

}

欢迎大家一起讨论学习!

有用的自己收!

记录与分享,让你我共成长!欢迎查看我的其他博客;我的博客地址:http://blog.csdn.net/caicongyang

时间: 2024-10-17 22:51:41

Hadoop读书笔记(七)MapReduce 0.x版本API使用demo的相关文章

Hadoop读书笔记(八)MapReduce 打成jar包demo

Hadoop读书笔记(一)Hadoop介绍:http://blog.csdn.net/caicongyang/article/details/39898629 Hadoop读书笔记(二)HDFS的shell操作:http://blog.csdn.net/caicongyang/article/details/41253927 Hadoop读书笔记(三)Java API操作HDFS:http://blog.csdn.net/caicongyang/article/details/41290955

Hadoop读书笔记(六)MapReduce自定义数据类型demo

Hadoop读书笔记(一)Hadoop介绍:http://blog.csdn.net/caicongyang/article/details/39898629 Hadoop读书笔记(二)HDFS的shell操作:http://blog.csdn.net/caicongyang/article/details/41253927 Hadoop读书笔记(三)Java API操作HDFS:http://blog.csdn.net/caicongyang/article/details/41290955

Hadoop读书笔记(十四)MapReduce中TopK算法(Top100算法)

Hadoop读书笔记系列文章:http://blog.csdn.net/caicongyang/article/category/2166855 (系列文章会逐步修整完成,添加数据文件格式预计相关注释) 1.说明: 从给定的文件中的找到最大的100个值,给定的数据文件格式如下: 533 16565 17800 2929 11374 9826 6852 20679 18224 21222 8227 5336 912 29525 3382 2100 10673 12284 31634 27405 1

Hadoop读书笔记(十)MapReduce中的从计数器理解combiner归约

Hadoop读书笔记系列文章:http://blog.csdn.net/caicongyang/article/category/2166855 1.combiner 问:什么是combiner: 答:Combiner发生在Mapper端,对数据进行归约处理,使传到reducer端的数据变小了,传输时间变端,作业时间变短,Combiner不能夸Mapper执行,(只有reduce可以接受多个Mapper的任务). 并不是所有的算法都适合归约处理,例如求平均数 2.代码实现 WordCount.j

Hadoop读书笔记(十一)MapReduce中的partition分组

Hadoop读书笔记系列文章:http://blog.csdn.net/caicongyang/article/category/2166855 1.partition分组 partition是指定分组算法,以及通过setNumReduceTasks设定Reduce的任务个数 2.代码 KpiApp.ava package cmd; import java.io.DataInput; import java.io.DataOutput; import java.io.IOException; i

Hadoop读书笔记(九)MapReduce计数器

Hadoop读书笔记系列文章:http://blog.csdn.net/caicongyang/article/category/2166855 1.MapReduce 计数器的作用 统计Map.Reduce以及Combiner执行的次数,可以用户简单判断代码的执行流程 2.MapReduce自带的计数器 14/11/26 22:28:51 INFO mapred.JobClient: Counters: 19 14/11/26 22:28:51 INFO mapred.JobClient: F

Hadoop读书笔记(十二)MapReduce自定义排序

Hadoop读书笔记系列文章:http://blog.csdn.net/caicongyang/article/category/2166855 1.说明: 对给出的两列数据首先按照第一列升序排列,当第一列相同时,第二列升序排列 数据格式: 3 3 3 2 3 1 2 2 2 1 1 1 2.代码 SortApp.java package sort; import java.io.DataInput; import java.io.DataOutput; import java.io.IOExc

Hadoop读书笔记(十三)MapReduce中Top算法

Hadoop读书笔记系列文章:http://blog.csdn.net/caicongyang/article/category/2166855 1.说明: 从给定的文件中的找到最大值 2.代码: TopApp.java package suanfa; import java.io.IOException; import java.net.URI; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.F

Hadoop读书笔记(五)MapReduce统计单词demo

Hadoop读书笔记(一)Hadoop介绍:http://blog.csdn.net/caicongyang/article/details/39898629 Hadoop读书笔记(二)HDFS的shell操作:http://blog.csdn.net/caicongyang/article/details/41253927 Hadoop读书笔记(三)Java API操作HDFS:http://blog.csdn.net/caicongyang/article/details/41290955