Hadoop Reading Notes (9): MapReduce Counters

Hadoop Reading Notes series: http://blog.csdn.net/caicongyang/article/category/2166855

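1. What MapReduce counters are for

Counters record how many times the Map, Reduce, and Combiner steps run, which gives you a simple way to check the execution flow of your code.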

2. Built-in MapReduce counters

14/11/26 22:28:51 INFO mapred.JobClient: Counters: 19
14/11/26 22:28:51 INFO mapred.JobClient:   File Output Format Counters
14/11/26 22:28:51 INFO mapred.JobClient:     Bytes Written=25
14/11/26 22:28:51 INFO mapred.JobClient:   FileSystemCounters
14/11/26 22:28:51 INFO mapred.JobClient:     FILE_BYTES_READ=343
14/11/26 22:28:51 INFO mapred.JobClient:     HDFS_BYTES_READ=42
14/11/26 22:28:51 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=128056
14/11/26 22:28:51 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
14/11/26 22:28:51 INFO mapred.JobClient:   File Input Format Counters
14/11/26 22:28:51 INFO mapred.JobClient:     Bytes Read=21
14/11/26 22:28:51 INFO mapred.JobClient:   Map-Reduce Framework
14/11/26 22:28:51 INFO mapred.JobClient:     Map output materialized bytes=47
14/11/26 22:28:51 INFO mapred.JobClient:     Map input records=2
14/11/26 22:28:51 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/11/26 22:28:51 INFO mapred.JobClient:     Spilled Records=4
14/11/26 22:28:51 INFO mapred.JobClient:     Map output bytes=37
14/11/26 22:28:51 INFO mapred.JobClient:     Total committed heap usage (bytes)=366034944
14/11/26 22:28:51 INFO mapred.JobClient:     SPLIT_RAW_BYTES=97
14/11/26 22:28:51 INFO mapred.JobClient:     Combine input records=0
14/11/26 22:28:51 INFO mapred.JobClient:     Reduce input records=2
14/11/26 22:28:51 INFO mapred.JobClient:     Reduce input groups=2
14/11/26 22:28:51 INFO mapred.JobClient:     Combine output records=0
14/11/26 22:28:51 INFO mapred.JobClient:     Reduce output records=2
14/11/26 22:28:51 INFO mapred.JobClient:     Map output records=2
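The log above prints each counter group as an indented heading, followed by its counters as name=value lines. As a minimal sketch (plain Java, no Hadoop dependency), the listing can be parsed into a map of groups so that individual values are easy to compare between runs. The class name CounterLogParser and its methods are hypothetical helpers, not part of Hadoop, and the parsing rules are an assumption based only on the log format shown here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): parses JobClient counter log lines.
final class CounterLogParser {
    // Everything after this marker is the counter listing itself.
    private static final String MARKER = "mapred.JobClient:";

    // Returns group name -> (counter name -> value), in log order.
    static Map<String, Map<String, Long>> parse(List<String> lines) {
        Map<String, Map<String, Long>> groups = new LinkedHashMap<>();
        Map<String, Long> current = null;
        for (String line : lines) {
            int at = line.indexOf(MARKER);
            if (at < 0) {
                continue; // not a JobClient line
            }
            String body = line.substring(at + MARKER.length()).trim();
            if (body.isEmpty() || body.startsWith("Counters:")) {
                continue; // skip the "Counters: N" header line
            }
            // Split on the last '=' so names such as
            // "Total committed heap usage (bytes)" stay intact.
            int eq = body.lastIndexOf('=');
            boolean isCounter = current != null && eq > 0
                    && body.substring(eq + 1).matches("\\d+");
            if (isCounter) {
                current.put(body.substring(0, eq),
                        Long.parseLong(body.substring(eq + 1)));
            } else {
                // Any other line is treated as a group heading.
                current = new LinkedHashMap<>();
                groups.put(body, current);
            }
        }
        return groups;
    }

    // Total number of counters across all groups.
    static int total(Map<String, Map<String, Long>> groups) {
        int n = 0;
        for (Map<String, Long> group : groups.values()) {
            n += group.size();
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "14/11/26 22:28:51 INFO mapred.JobClient: Counters: 19",
                "14/11/26 22:28:51 INFO mapred.JobClient:   File Input Format Counters",
                "14/11/26 22:28:51 INFO mapred.JobClient:     Bytes Read=21",
                "14/11/26 22:28:51 INFO mapred.JobClient:   Map-Reduce Framework",
                "14/11/26 22:28:51 INFO mapred.JobClient:     Map input records=2",
                "14/11/26 22:28:51 INFO mapred.JobClient:     Total committed heap usage (bytes)=366034944");
        Map<String, Map<String, Long>> groups = parse(sample);
        System.out.println(groups);
        System.out.println("counters parsed: " + total(groups));
    }
}
```

Run against the full listing above, total(...) should match the number in the "Counters: 19" header line, which is a quick sanity check that no line was skipped.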

3. Custom counters

package counter;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
/**
 *
 * <p>
 * Title: WordCount.java
 * Package counter
 * </p>
 * <p>
 * Description: custom counter
 * </p>
 * @author Tom.Cai
 * @created 2014-11-26 10:47:32 PM
 * @version V1.0
 *
 */
public class WordCount {
	private static final String INPUT_PATH = "hdfs://192.168.80.100:9000/hello";
	private static final String OUT_PATH = "hdfs://192.168.80.100:9000/out";

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
		Path outPath = new Path(OUT_PATH);
		if (fileSystem.exists(outPath)) {
			fileSystem.delete(outPath, true);
		}
		Job job = new Job(conf, WordCount.class.getSimpleName());
		FileInputFormat.setInputPaths(job, INPUT_PATH);
		job.setInputFormatClass(TextInputFormat.class);

		job.setMapperClass(MyMapper.class);
		// Types of the mapper's output key and value
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(LongWritable.class);

		job.setPartitionerClass(HashPartitioner.class);
		job.setNumReduceTasks(1);

		job.setReducerClass(MyReducer.class);

		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(LongWritable.class);

		FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
		job.setOutputFormatClass(TextOutputFormat.class);

		job.waitForCompletion(true);
	}

	static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
		@Override
		protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			/**
			 * Using the counter: group "MyCounter", name "hello".
			 * It is incremented once for each input line that contains "hello".
			 */
			Counter mycounter = context.getCounter("MyCounter", "hello");
			if (value.toString().contains("hello")) {
				mycounter.increment(1L);
			}
			String[] splited = value.toString().split("\t");
			for (String word : splited) {
				context.write(new Text(word), new LongWritable(1));
			}
		}
	}

	static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
		@Override
		protected void reduce(Text key, Iterable<LongWritable> value, Context context) throws IOException, InterruptedException {
			long count = 0L;
			for (LongWritable times : value) {
				count += times.get();
			}
			context.write(key, new LongWritable(count));
		}

	}

}
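The mapper above increments the counter once per input record that contains "hello", not once per occurrence of the word. The sketch below mimics that logic in plain Java, without Hadoop, to make the behavior easy to check. The classes CounterSketch and Counters are hypothetical stand-ins for the framework, and the three input records are made-up sample data, not the author's input file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the job above; no Hadoop classes are used.
final class CounterSketch {

    // Simplified named counters keyed by "group:name" (an assumption,
    // not how Hadoop stores them internally).
    static final class Counters {
        private final Map<String, Long> values = new LinkedHashMap<>();

        void increment(String group, String name, long by) {
            values.merge(group + ":" + name, by, Long::sum);
        }

        long get(String group, String name) {
            return values.getOrDefault(group + ":" + name, 0L);
        }
    }

    // Mirrors the body of MyMapper.map(): one increment per record that
    // contains "hello", then one output pair per tab-separated token.
    // Returns the number of output pairs the record would produce.
    static int map(String record, Counters counters) {
        if (record.contains("hello")) {
            counters.increment("MyCounter", "hello", 1L);
        }
        return record.split("\t").length;
    }

    public static void main(String[] args) {
        // Made-up records; the second contains "hello" twice.
        String[] records = {"hello\tyou", "hello hello\tme", "bye"};
        Counters counters = new Counters();
        int outputs = 0;
        for (String record : records) {
            outputs += map(record, counters);
        }
        System.out.println("hello=" + counters.get("MyCounter", "hello")); // prints hello=2
        System.out.println("map output records=" + outputs); // prints map output records=5
    }
}
```

In the real job, the value could also be read in the driver once waitForCompletion(true) has returned, for example with job.getCounters().findCounter("MyCounter", "hello").getValue().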

4. Output after adding the custom counter

14/11/26 22:45:38 INFO mapred.JobClient: Counters: 20
14/11/26 22:45:38 INFO mapred.JobClient:   File Output Format Counters
14/11/26 22:45:38 INFO mapred.JobClient:     Bytes Written=25
14/11/26 22:45:38 INFO mapred.JobClient:   MyCounter
14/11/26 22:45:38 INFO mapred.JobClient:     hello=2
14/11/26 22:45:38 INFO mapred.JobClient:   FileSystemCounters
14/11/26 22:45:38 INFO mapred.JobClient:     FILE_BYTES_READ=343
14/11/26 22:45:38 INFO mapred.JobClient:     HDFS_BYTES_READ=42
14/11/26 22:45:38 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=128036
14/11/26 22:45:38 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
14/11/26 22:45:38 INFO mapred.JobClient:   File Input Format Counters
14/11/26 22:45:38 INFO mapred.JobClient:     Bytes Read=21
14/11/26 22:45:38 INFO mapred.JobClient:   Map-Reduce Framework
14/11/26 22:45:38 INFO mapred.JobClient:     Map output materialized bytes=47
14/11/26 22:45:38 INFO mapred.JobClient:     Map input records=2
14/11/26 22:45:38 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/11/26 22:45:38 INFO mapred.JobClient:     Spilled Records=4
14/11/26 22:45:38 INFO mapred.JobClient:     Map output bytes=37
14/11/26 22:45:38 INFO mapred.JobClient:     Total committed heap usage (bytes)=366034944
14/11/26 22:45:38 INFO mapred.JobClient:     SPLIT_RAW_BYTES=97
14/11/26 22:45:38 INFO mapred.JobClient:     Combine input records=0
14/11/26 22:45:38 INFO mapred.JobClient:     Reduce input records=2
14/11/26 22:45:38 INFO mapred.JobClient:     Reduce input groups=2
14/11/26 22:45:38 INFO mapred.JobClient:     Combine output records=0
14/11/26 22:45:38 INFO mapred.JobClient:     Reduce output records=2
14/11/26 22:45:38 INFO mapred.JobClient:     Map output records=2

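Discussion and shared learning are welcome!

Take whatever you find useful!

Recording and sharing help us all grow! Feel free to read my other posts; my blog is at: http://blog.csdn.net/caicongyang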

Time: 2024-11-04 13:26:48
