hadoop的WordCount样例

package cn.lmj.mapreduce;

import java.io.IOException;

import java.util.Iterator;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapred.FileInputFormat;

import org.apache.hadoop.mapred.FileOutputFormat;

import org.apache.hadoop.mapred.JobClient;

import org.apache.hadoop.mapred.JobConf;

import org.apache.hadoop.mapred.MapReduceBase;

import org.apache.hadoop.mapred.Mapper;

import org.apache.hadoop.mapred.OutputCollector;

import org.apache.hadoop.mapred.Reducer;

import org.apache.hadoop.mapred.Reporter;

import org.apache.hadoop.mapred.TextInputFormat;

import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount

{

//mapper

public static class WordCountMapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,LongWritable>

{

LongWritable count = new LongWritable(1);

Text content = new Text();

@Override

public void map(LongWritable key, Text value,

OutputCollector<Text, LongWritable> output, Reporter report)

throws IOException

{

//切割字符串

String str = value.toString();

String[] arr = str.split(" ");

for(String s : arr)

{

content.set(s);

output.collect(content,count);

}

//reducer

public static class WordCountReduce extends MapReduceBase implements Reducer<Text,LongWritable,Text,LongWritable>

{

@Override

public void reduce(Text key, Iterator<LongWritable> values,

OutputCollector<Text, LongWritable> output, Reporter rep)

throws IOException

{

//将同样key的value累加

long sum = 0;

while(values.hasNext())

{

sum+=values.next().get();

}

output.collect(key,new LongWritable(sum));

}

public static void main(String[] args) throws Exception

{

//创建一个JobConf

JobConf conf = new JobConf(WordCount2.class);

conf.setJobName("lmj");

//设置输出类型

conf.setOutputKeyClass(Text.class);

conf.setOutputValueClass(LongWritable.class);

//设置Map、Combine和Reduce处理类

conf.setMapperClass(WordCountMapper.class);

conf.setCombinerClass(WordCountReduce.class);

conf.setReducerClass(WordCountReduce.class);

//设置输入类型

conf.setInputFormat(TextInputFormat.class);

conf.setOutputFormat(TextOutputFormat.class);

//设置输入和输出文件夹

FileInputFormat.setInputPaths(conf,new Path("/aaa/hadoop.txt"));

FileOutputFormat.setOutputPath(conf,new Path("/aaa/output"));

//启动jobConf

JobClient.runJob(conf);

}

hadoop的WordCount样例

时间： 2024-10-04 09:43:23

hadoop的WordCount样例的相关文章

Hadoop辅助排序样例一

1. 样例数据 011990-99999 SIHCCAJAVRI 012650-99999 TYNSET-HANSMOEN 012650-99999 194903241200 111 012650-99999 194903241800 78 011990-99999 195005150700 0 011990-99999 195005151200 22 011990-99999 195005151800 -11 2. 需求 3. 思路.代码将气象站ID相同的气象站信息和天气信息交由同一个 Re

Hadoop辅助排序样例二

1. 需求求每年的最高温度 2. 样例数据 1995 10 1996 11 1995 16 1995 22 1996 26 1995 3 1996 7 1996 10 1996 20 1996 33 1995 21 1996 9 1995 31 1995 -13 1995 22 1997 -2 1997 28 1997 15 1995 8 3. 思路.代码将记录按年份分组并按温度降序排序,然后才将同一年份的所有记录送到一个 reducer 组,则各组的首条记录就是这一年的最高温度.实现此方案

[hadoop系列]Pig的安装和简单演示样例

inkfish原创,请勿商业性质转载,转载请注明来源(http://blog.csdn.net/inkfish ).(来源:http://blog.csdn.net/inkfish) Pig是Yahoo!捐献给Apache的一个项目,眼下还在Apache孵化器(incubator)阶段,眼下版本号是v0.5.0.Pig是一个基于Hadoop的大规模数据分析平台,它提供的SQL-like语言叫Pig Latin,该语言的编译器会把类SQL的数据分析请求转换为一系列经过优化处理的MapReduce运

Hadoop AWS Word Count 样例

在AWS里用Elastic Map Reduce 开一个Cluster 然后登陆master node并编译下面程序: import java.io.IOException; import java.util.StringTokenizer; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import o

第六篇：Eclipse上运行第一个Hadoop实例 - WordCount(单词统计程序)

需求计算出文件中每个单词的频数.要求输出结果按照单词的字母顺序进行排序.每个单词和其频数占一行,单词和频数之间有间隔. 比如,输入两个文件,其一内容如下: hello world hello hadoop hello mapreduce 另一内容如下: bye world bye hadoop bye mapreduce 对应上面给出的输入样例,其输出样例为: bye 3 hadoop 2 hello 3 mapreduce 2 world 2 方案制定对该案例,可设计出如下的MapRe

Eclipse上运行第一个Hadoop实例 - WordCount(单词统计程序)

需求计算出文件中每个单词的频数.要求输出结果按照单词的字母顺序进行排序.每个单词和其频数占一行,单词和频数之间有间隔. 比如,输入一个文件,其内容如下: hello world hello hadoop hello mapreduce 对应上面给出的输入样例,其输出样例为: hadoop 1 hello 3 mapreduce 1 world 1 方案制定对该案例,可设计出如下的MapReduce方案: 1. Map阶段各节点完成由输入数据到单词切分的工作 2. shuffle阶段完成相同单

Spark执行样例报警告：WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources

搭建Spark环境后,调测Spark样例时,出现下面的错误:WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources [[email protected] bin]$ ./run-example org.apache.sp

[0010] windows 下 eclipse 开发 hdfs程序样例 (二)

目的: 学习windows 开发hadoop程序的配置相关: [0007] windows 下 eclipse 开发 hdfs程序样例环境: 基于以下环境配置好后. [0008] Windows 7 下 hadoop 2.6.4 eclipse 本地开发调试配置 1. 新建HDFS下载文件类在已有mapreduce项目中新建类添加如下代码,代码从[0007]中取出小修改功能:从hdfs下载文件到windows本地 package hadoop.hdfs; import java.io.F

hadoop学习WordCount+Block+Split+Shuffle+Map+Reduce技术详解

转自:http://blog.csdn.net/yczws1/article/details/21899007 纯干货:通过WourdCount程序示例:详细讲解MapReduce之Block+Split+Shuffle+Map+Reduce的区别及数据处理流程. Shuffle过程是MapReduce的核心,集中了MR过程最关键的部分.要想了解MR,Shuffle是必须要理解的.了解Shuffle的过程,更有利于我们在对MapReduce job性能调优的工作有帮助,以及进一步加深我们对MR内