Hadoop learning: rewriting and running the WordCount program in C++

1. Command to run the program:

hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input /input/wordcount/sample.txt -output /output/wordcount -program /bin/wordcount

2. The code:

// wordcount.h
#include <algorithm>
#include <stdint.h>
#include <string>
#include <vector>
#include "Pipes.hh"
#include "TemplateFactory.hh"
#include "StringUtils.hh"
#include <iostream>
using namespace std;

class WordcountMapper : public HadoopPipes::Mapper
{
public:
    WordcountMapper(HadoopPipes::TaskContext& context);
    vector<string> split(const string& src, const string& separator);
    void map(HadoopPipes::MapContext& context);
};

class WordcountReducer : public HadoopPipes::Reducer
{
public:
    WordcountReducer(HadoopPipes::TaskContext& context);
    void reduce(HadoopPipes::ReduceContext& context);
};
// wordcount.cpp
#include "wordcount.h"

WordcountMapper::WordcountMapper(HadoopPipes::TaskContext& context)
{
}

void WordcountMapper::map(HadoopPipes::MapContext& context)
{
    // Emit <word, "1"> for every word on the input line.
    string line = context.getInputValue();
    vector<string> words = split(line, " ");
    for (unsigned i = 0; i < words.size(); i++)
    {
        context.emit(words[i], HadoopUtils::toString(1));
    }
}

// Split src on any character in separator. Skipping ahead with
// find_first_not_of means runs of separators (and a leading separator)
// never produce empty tokens.
vector<string> WordcountMapper::split(const string& src, const string& separator)
{
    vector<string> dest;
    string::size_type start = src.find_first_not_of(separator);
    while (start != string::npos)
    {
        string::size_type index = src.find_first_of(separator, start);
        if (index == string::npos)
        {
            dest.push_back(src.substr(start));
            break;
        }
        dest.push_back(src.substr(start, index - start));
        start = src.find_first_not_of(separator, index);
    }
    return dest;
}

WordcountReducer::WordcountReducer(HadoopPipes::TaskContext& context)
{
}

void WordcountReducer::reduce(HadoopPipes::ReduceContext& context)
{
    // Sum the counts emitted for this key.
    int wSum = 0;
    while (context.nextValue())
    {
        wSum += HadoopUtils::toInt(context.getInputValue());
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(wSum));
}
// main.cpp
#include "wordcount.h"

int main(int argc, char *argv[])
{
  return HadoopPipes::runTask(HadoopPipes::TemplateFactory<WordcountMapper, WordcountReducer>());
}  

The makefile:

.SUFFIXES:.h .c .cpp .o

CC=g++
CPPFLAGS = -m64
RM = rm -f
SRCS = wordcount.cpp main.cpp
PROGRAM = wordcount
OBJS=$(SRCS:.cpp=.o)

INC_PATH = -I$(HADOOP_DEV_HOME)/include
LIB_PATH = -L$(HADOOP_DEV_HOME)/lib/native
LIBS = -lhadooppipes -lcrypto -lhadooputils -lpthread

# $? expands to the prerequisites, $@ to the target
$(PROGRAM): $(OBJS)
	$(CC) $? -Wall $(LIB_PATH) $(LIBS) -g -O2 -o $@

.cpp.o:
	$(CC) $(CPPFLAGS) $(INC_PATH) -c $<

.PHONY:clean
clean:
	$(RM) $(PROGRAM) $(OBJS)

Sample input:

Happiness is not about being immortal nor having food or rights in one's hand. It's about having each tiny wish come true, or having something to eat when you are hungry or having someone's love when you need love

Happiness is not about being immortal nor having food or rights in one's hand. It's about having each tiny wish come true, or having something to eat when you are hungry or having someone's love when you need love

Happiness is not about being immortal nor having food or rights in one's hand. It's about having each tiny wish come true, or having something to eat when you are hungry or having someone's love when you need love

Output:

Happiness 3

It's 3

about 6

are 3

being 3

come 3

each 3

eat 3

food 3

hand. 3

having 12

hungry 3

immortal 3

in 3

is 3

love 6

need 3

nor 3

not 3

one's 3

or 9

rights 3

someone's 3

something 3

tiny 3

to 3

true, 3

when 6

wish 3

you 6

Date: 2024-08-13 09:10:40
