[hadoop] Running the WordCount example on Hadoop

Put the prepared text files into HDFS.
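For example, assuming two prepared local files file1.txt and file2.txt (the job log below reports two input files; the names here are illustrative), a minimal sketch looks like:

[root@hadoop01 ~]# hadoop fs -mkdir -p /input
[root@hadoop01 ~]# hadoop fs -put file1.txt file2.txt /input/
[root@hadoop01 ~]# hadoop fs -ls /input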

Run the WordCount example that ships with the Hadoop installation.
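In Hadoop 2.8.0 the examples jar sits under share/hadoop/mapreduce inside the installation directory (assuming HADOOP_HOME points at your install), which matches the working directory shown in the prompt below:

[root@hadoop01 ~]# cd $HADOOP_HOME/share/hadoop/mapreduce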

[root@hadoop01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.8.0.jar wordcount /input/ /output/wordcount
17/05/14 02:01:17 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/172.16.253.128:8032
17/05/14 02:01:19 INFO input.FileInputFormat: Total input files to process : 2
17/05/14 02:01:19 INFO mapreduce.JobSubmitter: number of splits:2
17/05/14 02:01:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494742494825_0002
17/05/14 02:01:20 INFO impl.YarnClientImpl: Submitted application application_1494742494825_0002
17/05/14 02:01:20 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1494742494825_0002/
17/05/14 02:01:20 INFO mapreduce.Job: Running job: job_1494742494825_0002
17/05/14 02:01:35 INFO mapreduce.Job: Job job_1494742494825_0002 running in uber mode : false
17/05/14 02:01:35 INFO mapreduce.Job:  map 0% reduce 0%
17/05/14 02:02:48 INFO mapreduce.Job:  map 100% reduce 0%
17/05/14 02:03:22 INFO mapreduce.Job:  map 100% reduce 100%
17/05/14 02:03:25 INFO mapreduce.Job: Job job_1494742494825_0002 completed successfully
17/05/14 02:03:28 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=350
                FILE: Number of bytes written=408885
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=584
                HDFS: Number of bytes written=145
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=145615
                Total time spent by all reduces in occupied slots (ms)=17369
                Total time spent by all map tasks (ms)=145615
                Total time spent by all reduce tasks (ms)=17369
                Total vcore-milliseconds taken by all map tasks=145615
                Total vcore-milliseconds taken by all reduce tasks=17369
                Total megabyte-milliseconds taken by all map tasks=149109760
                Total megabyte-milliseconds taken by all reduce tasks=17785856
        Map-Reduce Framework
                Map input records=14
                Map output records=70
                Map output bytes=666
                Map output materialized bytes=356
                Input split bytes=196
                Combine input records=70
                Combine output records=30
                Reduce input groups=19
                Reduce shuffle bytes=356
                Reduce input records=30
                Reduce output records=19
                Spilled Records=60
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=9667
                CPU time spent (ms)=3210
                Physical memory (bytes) snapshot=330969088
                Virtual memory (bytes) snapshot=6192197632
                Total committed heap usage (bytes)=259284992
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=388
        File Output Format Counters
                Bytes Written=145


View the job output:
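The reducer writes its results to part files under the output directory; with a single reducer the file is named part-r-00000 by convention, and each line holds a word and its count separated by a tab. A minimal sketch for inspecting it:

[root@hadoop01 mapreduce]# hadoop fs -ls /output/wordcount
[root@hadoop01 mapreduce]# hadoop fs -cat /output/wordcount/part-r-00000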
