Big Data Series: YARN Pseudo-Distributed Deployment and a MapReduce Example

1. Software Environment

Software: RHEL6, jdk-8u45, hadoop-2.8.1.tar.gz, ssh

Host       IP address     Role
hadoop01   xx.xx.xx.xx    NN
hadoop02   xx.xx.xx.xx    DN
hadoop03   xx.xx.xx.xx    DN
hadoop04   xx.xx.xx.xx    DN
hadoop05   xx.xx.xx.xx    DN

This pseudo-distributed deployment only uses host hadoop01; for the base software installation, refer to the pseudo-distributed HDFS deployment guide.

2. Configure YARN and MapReduce

Create mapred-site.xml from the template shipped with Hadoop:

[[email protected] hadoop]$ cp mapred-site.xml.template mapred-site.xml

Tell MapReduce to run on YARN (mapred-site.xml):
[[email protected] hadoop]$ vi mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Enable YARN's MapReduce shuffle auxiliary service (yarn-site.xml):
[[email protected] hadoop]$ vi yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
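Before starting the daemons it is worth sanity-checking that the properties above actually landed in the files. The sketch below (not part of the original deployment) recreates mapred-site.xml in a temp directory so it is self-contained; on a real node you would point it at $HADOOP_HOME/etc/hadoop/mapred-site.xml instead.

```shell
# Recreate the config in a temp dir so this sketch runs anywhere;
# swap $tmp for your real $HADOOP_HOME/etc/hadoop to check a live node.
tmp=$(mktemp -d)
cat > "$tmp/mapred-site.xml" <<'EOF'
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
EOF

# Pull out the <value> that follows the property name and strip the tags.
value=$(grep -A1 '<name>mapreduce.framework.name</name>' "$tmp/mapred-site.xml" \
        | grep -o '<value>[^<]*</value>' | sed 's/<[^>]*>//g')

if [ "$value" = "yarn" ]; then
    echo "mapred-site.xml OK: framework=$value"
else
    echo "WARNING: mapreduce.framework.name is '$value', expected 'yarn'" >&2
fi
rm -rf "$tmp"
```

A grep-based check like this is crude but dependency-free; `hdfs getconf` or `xmllint` would be more robust if available.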


3. Submit the Example JAR to Estimate Pi

Job IDs follow the format job_<cluster start timestamp>_<sequence number>; the run below produced job_1524804813835_0001, where 1524804813835 is a Unix timestamp in milliseconds.
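The embedded timestamp can be decoded to check when the ResourceManager was started. A quick sketch, assuming GNU date (`date -d`) and using the job ID from this article:

```shell
# Decode the epoch-millisecond timestamp embedded in a YARN job ID.
job_id="job_1524804813835_0001"

# Field 2 (underscore-separated) is the timestamp in milliseconds.
ts_ms=$(echo "$job_id" | cut -d_ -f2)
ts_s=$((ts_ms / 1000))

date -u -d "@$ts_s" '+%Y-%m-%d %H:%M:%S UTC'
```

For this job the decoded date is 2018-04-27, which matches the log timestamps in the transcript below.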


Start YARN (this launches the ResourceManager and the NodeManager):
[[email protected] sbin]$ ./start-yarn.sh

[[email protected] hadoop]$ find . -name "*examples*"
./lib/native/examples
./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.8.1-sources.jar
./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.8.1-test-sources.jar
./share/hadoop/mapreduce/lib-examples
./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar
./share/doc/hadoop/hadoop-auth-examples
./share/doc/hadoop/hadoop-mapreduce-examples
./share/doc/hadoop/api/org/apache/hadoop/examples
./share/doc/hadoop/api/org/apache/hadoop/security/authentication/examples
[[email protected] hadoop]$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar pi 5 10
Number of Maps  = 5
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
18/04/27 12:58:49 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/04/27 12:58:50 INFO input.FileInputFormat: Total input files to process : 5
18/04/27 12:58:50 INFO mapreduce.JobSubmitter: number of splits:5
18/04/27 12:58:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1524804813835_0001
18/04/27 12:58:51 INFO impl.YarnClientImpl: Submitted application application_1524804813835_0001
18/04/27 12:58:51 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1524804813835_0001/
18/04/27 12:58:51 INFO mapreduce.Job: Running job: job_1524804813835_0001
18/04/27 12:59:03 INFO mapreduce.Job: Job job_1524804813835_0001 running in uber mode : false
18/04/27 12:59:03 INFO mapreduce.Job:  map 0% reduce 0%
18/04/27 12:59:18 INFO mapreduce.Job:  map 100% reduce 0%
18/04/27 12:59:25 INFO mapreduce.Job:  map 100% reduce 100%
18/04/27 12:59:26 INFO mapreduce.Job: Job job_1524804813835_0001 completed successfully
18/04/27 12:59:27 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=116
        FILE: Number of bytes written=819783
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1350
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=23
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=5
        Launched reduce tasks=1
        Data-local map tasks=5
        Total time spent by all maps in occupied slots (ms)=64938
        Total time spent by all reduces in occupied slots (ms)=4704
        Total time spent by all map tasks (ms)=64938
        Total time spent by all reduce tasks (ms)=4704
        Total vcore-milliseconds taken by all map tasks=64938
        Total vcore-milliseconds taken by all reduce tasks=4704
        Total megabyte-milliseconds taken by all map tasks=66496512
        Total megabyte-milliseconds taken by all reduce tasks=4816896
    Map-Reduce Framework
        Map input records=5
        Map output records=10
        Map output bytes=90
        Map output materialized bytes=140
        Input split bytes=760
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=140
        Reduce input records=10
        Reduce output records=0
        Spilled Records=20
        Shuffled Maps =5
        Failed Shuffles=0
        Merged Map outputs=5
        GC time elapsed (ms)=1428
        CPU time spent (ms)=5740
        Physical memory (bytes) snapshot=1536856064
        Virtual memory (bytes) snapshot=12578734080
        Total committed heap usage (bytes)=1152385024
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=590
    File Output Format Counters
        Bytes Written=97
Job Finished in 37.717 seconds
Estimated value of Pi is 3.28000000000000000000
[[email protected] hadoop]$
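The estimate above (3.28) is rough because only 5 maps x 10 samples = 50 points were drawn: the example estimates pi by sampling points in the unit square and counting how many fall inside the quarter circle, so pi is approximately 4 * inside / total. A simplified, self-contained sketch of that computation in awk (plain pseudo-random sampling; the actual Hadoop example distributes the sampling across map tasks and, as far as I know, uses a quasi-random Halton sequence rather than rand()):

```shell
# Monte Carlo estimate of pi: sample points in the unit square and
# count those inside the quarter circle of radius 1.
estimate=$(awk 'BEGIN {
    srand(1); n = 100000; inside = 0
    for (i = 0; i < n; i++) {
        x = rand(); y = rand()
        if (x*x + y*y <= 1.0) inside++
    }
    printf "%.4f", 4 * inside / n
}')
echo "Estimated value of Pi is $estimate"
```

With 100,000 samples the estimate lands near 3.14; with only the 50 samples of the Hadoop run, an error the size of 3.28 is expected. Increasing the two arguments to the `pi` job (maps and samples per map) tightens the estimate.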

Original article: http://blog.51cto.com/chaorenyong/2117484
