Setting Up Hadoop 2.6.4 in Pseudo-Distributed Mode

Preparation

Operating System

CentOS 7

Software Requirements

  1. JDK 1.7.0_79 (download link)
  2. SSH, which normally ships with the system; if it is missing, install it with your distribution's package manager

Disable the Firewall

systemctl stop firewalld.service #stop firewalld
systemctl disable firewalld.service #prevent firewalld from starting on boot
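To double-check that the firewall is actually down, systemctl can report its status (the output should show something like "Active: inactive (dead)"):

systemctl status firewalld.service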

Set the Hostname

[root@localhost ~]# hostname localhost
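Note that the hostname command only changes the hostname for the running session; it is lost on reboot. On CentOS 7 a change that survives reboots can be made with hostnamectl:

[root@localhost ~]# hostnamectl set-hostname localhost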

Install the Environment

Install the JDK

[root@localhost ~]# tar -xzvf jdk-7u79-linux-x64.tar.gz

Configure the Java Environment Variables

[root@localhost ~]# vi /etc/profile
#add the following
JAVA_HOME=/root/jdk1.7.0_79
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export JAVA_HOME
export PATH
export CLASSPATH
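Changes to /etc/profile do not apply to the current shell until the file is re-read, so reload it before verifying:

[root@localhost ~]# source /etc/profile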

Verify Java

[root@localhost ~]# java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

If you see output like the above, Java has been installed and configured successfully.

Install Hadoop

Download Hadoop 2.6.4

Install Hadoop 2.6.4

[root@localhost ~]# tar -xzvf hadoop-2.6.4.tar.gz

Configure the Hadoop Environment Variables

[root@localhost ~]# vim /etc/profile
#add the following
export HADOOP_HOME=/root/hadoop-2.6.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
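As before, reload the profile so the new Hadoop variables take effect in the current shell:

[root@localhost ~]# source /etc/profile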

[root@localhost ~]# vim /root/hadoop-2.6.4/etc/hadoop/hadoop-env.sh
#change the following
# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/root/jdk1.7.0_79

Verify Hadoop

[root@localhost ~]# hadoop version
Hadoop 2.6.4
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 5082c73637530b0b7e115f9625ed7fac69f937e6
Compiled by jenkins on 2016-02-12T09:45Z
Compiled with protoc 2.5.0
From source with checksum 8dee2286ecdbbbc930a6c87b65cbc010
This command was run using /root/hadoop-2.6.4/share/hadoop/common/hadoop-common-2.6.4.jar

Edit the Hadoop Configuration Files

All of the configuration files live in /root/hadoop-2.6.4/etc/hadoop.

<!-- core-site.xml-->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
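A fresh Hadoop 2.6.4 distribution ships only mapred-site.xml.template, not mapred-site.xml itself; if that is the case on your machine, copy the template first and then add the property above:

[root@localhost ~]# cp /root/hadoop-2.6.4/etc/hadoop/mapred-site.xml.template /root/hadoop-2.6.4/etc/hadoop/mapred-site.xml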

<!-- yarn-site.xml -->
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Passwordless SSH Login

[root@localhost ~]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
[root@localhost ~]# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
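If ssh still asks for a password after this, the usual culprit is file permissions: sshd ignores keys in a directory or file that is writable by others. Tightening them is harmless and often fixes it:

[root@localhost ~]# chmod 700 ~/.ssh
[root@localhost ~]# chmod 600 ~/.ssh/authorized_keys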

Run the following command; if it does not prompt for a password, the configuration succeeded:

[root@localhost ~]# ssh localhost
Last login: Fri May  6 05:17:32 2016 from 192.168.154.1

Run Hadoop

Format HDFS

[root@localhost ~]# hdfs namenode -format
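Formatting creates a fresh, empty HDFS. With the minimal core-site.xml above, the NameNode and DataNode keep their data under hadoop.tmp.dir, which defaults to /tmp/hadoop-${user.name} and may be wiped on reboot. For anything longer-lived than a quick test, consider pointing it at a persistent directory in core-site.xml (the path below is just an illustration):

<property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop-2.6.4/tmp</value>
</property>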

Start the NameNode, DataNode, and YARN

[root@localhost ~]# start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /root/hadoop-2.6.4/logs/hadoop-root-namenode-localhost.out
localhost: starting datanode, logging to /root/hadoop-2.6.4/logs/hadoop-root-datanode-localhost.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/hadoop-2.6.4/logs/hadoop-root-secondarynamenode-localhost.out

[root@localhost ~]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /root/hadoop-2.6.4/logs/yarn-root-resourcemanager-localhost.out
localhost: starting nodemanager, logging to /root/hadoop-2.6.4/logs/yarn-root-nodemanager-localhost.out
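To confirm all five daemons are up, run jps (it ships with the JDK); the output should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager. The process IDs below are only illustrative and will differ on your machine:

[root@localhost ~]# jps
2896 NameNode
3012 DataNode
3190 SecondaryNameNode
3341 ResourceManager
3439 NodeManager
3742 Jps

The web UIs are also available at the Hadoop 2.x defaults: http://localhost:50070 for HDFS and http://localhost:8088 for YARN.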

Upload Test Files to HDFS

First, create test1.txt and test2.txt in /root/test, containing "hello world" and "hello hadoop" respectively, and save them; one way to do this is shown below.
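For example (the directory and file contents here match the file sizes shown in the listing further down):

[root@localhost ~]# mkdir -p /root/test
[root@localhost ~]# echo "hello world" > /root/test/test1.txt
[root@localhost ~]# echo "hello hadoop" > /root/test/test2.txt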

Use the following commands to upload the files to the input directory in HDFS (a relative path like input resolves to /user/root/input when running as root):

[root@localhost ~]# hadoop fs -put /root/test/ input
[root@localhost ~]# hadoop fs -ls input
Found 2 items
-rw-r--r--   1 root supergroup         12 2016-05-06 06:35 input/test1.txt
-rw-r--r--   1 root supergroup         13 2016-05-06 06:35 input/test2.txt

Run the wordcount Demo

Run the following command and wait for it to complete:

[root@localhost ~]# hadoop jar /root/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount input output
16/05/06 06:44:15 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/05/06 06:44:16 INFO input.FileInputFormat: Total input paths to process : 2
16/05/06 06:44:17 INFO mapreduce.JobSubmitter: number of splits:2
16/05/06 06:44:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1462530786445_0001
16/05/06 06:44:18 INFO impl.YarnClientImpl: Submitted application application_1462530786445_0001
16/05/06 06:44:18 INFO mapreduce.Job: The url to track the job: http://server1:8088/proxy/application_1462530786445_0001/
16/05/06 06:44:18 INFO mapreduce.Job: Running job: job_1462530786445_0001
16/05/06 06:44:33 INFO mapreduce.Job: Job job_1462530786445_0001 running in uber mode : false
16/05/06 06:44:33 INFO mapreduce.Job:  map 0% reduce 0%
16/05/06 06:44:52 INFO mapreduce.Job:  map 50% reduce 0%
16/05/06 06:44:53 INFO mapreduce.Job:  map 100% reduce 0%
16/05/06 06:45:03 INFO mapreduce.Job:  map 100% reduce 100%
16/05/06 06:45:03 INFO mapreduce.Job: Job job_1462530786445_0001 completed successfully
16/05/06 06:45:04 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=55
                FILE: Number of bytes written=320242
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=249
                HDFS: Number of bytes written=25
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=34487
                Total time spent by all reduces in occupied slots (ms)=7744
                Total time spent by all map tasks (ms)=34487
                Total time spent by all reduce tasks (ms)=7744
                Total vcore-milliseconds taken by all map tasks=34487
                Total vcore-milliseconds taken by all reduce tasks=7744
                Total megabyte-milliseconds taken by all map tasks=35314688
                Total megabyte-milliseconds taken by all reduce tasks=7929856
        Map-Reduce Framework
                Map input records=2
                Map output records=4
                Map output bytes=41
                Map output materialized bytes=61
                Input split bytes=224
                Combine input records=4
                Combine output records=4
                Reduce input groups=3
                Reduce shuffle bytes=61
                Reduce input records=4
                Reduce output records=3
                Spilled Records=8
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=364
                CPU time spent (ms)=3990
                Physical memory (bytes) snapshot=515538944
                Virtual memory (bytes) snapshot=2588155904
                Total committed heap usage (bytes)=296755200
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=25
        File Output Format Counters
                Bytes Written=25

View the Results

[root@localhost ~]# hadoop fs -ls output
Found 2 items
-rw-r--r--   1 root supergroup          0 2016-05-06 06:45 output/_SUCCESS
-rw-r--r--   1 root supergroup         25 2016-05-06 06:45 output/part-r-00000
[root@localhost ~]# hadoop fs -cat output/part-r-00000
hadoop  1
hello   2
world   1
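One caveat when re-running the job: MapReduce refuses to write into an existing output directory, so delete output first or the job will fail immediately:

[root@localhost ~]# hadoop fs -rm -r output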

At this point, the pseudo-distributed setup is complete.

For a fully distributed setup, see here.
