Hadoop Installation Notes (2)

I. Distributed Model

1. Environment Preparation

Prepare four nodes: master1 is the control node (NameNode, SecondaryNameNode, ResourceManager), and master2-4 serve as data nodes (DataNode, NodeManager). Make sure clocks are synchronized with NTP on every node (a sketch of one approach follows).
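A minimal sketch of the time sync, assuming a reachable NTP server (the address 10.201.106.1 below is hypothetical; substitute your own time source):

[root@master1 ~]# yum install -y ntpdate
[root@master1 ~]# crontab -e
*/10 * * * * /usr/sbin/ntpdate 10.201.106.1 &> /dev/null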

1.1 Configure the Java environment on every node

[root@master1 ~]# vim /etc/profile.d/java.sh

export JAVA_HOME=/usr

[root@master1 ~]# scp /etc/profile.d/java.sh root@master2:/etc/profile.d/
[root@master1 ~]# scp /etc/profile.d/java.sh root@master3:/etc/profile.d/
[root@master1 ~]# scp /etc/profile.d/java.sh root@master4:/etc/profile.d/

Install the java-devel package on every node:
[root@master1 ~]# yum install -y java-1.7.0-openjdk-devel
[root@master2 ~]# yum install -y java-1.7.0-openjdk-devel
[root@master3 ~]# yum install -y java-1.7.0-openjdk-devel
[root@master4 ~]# yum install -y java-1.7.0-openjdk-devel

Configure the Hadoop environment variables:
[root@master1 ~]# vim /etc/profile.d/hadoop.sh

export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}

[root@master1 ~]# source /etc/profile.d/hadoop.sh

scp /etc/profile.d/hadoop.sh master2:/etc/profile.d/hadoop.sh
scp /etc/profile.d/hadoop.sh master3:/etc/profile.d/hadoop.sh
scp /etc/profile.d/hadoop.sh master4:/etc/profile.d/hadoop.sh

1.2 Prepare the hosts file on every node; this lab addresses the machines by these aliases

[root@master1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.201.106.131 master1 master1.com master
10.201.106.132 master2 master2.com
10.201.106.133 master3 master3.com
10.201.106.134 master4 master4.com

Do the same on master2, master3, and master4.

1.3 Create the user group and user

[root@master1 ~]# groupadd hadoop
[root@master1 ~]# useradd -g hadoop hadoop

Set the user's password:
echo 'hadoop' | passwd --stdin hadoop

Do the same on master2, master3, and master4:
for i in `seq 2 4`;do ssh root@10.201.106.13${i} "echo 'hadoop' | passwd --stdin hadoop";done

1.4 Allow the hadoop user on master1 (the control node) to log in to master1-4 with SSH keys

[root@master1 ~]# su - hadoop

Generate the key pair (an empty passphrase, so logins are non-interactive):
[hadoop@master1 ~]$ ssh-keygen -t rsa -P ''

Copy master1's public key to nodes master1-4:
[hadoop@master1 ~]$ for i in `seq 1 4`;do ssh-copy-id -i .ssh/id_rsa.pub hadoop@10.201.106.13${i};done
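To confirm key-based login works, a quick check: each iteration should print the remote hostname without prompting for a password:

[hadoop@master1 ~]$ for i in `seq 1 4`;do ssh master${i} 'hostname';done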

2. Install Hadoop

2.1 Create directories and set permissions

[root@master1 ~]# mkdir -pv /bdapps /data/hadoop/hdfs/{nn,snn,dn}

[root@master1 ~]# chown -R hadoop:hadoop /data/hadoop/hdfs/

Extract the Hadoop archive:
[root@master1 ~]# tar xf hadoop-2.6.2.tar.gz -C /bdapps/

Create a symlink:
[root@master1 ~]# cd /bdapps/
[root@master1 bdapps]# ln -sv hadoop-2.6.2 hadoop

Create the logs directory and make it group-writable:
[root@master1 ~]# cd /bdapps/hadoop
[root@master1 hadoop]# mkdir logs
[root@master1 hadoop]# chmod g+w logs

Change ownership of the Hadoop installation directory:
[root@master1 hadoop]# chown -R hadoop:hadoop ./*

2.2 Configure the master node (master1)

[root@master1 ~]# cd /bdapps/hadoop/etc/hadoop/

[root@master1 hadoop]# vim core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
        <final>true</final>
    </property>
</configuration>

[root@master1 hadoop]# vim yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>10.201.106.131:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
</configuration>

[root@master1 hadoop]# vim hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/hdfs/nn</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/hdfs/dn</value>
    </property>
    <property>
        <name>fs.checkpoint.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
    <property>
        <name>fs.checkpoint.edits.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
</configuration>

mapred-site.xml does not exist by default; create it from the template shipped with Hadoop:
[root@master1 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@master1 hadoop]# vim mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

[root@master1 hadoop]# vim slaves

master2
master3
master4

2.3 Configure the three slave nodes (the steps are shown on master2; repeat them on master3 and master4)

[root@master2 ~]# mkdir -pv /bdapps /data/hadoop/hdfs/dn
[root@master2 ~]# chown -R hadoop:hadoop /data/hadoop/hdfs/

[root@master2 ~]# tar xf hadoop-2.6.2.tar.gz -C /bdapps/

[root@master2 ~]# cd /bdapps
[root@master2 bdapps]# ln -sv hadoop-2.6.2 hadoop

[root@master2 bdapps]# cd hadoop
[root@master2 hadoop]# mkdir logs
[root@master2 hadoop]# chmod g+w logs
[root@master2 hadoop]# chown -R hadoop:hadoop ./*

Copy the configuration files from master1 (do the same for master3 and master4):
[root@master1 hadoop]# su - hadoop

[hadoop@master1 ~]$ scp /bdapps/hadoop/etc/hadoop/* master2:/bdapps/hadoop/etc/hadoop/

2.4 Format the HDFS filesystem

[hadoop@master1 ~]$ hdfs namenode -format
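If the format succeeds, the NameNode metadata directory is populated; a quick sanity check is to list it (it should contain fsimage and VERSION files):

[hadoop@master1 ~]$ ls /data/hadoop/hdfs/nn/current/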

2.5 Start the HDFS cluster

Start the NameNode, SecondaryNameNode, and DataNode daemons:
[hadoop@master1 ~]$ start-dfs.sh
Starting namenodes on [master]
The authenticity of host 'master (10.201.106.131)' can't be established.
ECDSA key fingerprint is 5e:5d:4d:d2:3f:73:fb:5c:c4:26:c7:c4:85:10:c9:75.
Are you sure you want to continue connecting (yes/no)? yes
master: Warning: Permanently added 'master' (ECDSA) to the list of known hosts.
master: starting namenode, logging to /bdapps/hadoop/logs/hadoop-hadoop-namenode-master1.com.out
master2: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-master2.com.out
master4: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-master4.com.out
master3: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-master3.com.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 5e:5d:4d:d2:3f:73:fb:5c:c4:26:c7:c4:85:10:c9:75.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /bdapps/hadoop/logs/hadoop-hadoop-secondarynamenode-master1.com.out

Check which processes are running on each node:
[hadoop@master1 ~]$ jps
4977 NameNode
5324 Jps
5155 SecondaryNameNode

[root@master2 hadoop]# su - hadoop
Last login: Sun Apr 22 11:52:57 CST 2018 from master1 on pts/1
[hadoop@master2 ~]$ jps
9972 DataNode
10131 Jps

Confirm that the three slave nodes are connected to the master:
[root@master1 ~]# netstat -tanp | grep 8020
tcp        0      0 10.201.106.131:8020     0.0.0.0:*               LISTEN      4977/java
tcp        0      0 10.201.106.131:8020     10.201.106.134:51956    ESTABLISHED 4977/java
tcp        0      0 10.201.106.131:8020     10.201.106.133:36426    ESTABLISHED 4977/java
tcp        0      0 10.201.106.131:8020     10.201.106.132:37988    ESTABLISHED 4977/java 

Test by uploading a file:
[hadoop@master1 ~]$ hdfs dfs -mkdir /test
[hadoop@master1 ~]$ hdfs dfs -put /etc/fstab /test/fstab
[hadoop@master1 ~]$ hdfs dfs -ls /test/fstab
-rw-r--r--   2 hadoop supergroup       1065 2018-04-23 03:00 /test/fstab
The actual block file on disk (with dfs.replication set to 2, the block is stored on two of the three DataNodes; the hostnames below are illustrative):
[hadoop@master2 logs]$ cat /data/hadoop/hdfs/dn/current/BP-1262978243-10.201.106.131-1524421803827/current/finalized/subdir0/subdir0/blk_1073741827
[hadoop@master3 ~]$ cat /data/hadoop/hdfs/dn/current/BP-1262978243-10.201.106.131-1524421803827/current/finalized/subdir0/subdir0/blk_1073741827

2.6 Start the YARN cluster

[hadoop@master1 ~]$ start-yarn.sh
# the master node starts the ResourceManager
starting yarn daemons
starting resourcemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-resourcemanager-master1.com.out
master3: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-master3.com.out
master4: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-master4.com.out
master2: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-master2.com.out
[hadoop@master1 ~]$ jps
5919 ResourceManager
4977 NameNode
5155 SecondaryNameNode
6190 Jps

The slave nodes now run a NodeManager:
[hadoop@master2 logs]$ jps
10243 DataNode
10508 Jps
10405 NodeManager
[hadoop@master3 ~]$ jps
9380 DataNode
9696 NodeManager
9796 Jps

2.7 Check status in the web UI

Open http://10.201.106.131:8088 in a browser (YARN ResourceManager UI).

Open http://10.201.106.131:50070 in a browser (HDFS NameNode UI).

3. Other Operations

3.1 Upload a large file and observe block splitting

Generate a 200 MB file and upload it:
[hadoop@master1 ~]$ dd if=/dev/zero of=test bs=1M count=200
[hadoop@master1 ~]$ hdfs dfs -put test /test/

(Screenshots: the HDFS web UI file view before and after the upload. The default block size in Hadoop 2.x is 128 MB, so the 200 MB file is split into two blocks.)
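The split can also be inspected from the command line with hdfs fsck, which lists each block and the DataNodes holding it:

[hadoop@master1 ~]$ hdfs fsck /test/test -files -blocks -locations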

3.2 View logs in the browser

Visit: http://10.201.106.131:50070/logs/

3.3 Run a test job

List the example programs available in the test jar:
[hadoop@master1 ~]$ yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar

Count the words in the uploaded files:
[hadoop@master1 ~]$ yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /test/fstab /test/functions /test/wordout

(Screenshots: the job submission output and the task progress in the YARN web UI.)

Check the results:

[hadoop@master1 ~]$ hdfs dfs -ls /test/wordout
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2018-04-23 06:56 /test/wordout/_SUCCESS
-rw-r--r--   2 hadoop supergroup       7855 2018-04-23 06:56 /test/wordout/part-r-00000
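The word counts themselves can be printed directly from HDFS:

[hadoop@master1 ~]$ hdfs dfs -cat /test/wordout/part-r-00000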

4. YARN Cluster Management Commands

4.1 List all yarn subcommands

[hadoop@master1 ~]$ yarn

4.2 application

4.2.1 View jobs

List active jobs:
[hadoop@master1 ~]$ yarn application -list
18/04/23 07:23:47 INFO client.RMProxy: Connecting to ResourceManager at master/10.201.106.131:8032
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):0
                Application-Id      Application-Name        Application-Type      User       Queue               State         Final-State         Progress                        Tracking-URL

List all jobs, including completed ones:
[hadoop@master1 ~]$ yarn application -list -appStates=all
18/04/23 07:24:48 INFO client.RMProxy: Connecting to ResourceManager at master/10.201.106.131:8032
Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED]):1
                Application-Id      Application-Name        Application-Type      User       Queue               State         Final-State         Progress                        Tracking-URL
application_1524424153008_0001            word count               MAPREDUCE    hadoop     default            FINISHED           SUCCEEDED             100% http://master2:19888/jobhistory/job/job_1524424153008_0001

Check the status of a specific job:
[hadoop@master1 ~]$ yarn application -status application_1524424153008_0001
18/04/23 07:28:32 INFO client.RMProxy: Connecting to ResourceManager at master/10.201.106.131:8032
Application Report :
    Application-Id : application_1524424153008_0001
    Application-Name : word count
    Application-Type : MAPREDUCE
    User : hadoop
    Queue : default
    Start-Time : 1524437422005
    Finish-Time : 1524437801216
    Progress : 100%
    State : FINISHED
    Final-State : SUCCEEDED
    Tracking-URL : http://master2:19888/jobhistory/job/job_1524424153008_0001
    RPC Port : 40927
    AM Host : master2
    Aggregate Resource Allocation : 1326835 MB-seconds, 909 vcore-seconds
    Diagnostics :

4.3 node

List the nodes:
[hadoop@master1 ~]$ yarn node -list
18/04/23 07:33:37 INFO client.RMProxy: Connecting to ResourceManager at master/10.201.106.131:8032
Total Nodes:3
         Node-Id         Node-State Node-Http-Address   Number-of-Running-Containers
   master4:47410            RUNNING      master4:8042                              0
   master3:55126            RUNNING      master3:8042                              0
   master2:54307            RUNNING      master2:8042                              0
List all nodes, including ones that are down or decommissioned:
[hadoop@master1 ~]$ yarn node -list -all

Show the status of a specific node:
[hadoop@master1 ~]$ yarn node -status master2:54307
18/04/23 07:41:01 INFO client.RMProxy: Connecting to ResourceManager at master/10.201.106.131:8032
Node Report :
    Node-Id : master2:54307
    Rack : /default-rack
    Node-State : RUNNING
    Node-Http-Address : master2:8042
    Last-Health-Update : Sun 22/Apr/18 10:06:49:900CST
    Health-Report :
    Containers : 0
    Memory-Used : 0MB
    Memory-Capacity : 8192MB
    CPU-Used : 0 vcores
    CPU-Capacity : 8 vcores
    Node-Labels : 

4.4 logs

To enable log aggregation, set the yarn.log-aggregation-enable property to true in yarn-site.xml (see the snippet below).

A cluster restart is required for the change to take effect.
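A minimal sketch of the property as it would be added to yarn-site.xml:

<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>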

View a job's aggregated logs:
[hadoop@master1 ~]$ yarn logs -applicationId application_1524424153008_0001

4.5 classpath

Print the classpath used by Hadoop and YARN:
[hadoop@master1 ~]$ yarn classpath
/bdapps/hadoop/etc/hadoop:/bdapps/hadoop/etc/hadoop:/bdapps/hadoop/etc/hadoop:/bdapps/hadoop/share/hadoop/common/lib/*:/bdapps/hadoop/share/hadoop/common/*:/bdapps/hadoop/share/hadoop/hdfs:/bdapps/hadoop/share/hadoop/hdfs/lib/*:/bdapps/hadoop/share/hadoop/hdfs/*:/bdapps/hadoop/share/hadoop/yarn/lib/*:/bdapps/hadoop/share/hadoop/yarn/*:/bdapps/hadoop/share/hadoop/mapreduce/lib/*:/bdapps/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/bdapps/hadoop/share/hadoop/yarn/*:/bdapps/hadoop/share/hadoop/yarn/lib/*

The commands that follow are YARN administration commands.

4.6 rmadmin

4.6.1 Command help

Get help for the command:
[hadoop@master1 ~]$ yarn rmadmin -help

4.6.2 Refresh node status

[hadoop@master1 ~]$ yarn rmadmin -refreshNodes
18/04/23 07:54:48 INFO client.RMProxy: Connecting to ResourceManager at master/10.201.106.131:8033
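-refreshNodes is typically paired with a host exclude list when decommissioning a node. A sketch, assuming you configure the exclude file path yourself (the file name yarn.exclude below is hypothetical):

Add to yarn-site.xml:
<property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/bdapps/hadoop/etc/hadoop/yarn.exclude</value>
</property>

Then list the node to remove and refresh:
[hadoop@master1 ~]$ echo master4 >> /bdapps/hadoop/etc/hadoop/yarn.exclude
[hadoop@master1 ~]$ yarn rmadmin -refreshNodes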

4.7 How a YARN Application Runs

1. Application initialization and submission;
2. Memory is allocated and the ApplicationMaster is started;
3. The ApplicationMaster registers and resources are allocated;
4. Containers are launched and monitored;
5. Application progress is reported;
6. The Application completes.

5. Miscellaneous

5.1 Ambari: the official tool for automated Hadoop installation and deployment

Original article: http://blog.51cto.com/zhongle21/2106614
