Hadoop 2 NameNode HA + Federation + ResourceManager HA Experiment

The experiment uses Hadoop 2.5.2. The hardware environment is five virtual machines, all running CentOS 6.6. The VMs' IP addresses and hostnames are:

192.168.63.171    node1.zhch

192.168.63.172    node2.zhch

192.168.63.173    node3.zhch

192.168.63.174    node4.zhch

192.168.63.175    node5.zhch

Passwordless SSH, firewall configuration, and JDK installation are not covered again here. The role assignment for the virtual machines is:

node1: active namenode1, active resource manager, zookeeper, journalnode

node2: standby namenode1, zookeeper, journalnode

node3: active namenode2, standby resource manager, zookeeper, journalnode, datanode

node4: standby namenode2, datanode

node5: datanode

The procedure is largely the same as a plain NameNode HA installation: a ZooKeeper cluster must be set up first, and the main differences lie in the core-site.xml, hdfs-site.xml, and yarn-site.xml configuration files; the remaining files are configured just as in a NameNode HA setup.

1. Configuring Hadoop

## Unpack the tarball
[yyl@node1.zhch program]$ tar -zxf hadoop-2.5.2.tar.gz
## Create working directories
[yyl@node1.zhch program]$ mkdir hadoop-2.5.2/name
[yyl@node1.zhch program]$ mkdir hadoop-2.5.2/data
[yyl@node1.zhch program]$ mkdir hadoop-2.5.2/journal
[yyl@node1.zhch program]$ mkdir hadoop-2.5.2/tmp
## Configure hadoop-env.sh
[yyl@node1.zhch program]$ cd hadoop-2.5.2/etc/hadoop/
[yyl@node1.zhch hadoop]$ vim hadoop-env.sh
export JAVA_HOME=/usr/lib/java/jdk1.7.0_80
## Configure yarn-env.sh
[yyl@node1.zhch hadoop]$ vim yarn-env.sh
export JAVA_HOME=/usr/lib/java/jdk1.7.0_80
## Configure slaves
[yyl@node1.zhch hadoop]$ vim slaves
node3.zhch
node4.zhch
node5.zhch
## Configure mapred-site.xml
[yyl@node1.zhch hadoop]$ cp mapred-site.xml.template mapred-site.xml
[yyl@node1.zhch hadoop]$ vim mapred-site.xml
<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>node2.zhch:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>node2.zhch:19888</value>
</property>
</configuration>
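
mapreduce.jobhistory.address and the web UI address both point at node2.zhch, so the JobHistory server has to be started there for finished jobs to stay browsable. This step is not shown in the original log; the command below is an assumed addition, to be run on node2.zhch once the environment variables from the end of this section are in place:

[yyl@node2.zhch ~]$ mr-jobhistory-daemon.sh start historyserver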

## Configure core-site.xml
[yyl@node1.zhch hadoop]$ vim core-site.xml
<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/home/yyl/program/hadoop-2.5.2/tmp</value>
</property>
<property>
  <name>hadoop.proxyuser.hduser.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hduser.groups</name>
  <value>*</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>node1.zhch:2181,node2.zhch:2181,node3.zhch:2181</value>
</property>
<property>
  <name>ha.zookeeper.session-timeout.ms</name>
  <value>1000</value>
</property>
</configuration>
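
With federation, fs.defaultFS makes mycluster the default namespace only; the yourcluster namespace has to be addressed with an explicit URI (or through a viewfs mount table, which is not configured in this experiment). A small usage sketch, assuming the cluster from section 2 is already running:

## Assumed usage example: the default file system resolves to mycluster
[yyl@node1.zhch ~]$ hdfs dfs -ls /
## The second namespace must be named explicitly
[yyl@node1.zhch ~]$ hdfs dfs -ls hdfs://yourcluster/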

## Configure hdfs-site.xml
[yyl@node1.zhch hadoop]$ vim hdfs-site.xml
<configuration>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/yyl/program/hadoop-2.5.2/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/yyl/program/hadoop-2.5.2/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>mycluster,yourcluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>node1.zhch:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>node2.zhch:9000</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
  <value>node1.zhch:53310</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
  <value>node2.zhch:53310</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>node1.zhch:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>node2.zhch:50070</value>
</property>
<property>
  <name>dfs.ha.namenodes.yourcluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.yourcluster.nn1</name>
  <value>node3.zhch:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.yourcluster.nn2</name>
  <value>node4.zhch:9000</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.yourcluster.nn1</name>
  <value>node3.zhch:53310</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.yourcluster.nn2</name>
  <value>node4.zhch:53310</value>
</property>
<property>
  <name>dfs.namenode.http-address.yourcluster.nn1</name>
  <value>node3.zhch:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.yourcluster.nn2</name>
  <value>node4.zhch:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1.zhch:8485;node2.zhch:8485;node3.zhch:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.yourcluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/yyl/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/yyl/program/hadoop-2.5.2/journal</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled.mycluster</name>
  <value>true</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled.yourcluster</name>
  <value>true</value>
</property>
<property>
  <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
  <value>60000</value>
</property>
<property>
  <name>ipc.client.connect.timeout</name>
  <value>60000</value>
</property>
<property>
  <name>dfs.image.transfer.bandwidthPerSec</name>
  <value>4194304</value>
</property>
</configuration>
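
To sanity-check that a node actually picked up this file, hdfs getconf can echo individual keys back from the effective configuration; an assumed quick check (any key above can be substituted):

## Assumed sanity check; should print mycluster,yourcluster and nn1,nn2 respectively
[yyl@node1.zhch ~]$ hdfs getconf -confKey dfs.nameservices
[yyl@node1.zhch ~]$ hdfs getconf -confKey dfs.ha.namenodes.mycluster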

## Configure yarn-site.xml
[yyl@node1.zhch hadoop]$ vim yarn-site.xml
<configuration>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.connect.retry-interval.ms</name>
  <value>2000</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.id</name>
  <value>rm1</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
  <value>5000</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>node1.zhch:2181,node2.zhch:2181,node3.zhch:2181</value>
</property>
<property>
  <name>yarn.resourcemanager.zk.state-store.address</name>
  <value>node1.zhch:2181,node2.zhch:2181,node3.zhch:2181</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>node1.zhch:23140</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>node3.zhch:23140</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm1</name>
  <value>node1.zhch:23130</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm2</name>
  <value>node3.zhch:23130</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address.rm1</name>
  <value>node1.zhch:23141</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address.rm2</name>
  <value>node3.zhch:23141</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
  <value>node1.zhch:23125</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
  <value>node3.zhch:23125</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>node1.zhch:23188</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>node3.zhch:23188</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.https.address.rm1</name>
  <value>node1.zhch:23189</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.https.address.rm2</name>
  <value>node3.zhch:23189</value>
</property>
</configuration>
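
Because yarn.resourcemanager.scheduler.class selects the FairScheduler, an allocation file can optionally be provided (without one the scheduler runs with its defaults). A minimal, hypothetical fair-scheduler.xml placed in the same etc/hadoop directory might look like the sketch below; it is not part of the original experiment:

<?xml version="1.0"?>
<!-- Hypothetical allocation file, assumed for illustration only -->
<allocations>
  <queue name="default">
    <minResources>1024 mb,1 vcores</minResources>
    <weight>1.0</weight>
  </queue>
</allocations>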

## Distribute the installation to the other nodes
[yyl@node1.zhch hadoop]$ cd /home/yyl/program/
[yyl@node1.zhch program]$ scp -rp hadoop-2.5.2 yyl@node2.zhch:/home/yyl/program/
[yyl@node1.zhch program]$ scp -rp hadoop-2.5.2 yyl@node3.zhch:/home/yyl/program/
[yyl@node1.zhch program]$ scp -rp hadoop-2.5.2 yyl@node4.zhch:/home/yyl/program/
[yyl@node1.zhch program]$ scp -rp hadoop-2.5.2 yyl@node5.zhch:/home/yyl/program/
## On the active namenode2 (node3.zhch) and the standby namenode2 (node4.zhch), change the value of dfs.namenode.shared.edits.dir in hdfs-site.xml to qjournal://node1.zhch:8485;node2.zhch:8485;node3.zhch:8485/yourcluster and leave all other properties unchanged.
## On the standby resource manager (node3.zhch), change the value of yarn.resourcemanager.ha.id in yarn-site.xml to rm2 and leave all other properties unchanged. Both edits can also be scripted, as sketched below.
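
The two per-node edits can be made by hand with vim as above; an equivalent hedged sed sketch (paths assume the layout used throughout this post):

## On node3.zhch and node4.zhch: point the shared edits dir at the yourcluster journal
[yyl@node3.zhch ~]$ sed -i 's#8485/mycluster#8485/yourcluster#' /home/yyl/program/hadoop-2.5.2/etc/hadoop/hdfs-site.xml
## On node3.zhch only: mark this resource manager instance as rm2
[yyl@node3.zhch ~]$ sed -i 's#<value>rm1</value>#<value>rm2</value>#' /home/yyl/program/hadoop-2.5.2/etc/hadoop/yarn-site.xml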

## Set the Hadoop environment variables on every node (shown here on node1.zhch; repeat on the others)
[yyl@node1.zhch ~]$ vim .bash_profile
export HADOOP_PREFIX=/home/yyl/program/hadoop-2.5.2
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
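
The new variables only take effect in a fresh login shell; an assumed quick check on each node:

[yyl@node1.zhch ~]$ source ~/.bash_profile
## Should report Hadoop 2.5.2
[yyl@node1.zhch ~]$ hadoop version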

2. Formatting and Startup

## Start the ZooKeeper cluster first
## On the active namenode1 (node1.zhch) and the active namenode2 (node3.zhch), format the HA state in ZooKeeper with: hdfs zkfc -formatZK
[yyl@node1.zhch ~]$ hdfs zkfc -formatZK
[yyl@node3.zhch ~]$ hdfs zkfc -formatZK
[yyl@node1.zhch ~]$ zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /
[hadoop-ha, zookeeper]
[zk: localhost:2181(CONNECTED) 1] ls /hadoop-ha
[mycluster, yourcluster]
## Start the journalnodes on node1.zhch, node2.zhch and node3.zhch:
[yyl@node1.zhch ~]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-journalnode-node1.zhch.out
[yyl@node1.zhch ~]$ jps
1985 QuorumPeerMain
2222 Jps
2176 JournalNode
[yyl@node2.zhch ~]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-journalnode-node2.zhch.out
[yyl@node2.zhch ~]$ jps
1783 Jps
1737 JournalNode
1638 QuorumPeerMain
[yyl@node3.zhch ~]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-journalnode-node3.zhch.out
[yyl@node3.zhch ~]$ jps
1658 JournalNode
1495 QuorumPeerMain
1704 Jps
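
An optional, assumed sanity check (not in the original log): each JournalNode should now be listening on the RPC port 8485 that the qjournal URI in hdfs-site.xml points at.

[yyl@node1.zhch ~]$ netstat -tnlp 2>/dev/null | grep 8485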

## Format the namenode on the active namenode1 (node1.zhch)
[yyl@node1.zhch ~]$ hdfs namenode -format -clusterId c1
## Start the namenode process on the active namenode1 (node1.zhch)
[yyl@node1.zhch ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-namenode-node1.zhch.out
[yyl@node1.zhch ~]$ jps
2286 NameNode
1985 QuorumPeerMain
2369 Jps
2176 JournalNode
## Sync the metadata on the standby namenode1 (node2.zhch)
[yyl@node2.zhch ~]$ hdfs namenode -bootstrapStandby
## Start the namenode process on the standby namenode1 (node2.zhch)
[yyl@node2.zhch ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-namenode-node2.zhch.out
[yyl@node2.zhch ~]$ jps
1923 Jps
1737 JournalNode
1638 QuorumPeerMain
1840 NameNode

## Format the namenode on the active namenode2 (node3.zhch); the same clusterId c1 is used so that both nameservices join one federated cluster
[yyl@node3.zhch ~]$ hdfs namenode -format -clusterId c1
## Start the namenode process on the active namenode2 (node3.zhch)
[yyl@node3.zhch ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-namenode-node3.zhch.out
[yyl@node3.zhch ~]$ jps
1658 JournalNode
1495 QuorumPeerMain
1767 NameNode
1850 Jps
## Sync the metadata on the standby namenode2 (node4.zhch)
[yyl@node4.zhch ~]$ hdfs namenode -bootstrapStandby
## Start the namenode process on the standby namenode2 (node4.zhch)
[yyl@node4.zhch ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-namenode-node4.zhch.out
[yyl@node4.zhch ~]$ jps
1602 Jps
1519 NameNode

## Start the DataNodes (hadoop-daemons.sh reads the slaves file)
[yyl@node1.zhch ~]$ hadoop-daemons.sh start datanode
node4.zhch: starting datanode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-datanode-node4.zhch.out
node5.zhch: starting datanode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-datanode-node5.zhch.out
node3.zhch: starting datanode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-datanode-node3.zhch.out
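
A hedged check that the three DataNodes actually came up (note that HDFS commands needing an active NameNode will only succeed after the zkfc step below, since all NameNodes are still in standby at this point):

[yyl@node3.zhch ~]$ jps | grep DataNode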
## Start YARN
[yyl@node1.zhch ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/yyl/program/hadoop-2.5.2/logs/yarn-yyl-resourcemanager-node1.zhch.out
node3.zhch: starting nodemanager, logging to /home/yyl/program/hadoop-2.5.2/logs/yarn-yyl-nodemanager-node3.zhch.out
node4.zhch: starting nodemanager, logging to /home/yyl/program/hadoop-2.5.2/logs/yarn-yyl-nodemanager-node4.zhch.out
node5.zhch: starting nodemanager, logging to /home/yyl/program/hadoop-2.5.2/logs/yarn-yyl-nodemanager-node5.zhch.out
## Start a ZooKeeperFailoverController (zkfc) on every namenode
[yyl@node1.zhch ~]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-zkfc-node1.zhch.out
[yyl@node2.zhch ~]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-zkfc-node2.zhch.out
[yyl@node3.zhch ~]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-zkfc-node3.zhch.out
[yyl@node4.zhch ~]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-zkfc-node4.zhch.out
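
Once the failover controllers are up, one NameNode in each nameservice should have been elected active; an assumed check, mirroring the rmadmin check further below:

[yyl@node1.zhch ~]$ hdfs haadmin -ns mycluster -getServiceState nn1
[yyl@node1.zhch ~]$ hdfs haadmin -ns mycluster -getServiceState nn2
[yyl@node1.zhch ~]$ hdfs haadmin -ns yourcluster -getServiceState nn1
[yyl@node1.zhch ~]$ hdfs haadmin -ns yourcluster -getServiceState nn2
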
## Start the standby resource manager on node3.zhch (start-yarn.sh only started the RM on the node it was run from)
[yyl@node3.zhch ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/yyl/program/hadoop-2.5.2/logs/yarn-yyl-resourcemanager-node3.zhch.out
## Check the resource manager states
[yyl@node1.zhch ~]$ yarn rmadmin -getServiceState rm1
active
[yyl@node1.zhch ~]$ yarn rmadmin -getServiceState rm2
standby

3. Verification

Open two terminals, both connected to the active resource manager. In terminal A, run jps to find the resource manager's process ID; in terminal B, submit a MapReduce job; then, back in terminal A, kill the resource manager process; finally, watch whether the MapReduce job still runs to completion after the active resource manager has gone down. A hedged sketch of this test follows.
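
A concrete way to run this test (assumed commands; the examples jar path matches a stock 2.5.2 tarball):

## Terminal B (on node1.zhch): submit a sample job that runs long enough to survive a failover
[yyl@node1.zhch ~]$ hadoop jar /home/yyl/program/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar pi 5 1000
## Terminal A (on node1.zhch): find and kill the active resource manager while the job is running
[yyl@node1.zhch ~]$ jps | grep ResourceManager
[yyl@node1.zhch ~]$ kill -9 <ResourceManager pid>
## Afterwards rm2 on node3.zhch should report active and the job should still finish
[yyl@node1.zhch ~]$ yarn rmadmin -getServiceState rm2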
