The Hadoop version used in this experiment is 2.5.2. The hardware environment is five virtual machines, all running CentOS 6.6. The VM IPs and hostnames are:
192.168.63.171 node1.zhch
192.168.63.172 node2.zhch
192.168.63.173 node3.zhch
192.168.63.174 node4.zhch
192.168.63.175 node5.zhch
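How these names are resolved is not shown in the original; assuming plain /etc/hosts entries rather than DNS, every node would carry the same mapping, roughly:

## /etc/hosts on all five nodes (sketch; assumes static host entries, not DNS)
192.168.63.171 node1.zhch
192.168.63.172 node2.zhch
192.168.63.173 node3.zhch
192.168.63.174 node4.zhch
192.168.63.175 node5.zhch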
Passwordless SSH, the firewall, and the JDK are not covered again here. The roles assigned to the virtual machines are:
node1: active namenode1, active resource manager, zookeeper, journalnode
node2: standby namenode1, zookeeper, journalnode
node3: active namenode2, standby resource manager, zookeeper, journalnode, datanode
node4: standby namenode2, datanode
node5: datanode
The procedure is basically the same as for a Namenode HA installation: a ZooKeeper cluster needs to be installed first. The main differences are in the core-site.xml, hdfs-site.xml and yarn-site.xml configuration files; the remaining files are configured essentially the same way as in the Namenode HA setup.
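The ZooKeeper installation itself is not shown here. As a minimal sketch, assuming a three-node ensemble on node1-node3 using the client port 2181 that ha.zookeeper.quorum points at below (the dataDir path is hypothetical), conf/zoo.cfg would look roughly like this:

## conf/zoo.cfg on node1.zhch, node2.zhch and node3.zhch (sketch; dataDir is a hypothetical path)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/yyl/program/zookeeper/data
clientPort=2181
server.1=node1.zhch:2888:3888
server.2=node2.zhch:2888:3888
server.3=node3.zhch:2888:3888
## each node also needs a dataDir/myid file containing its own server id (1, 2 or 3)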
1. Configuring Hadoop
## Unpack
[yyl@node1 program]$ tar -zxf hadoop-2.5.2.tar.gz
## Create the working directories
[yyl@node1 program]$ mkdir hadoop-2.5.2/name
[yyl@node1 program]$ mkdir hadoop-2.5.2/data
[yyl@node1 program]$ mkdir hadoop-2.5.2/journal
[yyl@node1 program]$ mkdir hadoop-2.5.2/tmp
## Configure hadoop-env.sh
[yyl@node1 program]$ cd hadoop-2.5.2/etc/hadoop/
[yyl@node1 hadoop]$ vim hadoop-env.sh
export JAVA_HOME=/usr/lib/java/jdk1.7.0_80
## Configure yarn-env.sh
[yyl@node1 hadoop]$ vim yarn-env.sh
export JAVA_HOME=/usr/lib/java/jdk1.7.0_80
## Configure slaves
[yyl@node1 hadoop]$ vim slaves
node3.zhch
node4.zhch
node5.zhch
## Configure mapred-site.xml
[yyl@node1 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[yyl@node1 hadoop]$ vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node2.zhch:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node2.zhch:19888</value>
  </property>
</configuration>
## Configure core-site.xml
[yyl@node1 hadoop]$ vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/yyl/program/hadoop-2.5.2/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node1.zhch:2181,node2.zhch:2181,node3.zhch:2181</value>
  </property>
  <property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>1000</value>
  </property>
</configuration>
## Configure hdfs-site.xml
[yyl@node1 hadoop]$ vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/yyl/program/hadoop-2.5.2/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/yyl/program/hadoop-2.5.2/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster,yourcluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>node1.zhch:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>node2.zhch:9000</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
    <value>node1.zhch:53310</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
    <value>node2.zhch:53310</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>node1.zhch:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>node2.zhch:50070</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.yourcluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.yourcluster.nn1</name>
    <value>node3.zhch:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.yourcluster.nn2</name>
    <value>node4.zhch:9000</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.yourcluster.nn1</name>
    <value>node3.zhch:53310</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.yourcluster.nn2</name>
    <value>node4.zhch:53310</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.yourcluster.nn1</name>
    <value>node3.zhch:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.yourcluster.nn2</name>
    <value>node4.zhch:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node1.zhch:8485;node2.zhch:8485;node3.zhch:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.yourcluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/yyl/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/yyl/program/hadoop-2.5.2/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled.mycluster</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled.yourcluster</name>
    <value>true</value>
  </property>
  <property>
    <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
    <value>60000</value>
  </property>
  <property>
    <name>ipc.client.connect.timeout</name>
    <value>60000</value>
  </property>
  <property>
    <name>dfs.image.transfer.bandwidthPerSec</name>
    <value>4194304</value>
  </property>
</configuration>
## Configure yarn-site.xml
[yyl@node1 hadoop]$ vim yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>node1.zhch:2181,node2.zhch:2181,node3.zhch:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk.state-store.address</name>
    <value>node1.zhch:2181,node2.zhch:2181,node3.zhch:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>node1.zhch:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>node3.zhch:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>node1.zhch:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>node3.zhch:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>node1.zhch:23141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>node3.zhch:23141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>node1.zhch:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>node3.zhch:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>node1.zhch:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>node3.zhch:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.rm1</name>
    <value>node1.zhch:23189</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.rm2</name>
    <value>node3.zhch:23189</value>
  </property>
</configuration>
## Distribute to the other nodes
[yyl@node1 hadoop]$ cd /home/yyl/program/
[yyl@node1 program]$ scp -rp hadoop-2.5.2 yyl@node2.zhch:/home/yyl/program/
[yyl@node1 program]$ scp -rp hadoop-2.5.2 yyl@node3.zhch:/home/yyl/program/
[yyl@node1 program]$ scp -rp hadoop-2.5.2 yyl@node4.zhch:/home/yyl/program/
[yyl@node1 program]$ scp -rp hadoop-2.5.2 yyl@node5.zhch:/home/yyl/program/
## On active namenode2 (node3.zhch) and standby namenode2 (node4.zhch), change the value of
## dfs.namenode.shared.edits.dir in hdfs-site.xml to
## qjournal://node1.zhch:8485;node2.zhch:8485;node3.zhch:8485/yourcluster ; leave the other properties unchanged.
## On the standby resource manager (node3.zhch), change the value of yarn.resourcemanager.ha.id
## in yarn-site.xml to rm2 ; leave the other properties unchanged.
## Set the Hadoop environment variables on every node
[yyl@node1 ~]$ vim .bash_profile
export HADOOP_PREFIX=/home/yyl/program/hadoop-2.5.2
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
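The two per-node edits described above can simply be made with vim on node3.zhch and node4.zhch. The following is just one possible way to script them from node1, assuming passwordless SSH and the paths used throughout this post:

## Point namenode2's shared edits dir at the yourcluster journal (node3 and node4)
[yyl@node1 ~]$ ssh node3.zhch "sed -i 's#8485/mycluster#8485/yourcluster#' /home/yyl/program/hadoop-2.5.2/etc/hadoop/hdfs-site.xml"
[yyl@node1 ~]$ ssh node4.zhch "sed -i 's#8485/mycluster#8485/yourcluster#' /home/yyl/program/hadoop-2.5.2/etc/hadoop/hdfs-site.xml"
## Switch yarn.resourcemanager.ha.id to rm2 on the standby resource manager (node3 only);
## this only touches the single <value>rm1</value> line, not the rm-ids list
[yyl@node1 ~]$ ssh node3.zhch "sed -i 's#<value>rm1</value>#<value>rm2</value>#' /home/yyl/program/hadoop-2.5.2/etc/hadoop/yarn-site.xml"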
2. Formatting and Starting the Cluster
## Start the ZooKeeper cluster
## On active namenode1 (node1.zhch) and active namenode2 (node3.zhch), run: $HADOOP_HOME/bin/hdfs zkfc -formatZK
[yyl@node1 ~]$ hdfs zkfc -formatZK
[yyl@node3 ~]$ hdfs zkfc -formatZK
[yyl@node1 ~]$ zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /
[hadoop-ha, zookeeper]
[zk: localhost:2181(CONNECTED) 1] ls /hadoop-ha
[mycluster, yourcluster]
## Start the journalnodes on node1.zhch, node2.zhch and node3.zhch:
[yyl@node1 ~]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-journalnode-node1.zhch.out
[yyl@node1 ~]$ jps
1985 QuorumPeerMain
2222 Jps
2176 JournalNode
[yyl@node2 ~]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-journalnode-node2.zhch.out
[yyl@node2 ~]$ jps
1783 Jps
1737 JournalNode
1638 QuorumPeerMain
[yyl@node3 ~]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-journalnode-node3.zhch.out
[yyl@node3 ~]$ jps
1658 JournalNode
1495 QuorumPeerMain
1704 Jps
## Format the namenode on active namenode1 (node1.zhch)
[yyl@node1 ~]$ hdfs namenode -format -clusterId c1
## Start the namenode process on active namenode1 (node1.zhch)
[yyl@node1 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-namenode-node1.zhch.out
[yyl@node1 ~]$ jps
2286 NameNode
1985 QuorumPeerMain
2369 Jps
2176 JournalNode
## Sync the metadata on standby namenode1 (node2.zhch)
[yyl@node2 ~]$ hdfs namenode -bootstrapStandby
## Start the namenode process on standby namenode1 (node2.zhch)
[yyl@node2 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-namenode-node2.zhch.out
[yyl@node2 ~]$ jps
1923 Jps
1737 JournalNode
1638 QuorumPeerMain
1840 NameNode
## Format the namenode on active namenode2 (node3.zhch)
[yyl@node3 ~]$ hdfs namenode -format -clusterId c1
## Start the namenode process on active namenode2 (node3.zhch)
[yyl@node3 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-namenode-node3.zhch.out
[yyl@node3 ~]$ jps
1658 JournalNode
1495 QuorumPeerMain
1767 NameNode
1850 Jps
## Sync the metadata on standby namenode2 (node4.zhch)
[yyl@node4 ~]$ hdfs namenode -bootstrapStandby
## Start the namenode process on standby namenode2 (node4.zhch)
[yyl@node4 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-namenode-node4.zhch.out
[yyl@node4 ~]$ jps
1602 Jps
1519 NameNode
## Start the DataNodes
[yyl@node1 ~]$ hadoop-daemons.sh start datanode
node4.zhch: starting datanode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-datanode-node4.zhch.out
node5.zhch: starting datanode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-datanode-node5.zhch.out
node3.zhch: starting datanode, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-datanode-node3.zhch.out
## Start YARN
[yyl@node1 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/yyl/program/hadoop-2.5.2/logs/yarn-yyl-resourcemanager-node1.zhch.out
node3.zhch: starting nodemanager, logging to /home/yyl/program/hadoop-2.5.2/logs/yarn-yyl-nodemanager-node3.zhch.out
node4.zhch: starting nodemanager, logging to /home/yyl/program/hadoop-2.5.2/logs/yarn-yyl-nodemanager-node4.zhch.out
node5.zhch: starting nodemanager, logging to /home/yyl/program/hadoop-2.5.2/logs/yarn-yyl-nodemanager-node5.zhch.out
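Before moving on, a quick jps on a few nodes should roughly match the role assignment given at the beginning of the post (the process IDs will of course differ; this is only a sanity check):

[yyl@node1 ~]$ jps   # expect NameNode, JournalNode, QuorumPeerMain, ResourceManager
[yyl@node3 ~]$ jps   # expect NameNode, JournalNode, QuorumPeerMain, DataNode, NodeManager
[yyl@node5 ~]$ jps   # expect DataNode, NodeManager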
## Start a ZooKeeperFailoverController on every namenode
[yyl@node1 ~]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-zkfc-node1.zhch.out
[yyl@node2 ~]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-zkfc-node2.zhch.out
[yyl@node3 ~]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-zkfc-node3.zhch.out
[yyl@node4 ~]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/yyl/program/hadoop-2.5.2/logs/hadoop-yyl-zkfc-node4.zhch.out
## Start the resource manager on the standby resource manager node (node3.zhch)
[yyl@node3 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/yyl/program/hadoop-2.5.2/logs/yarn-yyl-resourcemanager-node3.zhch.out
## Check the resource manager state
[yyl@node1 ~]$ yarn rmadmin -getServiceState rm1
active
[yyl@node1 ~]$ yarn rmadmin -getServiceState rm2
standby
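The HDFS side can be checked per nameservice in the same way. This is only a sketch; it assumes the -ns option of hdfs haadmin, which selects a nameservice in a federated cluster. Each of mycluster and yourcluster should report one active and one standby namenode, and the namenode web UIs on node1-node4 (port 50070) should show the same states:

[yyl@node1 ~]$ hdfs haadmin -ns mycluster -getServiceState nn1
[yyl@node1 ~]$ hdfs haadmin -ns mycluster -getServiceState nn2
[yyl@node1 ~]$ hdfs haadmin -ns yourcluster -getServiceState nn1
[yyl@node1 ~]$ hdfs haadmin -ns yourcluster -getServiceState nn2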
3. Verification
Open two terminals, both connected to the node running the active resource manager. In terminal A, run jps to get the resource manager's process ID; in terminal B, start a MapReduce job. Then kill the resource manager process from terminal A, and finally observe whether the MapReduce job still completes normally after the active resource manager has gone down.
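One concrete way to run this check, using the examples jar shipped in the 2.5.2 tarball (the pi arguments are just an arbitrary small job, and the placeholder pid has to be taken from the jps output):

## Terminal B on the active resource manager (node1.zhch): submit a sample job
[yyl@node1 ~]$ yarn jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar pi 5 10
## Terminal A: while the job is running, find and kill the ResourceManager process
[yyl@node1 ~]$ jps | grep ResourceManager
[yyl@node1 ~]$ kill -9 <ResourceManager pid>
## The job should finish via rm2; afterwards node3.zhch should report active
[yyl@node1 ~]$ yarn rmadmin -getServiceState rm2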