Hadoop HA Cluster Setup
Seven servers in total, with node roles assigned as follows:
192.168.133.21 (BFLN-01): namenode, zookeeper, journalnode, DFSZKFailoverController
192.168.133.23 (BFLN-02): namenode, resourcemanager, zookeeper, journalnode, DFSZKFailoverController
192.168.133.24 (BFLN-03): resourcemanager, zookeeper, journalnode, DFSZKFailoverController
192.168.133.25 (BFLN-04): datanode, nodemanager
192.168.133.26 (BFLN-05): datanode, nodemanager
192.168.133.27 (BFLN-06): datanode, nodemanager
192.168.133.28 (BFLN-07): datanode, nodemanager
Why HA: running a pair of NameNodes and a pair of ResourceManagers prevents a single point of failure in Hadoop's core components from making the whole cluster unavailable.
Configuration steps:
Environment setup
1. Synchronize time across the cluster:
ntpdate
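One common way to keep the clocks aligned is a periodic ntpdate run from cron on every node; a minimal sketch (the NTP server address is an assumption, substitute your own):

```shell
# One-off sync against a public NTP server (assumed reachable).
ntpdate ntp1.aliyun.com

# Periodic sync every 30 minutes via cron, run as root.
# Note: `crontab -` replaces the whole crontab; merge with existing entries if any.
echo '*/30 * * * * /usr/sbin/ntpdate ntp1.aliyun.com >/dev/null 2>&1' | crontab -
```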
2. Configure hostname resolution in /etc/hosts on all seven servers (every node needs these entries):
192.168.133.21 BFLN-01
192.168.133.23 BFLN-02
192.168.133.24 BFLN-03
192.168.133.25 BFLN-04
192.168.133.26 BFLN-05
192.168.133.27 BFLN-06
192.168.133.28 BFLN-07
3. Configure the SSH client in /etc/ssh/ssh_config (these are client-side options, not sshd options):
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
Otherwise, starting the HDFS services may hang on host-key prompts like:
Starting namenodes on [BFLN-01 BFLN-02]
The authenticity of host 'BFLN-02 (192.168.133.23)' can't be established.
ECDSA key fingerprint is 79:d1:ec:82:d3:1c:50:8a:17:c2:2d:f0:87:20:53:44.
Are you sure you want to continue connecting (yes/no)?
The authenticity of host 'BFLN-01 (192.168.133.21)' can't be established.
ECDSA key fingerprint is 30:75:04:10:93:d2:57:d7:3d:b1:cc:31:92:30:1a:a1.
Are you sure you want to continue connecting (yes/no)? yes
4. Set up passwordless SSH between every pair of servers, including from each machine to itself:
ssh-keygen : generate a key pair
ssh-copy-id : copy the public key to the target server
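The two commands above can be scripted; a sketch to run once on each node (host names taken from the role table, root user assumed):

```shell
# Generate a key pair if one does not exist yet (no passphrase).
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa

# Push the public key to every node, including this one, so Hadoop's
# start scripts can ssh everywhere without a password prompt.
for host in BFLN-01 BFLN-02 BFLN-03 BFLN-04 BFLN-05 BFLN-06 BFLN-07; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "root@$host"   # assumes daemons run as root
done
```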
5. Install Java and set the JAVA and Hadoop environment variables (note that $PATH must be appended, otherwise the system binary directories drop out of the path):
export JAVA_HOME=/usr/java/jdk1.8.0_51/
export HADOOP_HOME=/opt/hadoop-spark/hadoop/hadoop-2.9.1
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Install the ZooKeeper ensemble:
7. Unpack the ZooKeeper tarball.
8. Edit the ZooKeeper configuration file (conf/zoo.cfg):
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/data/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=192.168.133.21:2888:3888
server.2=192.168.133.23:2888:3888
server.3=192.168.133.24:2888:3888
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
9. In the ZooKeeper data directory (dataDir, /data/zookeeper in this configuration), create a myid file holding the node's id: 1, 2 and 3 respectively, matching the server.N lines above.
10. Start ZooKeeper on each of the three nodes:
./zkServer.sh start
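Steps 9 and 10 on one node can be sketched as follows (the id must differ per node: 1 on BFLN-01, 2 on BFLN-02, 3 on BFLN-03):

```shell
# Write this node's id into dataDir (use the id appropriate to the node).
mkdir -p /data/zookeeper
echo 1 > /data/zookeeper/myid   # 2 on BFLN-02, 3 on BFLN-03

# Start the server, then confirm the ensemble has elected a leader.
./zkServer.sh start
./zkServer.sh status   # one node should report "Mode: leader", the others "Mode: follower"
```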
Install and configure Hadoop HA:
11. Download the Hadoop tarball and unpack it; keep the installation path identical on all seven servers as far as possible.
Configure on 192.168.133.21:
cd $HADOOP_HOME/etc/hadoop/
vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://BFLN</value> <!-- BFLN is the logical name (nameservice ID) of the NameNode pair; it must match dfs.nameservices in hdfs-site.xml -->
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop-spark/hadoop/tmp</value> <!-- base directory for Hadoop's temporary/HDFS data -->
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>BFLN-01:2181,BFLN-02:2181,BFLN-03:2181</value> <!-- addresses of the ZooKeeper ensemble -->
</property>
</configuration>
vi hdfs-site.xml
<configuration>
    <!-- BFLN is the nameservice ID; it must match fs.defaultFS in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>BFLN</value>
</property>

    <!-- the BFLN nameservice contains two NameNodes, named BFLN1 and BFLN2 -->
<property>
<name>dfs.ha.namenodes.BFLN</name>
<value>BFLN1,BFLN2</value>
</property>

    <!-- RPC address of the first NameNode -->
<property>
<name>dfs.namenode.rpc-address.BFLN.BFLN1</name>
<value>BFLN-01:9000</value>
</property>

    <!-- HTTP address of the first NameNode -->
<property>
<name>dfs.namenode.http-address.BFLN.BFLN1</name>
<value>BFLN-01:50070</value>
</property>

    <!-- RPC address of the second NameNode -->
<property>
<name>dfs.namenode.rpc-address.BFLN.BFLN2</name>
<value>BFLN-02:9000</value>
</property>

    <!-- HTTP address of the second NameNode -->
<property>
<name>dfs.namenode.http-address.BFLN.BFLN2</name>
<value>BFLN-02:50070</value>
</property>

    <!-- JournalNode addresses and ports; the docs recommend an odd number of
         JournalNodes, and the role table above runs them on BFLN-01, BFLN-02 and BFLN-03 -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://BFLN-01:8485;BFLN-02:8485;BFLN-03:8485/BFLN</value>
</property>

    <!-- local directory where each JournalNode stores its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/hadoop-spark/hadoop/tmp/jn</value>
</property>

    <!-- enable automatic failover when a NameNode fails -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>

    <!-- failover proxy provider: without this, clients would treat the nameservice
         name BFLN as a hostname, try to connect to it, and fail -->
<property>
<name>dfs.client.failover.proxy.provider.BFLN</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

    <!-- fencing method used on split-brain: sshfence logs in to the old active
         NameNode over SSH and kills it, so the standby can safely become active -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>

    <!-- private key that sshfence uses to reach the other NameNode -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>

<!-- replication factor -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>

    <!-- whether HDFS permission checking is enabled -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
vi yarn-site.xml
<configuration>
<!-- enable ResourceManager HA (default is false) -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- enable RM recovery: if the RM goes down while jobs are running, setting this
     to true lets the restarted RM resume the unfinished applications -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>

<!-- RM cluster ID -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>BFLN-yarn</value>
</property>

<!-- the IDs of the two RM nodes in the cluster -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>BFLN-yarn1,BFLN-yarn2</value>
</property>

<!-- host of the BFLN-yarn1 node -->
<property>
<name>yarn.resourcemanager.hostname.BFLN-yarn1</name>
<value>BFLN-02</value>
</property>

<!-- host of the BFLN-yarn2 node -->
<property>
<name>yarn.resourcemanager.hostname.BFLN-yarn2</name>
<value>BFLN-03</value>
</property>

<!-- addresses of the ZooKeeper ensemble -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>BFLN-01:2181,BFLN-02:2181,BFLN-03:2181</value>
</property>

<!-- class used for RM state storage; the default is the Hadoop-filesystem-based
     store, here replaced with the ZooKeeper-based store -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>

<!-- auxiliary service run on each NodeManager; must be set to mapreduce_shuffle
     for MapReduce jobs to run -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
vi slaves:
(list the datanode hosts)
192.168.133.25
192.168.133.26
192.168.133.27
192.168.133.28

vi hadoop-env.sh:
export JAVA_HOME=/usr/java/jdk1.8.0_51/

12. All configuration is now done; copy these files to the other servers.
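Distributing the configuration can be sketched with scp (assumes the same $HADOOP_HOME path and root SSH access on every node):

```shell
# Copy the finished configuration from BFLN-01 to the other six servers.
for host in BFLN-02 BFLN-03 BFLN-04 BFLN-05 BFLN-06 BFLN-07; do
    scp -r "$HADOOP_HOME/etc/hadoop/" "root@$host:$HADOOP_HOME/etc/"
done
```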
Start the HDFS services:
Note: the startup order matters; getting it wrong leads to repeated errors later!
1. Start the journalnodes (on each journalnode host: BFLN-01, BFLN-02 and BFLN-03): ./sbin/hadoop-daemon.sh start journalnode
2. On BFLN-01, format the namenode: ./bin/hdfs namenode -format
3. On BFLN-01, register with ZooKeeper: ./bin/hdfs zkfc -formatZK  # registers HDFS with the ZooKeeper ensemble
4. On BFLN-01, start HDFS: ./sbin/start-dfs.sh  # note: at this point only the namenode on BFLN-01 comes up
5. On BFLN-02, bootstrap the standby namenode: ./bin/hdfs namenode -bootstrapStandby  # copies the metadata over from the namenode on BFLN-01
6. On BFLN-02, start the namenode: ./sbin/hadoop-daemon.sh start namenode
7. On BFLN-02, start YARN: ./sbin/start-yarn.sh  # starts the RM and the NMs
8. On BFLN-03, start the standby resourcemanager: ./sbin/yarn-daemon.sh start resourcemanager
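After step 8, the running daemons can be checked with jps on each host; a sketch (host names and expected roles taken from the table at the top):

```shell
# Expect NameNode and DFSZKFailoverController on BFLN-01/BFLN-02, ResourceManager
# on BFLN-02/BFLN-03, JournalNode and QuorumPeerMain on BFLN-01..03, and
# DataNode plus NodeManager on BFLN-04..07.
for host in BFLN-01 BFLN-02 BFLN-03 BFLN-04 BFLN-05 BFLN-06 BFLN-07; do
    echo "== $host =="
    ssh "root@$host" jps
done
```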

Testing: kill the active namenode/resourcemanager and check whether the corresponding standby node turns active:
Check the namenode states with:
./bin/hdfs haadmin -getServiceState BFLN1
./bin/hdfs haadmin -getServiceState BFLN2
Check the resourcemanager states with:
./bin/yarn rmadmin -getServiceState BFLN-yarn1
./bin/yarn rmadmin -getServiceState BFLN-yarn2
If the standby cannot take over after the active node is killed, the system may be missing one package (sshfence relies on the fuser command it provides):
psmisc
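The failover test itself can be scripted; a sketch that assumes BFLN1 is currently active and the commands run on BFLN-01:

```shell
# Confirm which namenode is active, kill it, then watch the standby take over.
./bin/hdfs haadmin -getServiceState BFLN1    # expect: active
./bin/hdfs haadmin -getServiceState BFLN2    # expect: standby

# Kill the local active NameNode process.
kill -9 "$(jps | awk '/NameNode/ {print $1}')"

# Give the ZKFC a few seconds to fail over, then check the other node.
sleep 10
./bin/hdfs haadmin -getServiceState BFLN2    # expect: active
```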
Original source: https://www.cnblogs.com/hel7512/p/12350634.html