Hadoop 2.x HA (QJM) Installation and Deployment Plan

I. Host and service plan:


Host   NameNode   JournalNode   DataNode   ZooKeeper   ZKFC
db01   yes        yes           yes        yes         yes
db02   yes        yes           yes        yes         yes
db03              yes           yes        yes
db04                            yes
db05                            yes
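The plan places three JournalNodes and three ZooKeeper servers (db01, db02, db03) because both services keep working only while a majority of their members is alive. A minimal sketch of that arithmetic; the function names are illustrative and not part of Hadoop or ZooKeeper:

```shell
#!/usr/bin/env bash
# quorum_tolerance N: how many of N quorum members (JournalNodes or
# ZooKeeper servers) may fail while a majority is still available.
quorum_tolerance() {
  local n=$1
  echo $(( (n - 1) / 2 ))
}

# majority N: how many members must be alive for the quorum to work.
majority() {
  local n=$1
  echo $(( n / 2 + 1 ))
}

for n in 3 4 5; do
  echo "members=$n majority=$(majority "$n") tolerated_failures=$(quorum_tolerance "$n")"
done
```

With 3 members one failure is tolerated; a 4th member raises the required majority to 3 without tolerating any more failures, which is why odd member counts are the usual choice.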

II. Environment setup

1. Create a hadoop user for installing and running the software


groupadd hadoop

useradd -g hadoop hadoop

echo "dbking588" | passwd --stdin hadoop

Configure the environment variables for the hadoop user:

export HADOOP_HOME=/opt/cdh-5.3.6/hadoop-2.5.0

export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH:$HOME/bin

2. Configure passwordless SSH login


--Setup (as the hadoop user):

$ ssh-keygen -t rsa

$ ssh-copy-id db07.chavin.king

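(In the author's tests, ssh-copy-id worked with RSA keys but not with DSA keys.) Repeat ssh-copy-id for every node in the cluster, including the local host: the start scripts and the sshfence fencing method log in to the nodes over SSH.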


--Verify:

$ ssh db02 date

Wed Apr 19 09:57:34 CST 2017
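To check every node in one pass instead of one `ssh ... date` at a time, a loop like the sketch below can be used. Assumptions: the host list mirrors the service plan, `-o BatchMode=yes` makes ssh fail instead of prompting for a password, and `DRY_RUN` is an illustrative switch that only prints the commands.

```shell
#!/usr/bin/env bash
# Hosts from the service plan; adjust to your cluster.
HOSTS=(db01 db02 db03 db04 db05)

# check_ssh HOST...: run `date` on each host over ssh without a password.
# With DRY_RUN set to a non-empty value, only print the commands.
check_ssh() {
  local h failed=0
  for h in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "ssh -o BatchMode=yes $h date"
    elif ! ssh -o BatchMode=yes "$h" date; then
      echo "passwordless ssh to $h FAILED" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage, as the hadoop user on each node that starts daemons:
#   check_ssh "${HOSTS[@]}"
```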

3. Grant the hadoop user sudo privileges


chmod u+w /etc/sudoers

echo "hadoop ALL=(root) NOPASSWD: ALL" >> /etc/sudoers

chmod u-w /etc/sudoers
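Appending to /etc/sudoers with echo gives no syntax check, and a broken sudoers file can lock out sudo entirely. A common alternative, sketched here, is a separate drop-in file under /etc/sudoers.d followed by `visudo -c`, which checks /etc/sudoers and the files it includes. This assumes /etc/sudoers contains the `#includedir /etc/sudoers.d` line; the function name is illustrative.

```shell
#!/usr/bin/env bash
# write_sudoers_dropin FILE USER: grant USER passwordless sudo to root via
# a separate drop-in file rather than appending to /etc/sudoers.
write_sudoers_dropin() {
  local file=$1 user=$2
  printf '%s ALL=(root) NOPASSWD: ALL\n' "$user" > "$file" || return 1
  # sudo expects drop-in files to be owned by root and not writable by
  # others; 0440 is the conventional mode.
  chmod 0440 "$file"
}

# Usage, as root:
#   write_sudoers_dropin /etc/sudoers.d/hadoop hadoop && visudo -c
```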

4. Stop the firewall and disable SELinux


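sed -i '/SELINUX=enforcing/d' /etc/selinux/config

sed -i '/SELINUX=disabled/d' /etc/selinux/config

echo "SELINUX=disabled" >> /etc/selinux/config


Or, equivalently, change the line in place:

sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

The file is only read at boot; to switch the running system to permissive mode until the next reboot, also run setenforce 0.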


service iptables stop

chkconfig iptables off
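The SELinux config edit can be wrapped in a function that is safe to run more than once, which helps when the same preparation script is pushed to all five hosts. A sketch; the function name is illustrative, and it takes the file path as an argument so it can be tried on a copy first:

```shell
#!/usr/bin/env bash
# disable_selinux_config FILE: make every active "SELINUX=..." line read
# "SELINUX=disabled", or append one if there is none. Comment lines and
# SELINUXTYPE= are left untouched.
disable_selinux_config() {
  local file=$1
  if grep -q '^SELINUX=' "$file"; then
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$file"
  else
    echo 'SELINUX=disabled' >> "$file"
  fi
}

# Usage, as root:
#   disable_selinux_config /etc/selinux/config
#   setenforce 0    # only if SELinux is currently enabled
```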

5. Raise the limits on open files and processes


cp /etc/security/limits.conf /etc/security/limits.conf.bak

echo "* soft nproc 32000" >>/etc/security/limits.conf

echo "* hard nproc 32000" >>/etc/security/limits.conf

echo "* soft nofile 65535" >>/etc/security/limits.conf

echo "* hard nofile 65535" >>/etc/security/limits.conf
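Running the `echo ... >>` lines twice leaves duplicate entries in limits.conf (the sudoers and ntp.conf appends behave the same way). A minimal idempotent helper, sketched with illustrative names, appends a line only if an identical line is not already present:

```shell
#!/usr/bin/env bash
# append_once FILE LINE: append LINE to FILE unless FILE already contains
# exactly that line (fixed-string, whole-line match).
append_once() {
  local file=$1 line=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# set_limits [FILE]: the limits from this step, written idempotently.
set_limits() {
  local file=${1:-/etc/security/limits.conf}
  append_once "$file" '* soft nproc 32000'
  append_once "$file" '* hard nproc 32000'
  append_once "$file" '* soft nofile 65535'
  append_once "$file" '* hard nofile 65535'
}
```

The limits apply to new login sessions; after logging in again as hadoop, `ulimit -n` and `ulimit -u` show the effective values. On CentOS/RHEL 6, /etc/security/limits.d/90-nproc.conf may override the nproc soft limit, so it is worth checking too.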

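6. Configure time synchronization across the cluster

On db01, which serves as the time source for the other nodes: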


cp /etc/ntp.conf /etc/ntp.conf.bak

cp /etc/sysconfig/ntpd /etc/sysconfig/ntpd.bak

echo "restrict 192.168.100.0 mask 255.255.255.0 nomodify notrap" >> /etc/ntp.conf

echo "SYNC_HWCLOCK=yes" >> /etc/sysconfig/ntpd

service ntpd restart


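Then, on the nodes that sync from db01, add a root cron entry that runs the sync script every 10 minutes:

0-59/10 * * * * /opt/scripts/sync_time.sh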

# cat /opt/scripts/sync_time.sh

/sbin/service ntpd stop

/usr/sbin/ntpdate db01.chavin.king

/sbin/service ntpd start
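In the cron entry above, `0-59/10` in the minute field means every 10th minute from 0 to 59, that is minutes 0, 10, 20, 30, 40 and 50: the same schedule as the shorter `*/10`. The illustrative function below expands a minute field to make this visible; it handles only `N`, `A-B`, `*`, `A-B/S` and `*/S`, not comma lists or the other four fields.

```shell
#!/usr/bin/env bash
# expand_minute_field FIELD: print the minutes (0-59) a cron minute field
# matches. Supports N, A-B, *, A-B/S and */S; no comma lists.
expand_minute_field() {
  local field=$1 range step=1 start end
  range=${field%%/*}
  if [[ $field == */* ]]; then
    step=${field#*/}
  fi
  if [[ $range == '*' ]]; then
    start=0 end=59
  else
    start=${range%-*}
    end=${range#*-}
  fi
  seq -s ' ' "$start" "$step" "$end"
}

expand_minute_field '0-59/10'
expand_minute_field '*/10'
```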

7. Install Java


# vim /etc/profile

Append the following environment variables at the end of the file:

export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera

export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH

export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH

Check that Java is installed correctly (after running source /etc/profile or opening a new shell):

# java -version
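`java -version` writes to stderr, and its first line contains the version in quotes. When the same check is repeated on every node, extracting the major version is handier than reading it by eye. A sketch with an illustrative function name, assuming that quoted-version format (Oracle and OpenJDK builds print it):

```shell
#!/usr/bin/env bash
# java_major_version TEXT: extract the major Java version from the output
# of `java -version`, e.g. 'java version "1.7.0_67"' -> 7,
# 'openjdk version "11.0.2"' -> 11. Fails if no quoted version is found.
java_major_version() {
  local v
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  [ -n "$v" ] || return 1
  case $v in
    1.*) v=${v#1.}; echo "${v%%.*}" ;;   # old scheme: 1.7.0_67 -> 7
    *)   echo "${v%%[.+-]*}" ;;           # new scheme: 11.0.2 -> 11
  esac
}

# Usage:
#   major=$(java_major_version "$(java -version 2>&1)") && echo "Java $major"
```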

8. Install the Hadoop software


# cd /opt/software

# mkdir -p /opt/cdh-5.3.6

# tar -zxvf hadoop-2.5.0.tar.gz -C /opt/cdh-5.3.6/

# chown -R hadoop:hadoop /opt/cdh-5.3.6/hadoop-2.5.0

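III. Edit the Hadoop configuration files

The files involved in a Hadoop HA setup are listed below. Only core-site.xml and hdfs-site.xml need HA-specific settings; configure the rest as for a regular fully distributed deployment:

HDFS configuration files:

etc/hadoop/hadoop-env.sh

etc/hadoop/core-site.xml

etc/hadoop/hdfs-site.xml

etc/hadoop/slaves

YARN configuration files:

etc/hadoop/yarn-env.sh

etc/hadoop/yarn-site.xml

etc/hadoop/slaves

MapReduce configuration files:

etc/hadoop/mapred-env.sh

etc/hadoop/mapred-site.xml

The HA-related configuration files are shown below: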


[db01]$ cat etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License. See accompanying LICENSE file.

-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://ns1</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/opt/cdh-5.3.6/hadoop-2.5.0/data/tmp</value>

</property>

<property>

<name>fs.trash.interval</name>

<value>7000</value>

</property>

</configuration>


[db01]$ cat etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>dfs.nameservices</name>

<value>ns1</value>

</property>

<property>

<name>dfs.ha.namenodes.ns1</name>

<value>nn1,nn2</value>

</property>

<property>

<name>dfs.namenode.rpc-address.ns1.nn1</name>

<value>db01:8020</value>

</property>

<property>

<name>dfs.namenode.rpc-address.ns1.nn2</name>

<value>db02:8020</value>

</property>

<property>

<name>dfs.namenode.http-address.ns1.nn1</name>

<value>db01:50070</value>

</property>

<property>

<name>dfs.namenode.http-address.ns1.nn2</name>

<value>db02:50070</value>

</property>

<property>

<name>dfs.namenode.shared.edits.dir</name>

<value>qjournal://db01:8485;db02:8485;db03:8485/ns1</value>

</property>

<property>

<name>dfs.journalnode.edits.dir</name>

<value>/opt/cdh-5.3.6/hadoop-2.5.0/data/dfs/jn</value>

</property>

<property>

<name>dfs.client.failover.proxy.provider.ns1</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<property>

<name>dfs.ha.fencing.methods</name>

<value>sshfence</value>

</property>

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/home/hadoop/.ssh/id_rsa</value>

</property>

</configuration>


[db01]$ cat etc/hadoop/yarn-site.xml

<?xml version="1.0"?>

<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.resourcemanager.hostname</name>

<value>db02</value>

</property>

<property>

<name>yarn.log-aggregation-enable</name>

<value>true</value>

</property>

<property>

<name>yarn.log-aggregation.retain-seconds</name>

<value>600000</value>

</property>

</configuration>


[db01]$ cat etc/hadoop/mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

<property>

<name>mapreduce.jobhistory.address</name>

<value>db01:10020</value>

</property>

<property>

<name>mapreduce.jobhistory.webapp.address</name>

<value>db01:19888</value>

</property>

</configuration>


[db01]$ cat etc/hadoop/slaves

db01

db02

db03

db04

db05


Set JAVA_HOME explicitly in the following files:

etc/hadoop/hadoop-env.sh

etc/hadoop/yarn-env.sh

etc/hadoop/mapred-env.sh
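A sketch for setting JAVA_HOME in those three scripts without an editor. Explicit values are commonly needed because daemons started over SSH by the start scripts may not pick up the JAVA_HOME exported in /etc/profile. The function replaces an uncommented `export JAVA_HOME=` line if one exists (hadoop-env.sh normally ships `export JAVA_HOME=${JAVA_HOME}`) and otherwise appends one, leaving commented-out examples alone. The function name is illustrative; the JDK path is the one from step 7.

```shell
#!/usr/bin/env bash
# set_java_home FILE JDK_PATH: point FILE's `export JAVA_HOME=` at JDK_PATH.
set_java_home() {
  local file=$1 jdk=$2
  if grep -q '^export JAVA_HOME=' "$file"; then
    # Replace the existing, uncommented export line.
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${jdk}|" "$file"
  else
    # No active line (only a commented example, or none): append one.
    printf 'export JAVA_HOME=%s\n' "$jdk" >> "$file"
  fi
}

# Usage on db01, from the Hadoop install directory:
#   for f in etc/hadoop/hadoop-env.sh etc/hadoop/yarn-env.sh etc/hadoop/mapred-env.sh; do
#     set_java_home "$f" /usr/java/jdk1.7.0_67-cloudera
#   done
```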


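Create the data directories (they must match hadoop.tmp.dir and dfs.journalnode.edits.dir):

$ mkdir -p /opt/cdh-5.3.6/hadoop-2.5.0/data/tmp

$ mkdir -p /opt/cdh-5.3.6/hadoop-2.5.0/data/dfs/jn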


Copy the installation to the other nodes:

$ scp -r /opt/cdh-5.3.6/hadoop-2.5.0 hadoop@db02:/opt/cdh-5.3.6/

$ scp -r /opt/cdh-5.3.6/hadoop-2.5.0 hadoop@db03:/opt/cdh-5.3.6/

$ scp -r /opt/cdh-5.3.6/hadoop-2.5.0 hadoop@db04:/opt/cdh-5.3.6/

$ scp -r /opt/cdh-5.3.6/hadoop-2.5.0 hadoop@db05:/opt/cdh-5.3.6/
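The copy commands can be collapsed into a loop over the hosts. A sketch under the same assumptions as this step (install path /opt/cdh-5.3.6/hadoop-2.5.0, user hadoop, passwordless SSH from step 2); /opt/cdh-5.3.6 must already exist on each target and be writable by hadoop, and `DRY_RUN` is an illustrative switch that prints the commands instead of running them:

```shell
#!/usr/bin/env bash
SRC=/opt/cdh-5.3.6/hadoop-2.5.0     # install directory on db01
DEST_PARENT=/opt/cdh-5.3.6           # parent directory on the targets

# sync_hadoop HOST...: copy the install directory to each host,
# stopping at the first failure.
sync_hadoop() {
  local h
  for h in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "scp -r $SRC hadoop@$h:$DEST_PARENT/"
    else
      scp -r "$SRC" "hadoop@$h:$DEST_PARENT/" || return 1
    fi
  done
}

# Usage on db01:
#   sync_hadoop db02 db03 db04 db05
```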

IV. Start the cluster for the first time

1. Start the JournalNodes (db01, db02, db03)

[db01]$ sbin/hadoop-daemon.sh start journalnode

[db02]$ sbin/hadoop-daemon.sh start journalnode

[db03]$ sbin/hadoop-daemon.sh start journalnode

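2. Format HDFS (on nn1 only; the JournalNodes must already be running, because formatting also initializes the shared edits directory)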

[db01]$ bin/hdfs namenode -format

3. Start the NameNode on nn1 (db01)

[db01]$ sbin/hadoop-daemon.sh start namenode

4. On nn2 (db02), copy nn1's metadata (copying the metadata directory by hand also works)

[db02]$ bin/hdfs namenode -bootstrapStandby

5. Start the NameNode on nn2

[db02]$ sbin/hadoop-daemon.sh start namenode

6. Start the DataNodes on all nodes

[db01]$ sbin/hadoop-daemon.sh start datanode

[db02]$ sbin/hadoop-daemon.sh start datanode

[db03]$ sbin/hadoop-daemon.sh start datanode

[db04]$ sbin/hadoop-daemon.sh start datanode

[db05]$ sbin/hadoop-daemon.sh start datanode

7. Transition nn1 to active and check the state of both NameNodes

[db01]$ bin/hdfs haadmin -transitionToActive nn1

[db01]$ bin/hdfs haadmin -getServiceState nn1

[db01]$ bin/hdfs haadmin -getServiceState nn2

At this point the HDFS cluster is up and running.
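After the transition exactly one NameNode should report `active` and the other `standby`. A small scriptable check; the function name is illustrative, and it assumes `hdfs haadmin -getServiceState` prints only the state on stdout, with log warnings going to stderr:

```shell
#!/usr/bin/env bash
# check_ha_states STATE...: succeed only if exactly one state is "active"
# and every other one is "standby". An empty state (for example from an
# unreachable NameNode) counts as a failure.
check_ha_states() {
  local s active=0
  for s in "$@"; do
    case $s in
      active)  active=$((active + 1)) ;;
      standby) ;;
      *) echo "unexpected HA state: '$s'" >&2; return 1 ;;
    esac
  done
  [ "$active" -eq 1 ]
}

# Usage on db01, from the Hadoop install directory:
#   check_ha_states "$(bin/hdfs haadmin -getServiceState nn1)" \
#                   "$(bin/hdfs haadmin -getServiceState nn2)" && echo "HA state OK"
```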

8. Run basic tests against HDFS

Create, upload, read and delete files, and so on.
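A minimal smoke test covering create, upload, read and delete could look like the sketch below. Assumptions: the `HDFS` variable (defaulting to `hdfs` on the PATH set in step 1) and the test directory name are illustrative; because fs.trash.interval is set, `-rm` moves the directory to the user's trash instead of deleting it immediately.

```shell
#!/usr/bin/env bash
# hdfs_smoke_test: create a directory, upload a file, read it back and
# remove the directory again. Set HDFS to override the client command.
hdfs_smoke_test() {
  local hdfs=${HDFS:-hdfs}
  local dir=/tmp/ha_smoke_test_$$
  local local_file rc
  local_file=$(mktemp) || return 1
  echo "hello hdfs" > "$local_file"

  "$hdfs" dfs -mkdir -p "$dir" &&
    "$hdfs" dfs -put "$local_file" "$dir/hello.txt" &&
    [ "$("$hdfs" dfs -cat "$dir/hello.txt")" = "hello hdfs" ] &&
    "$hdfs" dfs -rm -r "$dir"   # goes to .Trash while fs.trash.interval > 0
  rc=$?

  rm -f "$local_file"
  if [ "$rc" -eq 0 ]; then
    echo "HDFS smoke test passed"
  else
    echo "HDFS smoke test FAILED" >&2
  fi
  return "$rc"
}
```

Running it again after the manual switchover in the next section exercises the newly active NameNode.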

V. Manually verify NameNode active/standby switchover

[db01]$ bin/hdfs haadmin -transitionToStandby nn1

[db01]$ bin/hdfs haadmin -transitionToActive nn2

[db01]$ bin/hdfs haadmin -getServiceState nn1

standby

[db01]$ bin/hdfs haadmin -getServiceState nn2

active

Then run the basic HDFS tests again.

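VI. Automatic HDFS failover with ZooKeeper

1. Install the ZooKeeper ensemble according to the service plan


Install the ZooKeeper servers:

$ tar -zxvf zookeeper-3.4.5.tar.gz -C /usr/local/

$ cd /usr/local

$ chown -R hadoop:hadoop zookeeper-3.4.5/

$ cd zookeeper-3.4.5/conf

$ cp zoo_sample.cfg zoo.cfg

$ vi zoo.cfg    --set dataDir as below (zoo_sample.cfg defaults to /tmp/zookeeper) and add the server lines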

dataDir=/usr/local/zookeeper-3.4.5/data

server.1=db01:2888:3888

server.2=db02:2888:3888

server.3=db03:2888:3888

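Configure the myid file:

$ mkdir -p /usr/local/zookeeper-3.4.5/data

$ cd /usr/local/zookeeper-3.4.5/data/

$ vi myid

Enter this host's server number from zoo.cfg (1 on db01).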

Copy the installation to the other two nodes (from /usr/local):

# scp -r zookeeper-3.4.5/ db02:/usr/local/

# scp -r zookeeper-3.4.5/ db03:/usr/local/

Edit the myid file on each server to match its server.N line in zoo.cfg (2 on db02, 3 on db03).
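Because each myid must match that host's `server.N` line, it can be derived from zoo.cfg instead of typed by hand. A sketch, assuming the `server.N=host:2888:3888` format used above and short host names; the function name is illustrative:

```shell
#!/usr/bin/env bash
# zk_myid_for_host ZOO_CFG HOST: print N from the line "server.N=HOST:...".
# Prints nothing if HOST is not listed.
zk_myid_for_host() {
  local cfg=$1 host=$2
  sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$cfg"
}

# Usage on each ZooKeeper host, from /usr/local/zookeeper-3.4.5:
#   zk_myid_for_host conf/zoo.cfg "$(hostname -s)" > data/myid
```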


Start ZooKeeper on each server (from /usr/local/zookeeper-3.4.5):

[db01]$ bin/zkServer.sh start

[db02]$ bin/zkServer.sh start

[db03]$ bin/zkServer.sh start

2. Update core-site.xml and hdfs-site.xml


In core-site.xml, add:

<property>

<name>ha.zookeeper.quorum</name>

<value>db01:2181,db02:2181,db03:2181</value>

</property>


In hdfs-site.xml, add:

<property>

<name>dfs.ha.automatic-failover.enabled</name>

<value>true</value>

</property>

The main contents of core-site.xml and hdfs-site.xml after the change:


core-site.xml:

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://ns1</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/opt/cdh-5.3.6/hadoop-2.5.0/data/tmp</value>

</property>

<property>

<name>fs.trash.interval</name>

<value>7000</value>

</property>

<property>

<name>ha.zookeeper.quorum</name>

<value>db01:2181,db02:2181,db03:2181</value>

</property>

</configuration>


hdfs-site.xml:

<configuration>

<property>

<name>dfs.nameservices</name>

<value>ns1</value>

</property>

<property>

<name>dfs.ha.namenodes.ns1</name>

<value>nn1,nn2</value>

</property>

<property>

<name>dfs.namenode.rpc-address.ns1.nn1</name>

<value>db01:8020</value>

</property>

<property>

<name>dfs.namenode.rpc-address.ns1.nn2</name>

<value>db02:8020</value>

</property>

<property>

<name>dfs.namenode.http-address.ns1.nn1</name>

<value>db01:50070</value>

</property>

<property>

<name>dfs.namenode.http-address.ns1.nn2</name>

<value>db02:50070</value>

</property>

<property>

<name>dfs.namenode.shared.edits.dir</name>

<value>qjournal://db01:8485;db02:8485;db03:8485/ns1</value>

</property>

<property>

<name>dfs.journalnode.edits.dir</name>

<value>/opt/cdh-5.3.6/hadoop-2.5.0/data/dfs/jn</value>

</property>

<property>

<name>dfs.client.failover.proxy.provider.ns1</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<property>

<name>dfs.ha.fencing.methods</name>

<value>sshfence</value>

</property>

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/home/hadoop/.ssh/id_rsa</value>

</property>

<property>

<name>dfs.ha.automatic-failover.enabled</name>

<value>true</value>

</property>

</configuration>
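Before copying the files out, it can help to confirm that a property has the intended value. `bin/hdfs getconf -confKey <name>` does this with Hadoop's own parser; the awk function below is a rough stand-in that needs no Java, assuming each `<name>` and its `<value>` sit on separate lines inside a `<property>`, as in the files above. The function name is illustrative and this is not a real XML parser.

```shell
#!/usr/bin/env bash
# get_conf_value FILE NAME: print the <value> of property NAME in a Hadoop
# *-site.xml file. Simplified: expects one tag per line, <name> before <value>.
get_conf_value() {
  awk -v want="$2" '
    /<name>/  { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
    /<value>/ { v = $0; gsub(/.*<value>|<\/value>.*/, "", v)
                if (n == want) { print v; exit } }
  ' "$1"
}

# Usage on db01, from the Hadoop install directory:
#   get_conf_value etc/hadoop/hdfs-site.xml dfs.ha.automatic-failover.enabled
```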

Once the files are edited, stop HDFS and copy the files to the other nodes:


[db01]$ sbin/stop-dfs.sh

[db01]$ scp etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml hadoop@db02:/opt/cdh-5.3.6/hadoop-2.5.0/etc/hadoop/

[db01]$ scp etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml hadoop@db03:/opt/cdh-5.3.6/hadoop-2.5.0/etc/hadoop/

[db01]$ scp etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml hadoop@db04:/opt/cdh-5.3.6/hadoop-2.5.0/etc/hadoop/

[db01]$ scp etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml hadoop@db05:/opt/cdh-5.3.6/hadoop-2.5.0/etc/hadoop/

3. Initialize the HA state in ZooKeeper

[db01]$ bin/hdfs zkfc -formatZK

The hadoop-ha znode is now visible from the zkCli client:

[zk: localhost:2181(CONNECTED) 3] ls /

[hadoop-ha, zookeeper]

4. Start HDFS (with automatic failover enabled, start-dfs.sh also starts a ZKFC on each NameNode host)

[db01]$ sbin/start-dfs.sh

VII. Test automatic failover

[db01]$ bin/hdfs haadmin -getServiceState nn1

standby

[db01]$ bin/hdfs haadmin -getServiceState nn2

active

Kill the active NameNode (nn2, PID 25121) on db02:

[db02]$ kill -9 25121

[db01]$ bin/hdfs haadmin -getServiceState nn1

active

[db01]$ bin/hdfs haadmin -getServiceState nn2

17/03/12 14:24:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

17/03/12 14:24:51 INFO ipc.Client: Retrying connect to server: db02/192.168.100.232:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)

Operation failed: Call From db01/192.168.100.231 to db02:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

nn1 became active after the NameNode on db02 was killed, so automatic failover works and the QJM-based Hadoop HA setup is complete.
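The `kill -9 25121` above targets the PID of the active NameNode process on db02, which `jps` lists under the name `NameNode`. A sketch for picking that PID out of jps output (the function name is illustrative); the exact name match skips SecondaryNameNode and DFSZKFailoverController:

```shell
#!/usr/bin/env bash
# namenode_pid: read `jps` output on stdin, print the NameNode's PID.
namenode_pid() {
  awk '$2 == "NameNode" { print $1 }'
}

# Usage, as hadoop on the host running the active NameNode (db02 here):
#   pid=$(jps | namenode_pid) && [ -n "$pid" ] && kill -9 "$pid"
```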

Date: 2024-10-07 14:42:40
