I. Host and Service Planning
Role assignment across the five hosts:

db01: NameNode, JournalNode, DataNode, ZooKeeper, ZKFC
db02: NameNode, JournalNode, DataNode, ZooKeeper, ZKFC
db03: JournalNode, DataNode, ZooKeeper
db04: DataNode
db05: DataNode
II. Environment Configuration
1. Create the hadoop user that will own the software
groupadd hadoop
useradd -g hadoop hadoop
echo "dbking588" | passwd --stdin hadoop

Configure the environment variables:

export HADOOP_HOME=/opt/cdh-5.3.6/hadoop-2.5.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH:$HOME/bin
2. Configure passwordless SSH login
Configuration:

$ ssh-keygen -t rsa
$ ssh-copy-id db07.chavin.king

(ssh-copy-id works only with RSA keys; in testing it did not work with DSA keys.)
Verification:

$ ssh db02 date
Wed Apr 19 09:57:34 CST 2017
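With the key pair in place, the copy step has to be repeated for every node in the planning table. A minimal sketch that assembles the commands from the host list and prints them as a dry run (hostnames db01–db05 are assumed from the plan above; nothing is executed):

```shell
# Build one ssh-copy-id command per cluster node; printing instead of
# executing keeps this a reviewable dry run.
cmds=""
for host in db01 db02 db03 db04 db05; do
  cmds="${cmds}ssh-copy-id hadoop@${host}
"
done
printf '%s' "$cmds"
```

Once the output looks right, the same loop can run `ssh-copy-id` directly instead of appending to `cmds`.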
3. Grant the hadoop user sudo privileges
chmod u+w /etc/sudoers
echo "hadoop ALL=(root) NOPASSWD:ALL" >> /etc/sudoers
chmod u-w /etc/sudoers
4. Disable the firewall and SELinux
sed -i '/SELINUX=enforcing/d' /etc/selinux/config
sed -i '/SELINUX=disabled/d' /etc/selinux/config
echo "SELINUX=disabled" >> /etc/selinux/config
Or, as a single in-place substitution:

sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
service iptables stop
chkconfig iptables off
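A wrong sed expression here can leave SELinux enforcing after the next reboot, so it is worth rehearsing the substitution on a scratch copy first. A sketch in which a temporary file stands in for /etc/selinux/config:

```shell
# Rehearse the SELinux edit on a temporary copy before touching the
# real /etc/selinux/config.
cfg=$(mktemp)
printf 'SELINUXTYPE=targeted\nSELINUX=enforcing\n' > "$cfg"
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' "$cfg"
selinux_mode=$(grep '^SELINUX=' "$cfg")
echo "$selinux_mode"     # SELINUX=disabled
rm -f "$cfg"
```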
5. Raise the open-file and max-process limits
cp /etc/security/limits.conf /etc/security/limits.conf.bak
echo "* soft nproc 32000" >> /etc/security/limits.conf
echo "* hard nproc 32000" >> /etc/security/limits.conf
echo "* soft nofile 65535" >> /etc/security/limits.conf
echo "* hard nofile 65535" >> /etc/security/limits.conf
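The four echo calls can equally be written as one heredoc append, which is easier to audit before running. A sketch against a temp file (on a real node the target is /etc/security/limits.conf):

```shell
# Append all four limit entries in one heredoc; a temp file stands in
# for /etc/security/limits.conf so the sketch is safe to run anywhere.
limits=$(mktemp)
cat >> "$limits" <<'EOF'
* soft nproc 32000
* hard nproc 32000
* soft nofile 65535
* hard nofile 65535
EOF
entries=$(grep -cE 'nproc|nofile' "$limits")
echo "$entries"     # 4
rm -f "$limits"
```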
6. Configure cluster time synchronization
On db01 (the NTP server):

cp /etc/ntp.conf /etc/ntp.conf.bak
cp /etc/sysconfig/ntpd /etc/sysconfig/ntpd.bak
echo "restrict 192.168.100.0 mask 255.255.255.0 nomodify notrap" >> /etc/ntp.conf
echo "SYNC_HWCLOCK=yes" >> /etc/sysconfig/ntpd
service ntpd restart
On the other nodes, run a sync script from cron every 10 minutes:

0-59/10 * * * * /opt/scripts/sync_time.sh

# cat /opt/scripts/sync_time.sh
/sbin/service ntpd stop
/usr/sbin/ntpdate db01.chavin.king
/sbin/service ntpd start
7. Install Java
# vim /etc/profile

Append the environment variables:

export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH

Verify that Java is installed:

# java -version
8. Install the Hadoop software
# cd /opt/software
# tar -zxvf hadoop-2.5.0.tar.gz -C /opt/cdh-5.3.6/
# chown -R hadoop:hadoop /opt/cdh-5.3.6/hadoop-2.5.0
III. Edit the Hadoop Configuration Files
The files Hadoop HA requires are mainly the ones listed below; everything else follows the standard fully distributed Hadoop deployment:
HDFS configuration files:
etc/hadoop/hadoop-env.sh
etc/hadoop/core-site.xml
etc/hadoop/hdfs-site.xml
etc/hadoop/slaves
YARN configuration files:
etc/hadoop/yarn-env.sh
etc/hadoop/yarn-site.xml
etc/hadoop/slaves
MapReduce configuration files:
etc/hadoop/mapred-env.sh
etc/hadoop/mapred-site.xml
The contents of the HA-related configuration files are as follows:
$ cat etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-2.5.0/data/tmp</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>7000</value>
    </property>
</configuration>
$ cat etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>db01:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>db02:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>db01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>db02:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://db01:8485;db02:8485;db03:8485/ns1</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/usr/local/hadoop-2.5.0/data/dfs/jn</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>
$ cat etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>db02</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>600000</value>
    </property>
</configuration>
$ cat etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>db01:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>db01:19888</value>
    </property>
</configuration>
$ cat etc/hadoop/slaves
db01
db02
db03
db04
db05
Set the Java environment variable (JAVA_HOME) in the following files:

etc/hadoop/hadoop-env.sh
etc/hadoop/yarn-env.sh
etc/hadoop/mapred-env.sh
Create the data directories:

$ mkdir -p /opt/cdh-5.3.6/hadoop-2.5.0/data/tmp
$ mkdir -p /opt/cdh-5.3.6/hadoop-2.5.0/data/dfs/jn
Sync the installation to the other nodes (note -r, since this is a directory):

$ scp -r /opt/cdh-5.3.6/hadoop-2.5.0 hadoop@db02:/opt/cdh-5.3.6/hadoop-2.5.0
$ scp -r /opt/cdh-5.3.6/hadoop-2.5.0 hadoop@db03:/opt/cdh-5.3.6/hadoop-2.5.0
$ scp -r /opt/cdh-5.3.6/hadoop-2.5.0 hadoop@db04:/opt/cdh-5.3.6/hadoop-2.5.0
$ scp -r /opt/cdh-5.3.6/hadoop-2.5.0 hadoop@db05:/opt/cdh-5.3.6/hadoop-2.5.0
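Rather than typing four nearly identical scp lines, the commands can be generated from the node list. A dry-run sketch (hostnames taken from the planning table; the commands are only printed, not executed):

```shell
# Generate the sync command for every other node; echo keeps this a
# dry run that can be piped to `sh` once reviewed.
src=/opt/cdh-5.3.6/hadoop-2.5.0
cmds=$(for host in db02 db03 db04 db05; do
  echo "scp -r $src hadoop@${host}:$src"
done)
printf '%s\n' "$cmds"
```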
IV. Starting the Cluster for the First Time
1. Start the journalnode daemons
[db01]$ sbin/hadoop-daemon.sh start journalnode
[db02]$ sbin/hadoop-daemon.sh start journalnode
[db03]$ sbin/hadoop-daemon.sh start journalnode
2. Format the HDFS filesystem
[db01]$ bin/hdfs namenode -format
3. Start the namenode on nn1
[db01]$ sbin/hadoop-daemon.sh start namenode
4. Bootstrap nn2 with nn1's metadata (the metadata directory can also simply be copied)
[db02]$ bin/hdfs namenode -bootstrapStandby
5. Start the namenode on nn2
[db02]$ sbin/hadoop-daemon.sh start namenode
6. Start all the datanode daemons
[db01]$ sbin/hadoop-daemon.sh start datanode
[db02]$ sbin/hadoop-daemon.sh start datanode
[db03]$ sbin/hadoop-daemon.sh start datanode
[db04]$ sbin/hadoop-daemon.sh start datanode
[db05]$ sbin/hadoop-daemon.sh start datanode
7. Switch nn1 to the active state
[db01]$ bin/hdfs haadmin -transitionToActive nn1
[db01]$ bin/hdfs haadmin -getServiceState nn1
[db01]$ bin/hdfs haadmin -getServiceState nn2
At this point the HDFS cluster has started successfully.
8. Run basic tests against the HDFS filesystem
Create, delete, upload, and read files, and so on.
V. Manually Verify NameNode Active/Standby Failover
[db01]$ bin/hdfs haadmin -transitionToStandby nn1
[db01]$ bin/hdfs haadmin -transitionToActive nn2
[db01]$ bin/hdfs haadmin -getServiceState nn1
standby
[db01]$ bin/hdfs haadmin -getServiceState nn2
active
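The two state queries can be wrapped in a small check so the expected post-failover layout is verified in one call. In this sketch, `query` is a hypothetical stand-in returning canned answers; on a live cluster it would call `bin/hdfs haadmin -getServiceState` instead:

```shell
# `query` stands in for `bin/hdfs haadmin -getServiceState <nn>`; the
# canned answers model the cluster state after the manual switch above.
query() {
  case "$1" in
    nn1) echo standby ;;
    nn2) echo active ;;
  esac
}
check_failover() {
  # succeed only when nn1 is standby and nn2 is active
  [ "$(query nn1)" = "standby" ] && [ "$(query nn2)" = "active" ] && echo OK
}
result=$(check_failover)
echo "$result"     # OK
```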
Run the basic HDFS functional tests again.
VI. Automatic HDFS Failover with ZooKeeper
1. Install the ZooKeeper cluster according to the service plan
Install the ZooKeeper servers:

$ tar -zxvf zookeeper-3.4.5.tar.gz -C /usr/local/
$ chown -R hadoop:hadoop zookeeper-3.4.5/
$ cp zoo_sample.cfg zoo.cfg
$ vi zoo.cfg

Add the following to the file:

dataDir=/usr/local/zookeeper-3.4.5/data
server.1=db01:2888:3888
server.2=db02:2888:3888
server.3=db03:2888:3888

Configure the myid file:

$ cd /usr/local/zookeeper-3.4.5/data/
$ vi myid

Enter the server number that matches this node in the list above (1 here).

Sync the installation to the other two nodes:

# scp -r zookeeper-3.4.5/ db02:/usr/local/
# scp -r zookeeper-3.4.5/ db03:/usr/local/

Then edit each server's myid file accordingly.
Start the ZooKeeper service on each node:

db01$ bin/zkServer.sh start
db02$ bin/zkServer.sh start
db03$ bin/zkServer.sh start
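A mismatched myid is a common source of quorum failures, so the id can be derived from the server.N lines in zoo.cfg instead of being typed by hand. A sketch in which a scratch directory stands in for /usr/local/zookeeper-3.4.5/data and the hostname is hard-coded for illustration:

```shell
# Derive this node's myid from the server.N entries in zoo.cfg; a
# scratch dir and a fixed hostname keep the sketch self-contained.
datadir=$(mktemp -d)
this_host=db02                                   # would be $(hostname -s)
cfg='server.1=db01:2888:3888
server.2=db02:2888:3888
server.3=db03:2888:3888'
id=$(printf '%s\n' "$cfg" | sed -n "s/^server\.\([0-9]*\)=${this_host}:.*/\1/p")
echo "$id" > "$datadir/myid"
myid=$(cat "$datadir/myid")
echo "$myid"     # 2
```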
2. Modify core-site.xml and hdfs-site.xml
Add the following to core-site.xml:

<property>
    <name>ha.zookeeper.quorum</name>
    <value>db01:2181,db02:2181,db03:2181</value>
</property>
Add the following to hdfs-site.xml:

<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
The main contents of the modified core-site.xml and hdfs-site.xml are as follows:
core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-2.5.0/data/tmp</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>7000</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>db01:2181,db02:2181,db03:2181</value>
    </property>
</configuration>
hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>db01:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>db02:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>db01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>db02:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://db01:8485;db02:8485;db03:8485/ns1</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/usr/local/hadoop-2.5.0/data/dfs/jn</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
</configuration>
After the configuration files are modified, stop the HDFS cluster and sync them to the other nodes:
[db01]$ sbin/stop-dfs.sh
[db01]$ scp etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml hadoop@db02:/opt/cdh-5.3.6/hadoop-2.5.0/etc/hadoop/
[db01]$ scp etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml hadoop@db03:/opt/cdh-5.3.6/hadoop-2.5.0/etc/hadoop/
[db01]$ scp etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml hadoop@db04:/opt/cdh-5.3.6/hadoop-2.5.0/etc/hadoop/
[db01]$ scp etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml hadoop@db05:/opt/cdh-5.3.6/hadoop-2.5.0/etc/hadoop/
3. Initialize the HA state in ZooKeeper
[db01]$ bin/hdfs zkfc -formatZK
The hadoop-ha znode is now visible from the zkCli client:
[zk: localhost:2181(CONNECTED) 3] ls /
[hadoop-ha, zookeeper]
4. Start the HDFS cluster
[db01]$ sbin/start-dfs.sh
VII. Test Automatic Failover
[db01]$ bin/hdfs haadmin -getServiceState nn1
standby
[db01]$ bin/hdfs haadmin -getServiceState nn2
active

Kill the active namenode process on db02:

[db02]$ kill -9 25121

[db01]$ bin/hdfs haadmin -getServiceState nn1
active
[db01]$ bin/hdfs haadmin -getServiceState nn2
17/03/12 14:24:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/12 14:24:51 INFO ipc.Client: Retrying connect to server: db02/192.168.100.232:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From db01/192.168.100.231 to db02:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Automatic failover is working; the QJM-based Hadoop HA deployment is complete.