Hadoop + ZooKeeper HA high-availability cluster deployment and troubleshooting

http://archive-primary.cloudera.com/cdh5/cdh/5/

I. Preparation
1. Set the Linux hostname (configure this on every machine):
[root@h21 ~]# vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=h21
2. Set the IP address in /etc/sysconfig/network-scripts/ifcfg-eth0
3. Map hostnames to IPs (h24 and h25 are the masters; h21, h22, and h23 are the slaves):
[root@h21 ~]# vim /etc/hosts
192.168.1.21 h21
192.168.1.22 h22
192.168.1.23 h23
192.168.1.24 h24
192.168.1.25 h25
###### Note ###### If you are on rented servers or cloud hosts (Huawei Cloud, Alibaba Cloud, etc.),
/etc/hosts must map the *internal* IP addresses to the hostnames.
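A quick sanity check that the mapping works (a minimal sketch; it assumes the five entries above are already in /etc/hosts on the node you run it from):
# Run on any node: every hostname should resolve and answer one ping.
for h in h21 h22 h23 h24 h25; do
  getent hosts "$h" > /dev/null || echo "$h: missing /etc/hosts entry"
  ping -c 1 -W 1 "$h" > /dev/null && echo "$h reachable" || echo "$h unreachable"
done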

4. Disable the firewall
# Check the firewall status
[root@h21 ~]# service iptables status
# Stop the firewall
[root@h21 ~]# service iptables stop
# Check whether the firewall starts on boot
[root@h21 ~]# chkconfig iptables --list
# Disable firewall start on boot
[root@h21 ~]# chkconfig iptables off
Create a hadoop user on all 5 machines:
[root@h21 ~]# useradd hadoop
[root@h21 ~]# passwd hadoop
hadoop password: 123456
Steps 1-4 are done as root; reboot each machine when they are finished.

5. Passwordless SSH (done as the hadoop user)
[root@h21 ~]# su - hadoop
# Generate the SSH key pair
# in the hadoop user's ~/.ssh directory
cd ~/.ssh
ssh-keygen -t rsa   (press Enter four times)
This command produces two files: id_rsa (private key) and id_rsa.pub (public key).
Copy the public key to every machine you want to log in to without a password.
Run ssh-keygen -t rsa as hadoop on each of the five hosts:
[hadoop@h21 ~]$ ssh-keygen -t rsa
[hadoop@h22 ~]$ ssh-keygen -t rsa
[hadoop@h23 ~]$ ssh-keygen -t rsa
[hadoop@h24 ~]$ ssh-keygen -t rsa
[hadoop@h25 ~]$ ssh-keygen -t rsa

[hadoop@h21 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h21
[hadoop@h21 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h22
[hadoop@h21 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h23
[hadoop@h21 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h24
[hadoop@h21 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h25

[hadoop@h22 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h21
[hadoop@h22 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h22
[hadoop@h22 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h23
[hadoop@h22 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h24
[hadoop@h22 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h25

[hadoop@h23 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h21
[hadoop@h23 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h22
[hadoop@h23 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h23
[hadoop@h23 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h24
[hadoop@h23 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h25

[hadoop@h24 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h21
[hadoop@h24 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h22
[hadoop@h24 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h23
[hadoop@h24 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h24
[hadoop@h24 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h25

[hadoop@h25 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h21
[hadoop@h25 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h22
[hadoop@h25 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h23
[hadoop@h25 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h24
[hadoop@h25 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h25
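With all the keys exchanged, a hedged verification loop (a sketch; run it as hadoop on each host, assuming the ssh-copy-id runs above succeeded):
# Every ssh call should print the remote hostname without asking for a password.
for h in h21 h22 h23 h24 h25; do
  ssh -o BatchMode=yes "$h" hostname || echo "passwordless login to $h NOT working"
done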

6. Install the JDK and configure environment variables (as root; adjust the paths to your own layout)
Remove the JDK that shipped with the OS (so the newly installed JDK takes effect):
[root@h24 ~]# rpm -e --nodeps java-1.4.2-gcj-compat-1.4.2.0-40jpp.115

[root@h24 tmp]# tar -zxvf jdk-7u25-linux-i586.tar.gz -C /usr/local
[root@h24 ~]# vim /etc/profile   (or, per user, vim ~/.bash_profile)
export JAVA_HOME=/usr/local/jdk1.7.0_25
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile   (or source ~/.bash_profile)
Check the Java version:
[root@h24 ~]# java -version
————————————————————————————————
(Alternatively, in /etc/bashrc:
export JAVA_HOME=/usr/local/jdk1.7.0_25
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$PATH)
————————————————————————————————

II. Cluster plan
Hostname  IP             Installed software        Processes
h24       192.168.1.24   jdk, hadoop               NameNode, ResourceManager, DFSZKFailoverController (zkfc)
h25       192.168.1.25   jdk, hadoop               NameNode, ResourceManager, DFSZKFailoverController (zkfc)
h21       192.168.1.21   jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
h22       192.168.1.22   jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
h23       192.168.1.23   jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain

III. Installation steps
1. Install and configure the ZooKeeper cluster (on h21 first)
1.1 Unpack
[root@h21 tmp]# tar zxvf zookeeper-3.4.5-cdh5.5.2.tar.gz -C /usr/local/
1.2 Edit the configuration
[root@h21 tmp]# cd /usr/local/zookeeper-3.4.5-cdh5.5.2/conf/
[root@h21 conf]# cp zoo_sample.cfg zoo.cfg
[root@h21 conf]# vim zoo.cfg
Change/add:
dataDir=/usr/local/zookeeper-3.4.5-cdh5.5.2/data
dataLogDir=/usr/local/zookeeper-3.4.5-cdh5.5.2/log
And append at the end (the ZooKeeper nodes are h21, h22, and h23, per the cluster plan):
server.1=192.168.1.21:2888:3888
server.2=192.168.1.22:2888:3888
server.3=192.168.1.23:2888:3888
Save and quit.
Then create the data and log directories:
[root@h21 ~]# cd /usr/local/zookeeper-3.4.5-cdh5.5.2/
[root@h21 zookeeper-3.4.5-cdh5.5.2]# mkdir -pv data log
Create an empty myid file:
touch /usr/local/zookeeper-3.4.5-cdh5.5.2/data/myid
And write this server's ID into it:
echo 1 > /usr/local/zookeeper-3.4.5-cdh5.5.2/data/myid
1.3 Copy the configured ZooKeeper to the other nodes
[root@h21 ~]# scp -r /usr/local/zookeeper-3.4.5-cdh5.5.2/ h22:/usr/local
[root@h21 ~]# scp -r /usr/local/zookeeper-3.4.5-cdh5.5.2/ h23:/usr/local
Note: adjust the contents of /usr/local/zookeeper-3.4.5-cdh5.5.2/data/myid on h22 and h23:
h22:
echo 2 > /usr/local/zookeeper-3.4.5-cdh5.5.2/data/myid
h23:
echo 3 > /usr/local/zookeeper-3.4.5-cdh5.5.2/data/myid
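Before starting anything, a hedged check that every node got the right myid (a sketch; it relies on the passwordless hadoop SSH from step 5 and on the files being readable):
# Expected output: h21 -> 1, h22 -> 2, h23 -> 3
for h in h21 h22 h23; do
  printf '%s myid: ' "$h"; ssh "$h" cat /usr/local/zookeeper-3.4.5-cdh5.5.2/data/myid
done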

2. Install and configure the Hadoop cluster (on h24)
2.1 Unpack
[root@h24 tmp]# tar -zxvf hadoop-2.6.0-cdh5.5.2.tar.gz -C /usr/local/
[root@h24 local]# mv /usr/local/hadoop-2.6.0-cdh5.5.2 /usr/local/hadoop-2.6.0
2.2 Configure HDFS (in Hadoop 2.x all configuration files live under $HADOOP_HOME/etc/hadoop)
# Add Hadoop to the environment variables
vim /etc/profile (or, per user, vim ~/.bash_profile)
export JAVA_HOME=/usr/local/jdk1.7.0_25
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

# All Hadoop 2.x configuration files are under $HADOOP_HOME/etc/hadoop
[root@h24 local]# cd /usr/local/hadoop-2.6.0/etc/hadoop

2.2.1 Edit hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.7.0_25
2.2.2 Edit core-site.xml
<configuration>
  <!-- Default filesystem: the HA nameservice ns1 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1/</value>
  </property>
  <!-- Hadoop working/temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-2.6.0/tmp</value>
  </property>
  <!-- ZooKeeper quorum used for HA -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>h21:2181,h22:2181,h23:2181</value>
  </property>
</configuration>

2.2.3 Edit hdfs-site.xml
<configuration>
  <!-- Nameservice ID; must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- The two NameNodes under ns1 -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>h24:9000</value>
  </property>
  <!-- HTTP (web UI) address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>h24:50070</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>h25:9000</value>
  </property>
  <!-- HTTP (web UI) address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>h25:50070</value>
  </property>
  <!-- Where the NameNodes share their edit log: the JournalNode quorum -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://h21:8485;h22:8485;h23:8485/ns1</value>
  </property>
  <!-- Local directory where the JournalNodes store edits -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/hadoop-2.6.0/journaldata</value>
  </property>
  <!-- Enable automatic failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Client-side proxy provider used for failover -->
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing: try sshfence first, fall back to a no-op shell -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <!-- SSH private key used by sshfence -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <!-- sshfence connect timeout (ms) -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>
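Because sshfence logs in to the peer NameNode with the private key above, it is worth a quick hedged check (a sketch, run as hadoop) that fencing can actually connect:
# From h24: key-based, non-interactive SSH to the other NameNode must succeed.
ssh -o BatchMode=yes -i /home/hadoop/.ssh/id_rsa hadoop@h25 hostname
# Repeat in the other direction, from h25 to h24.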

2.2.4 Create the file: [root@h24 hadoop]# cp mapred-site.xml.template mapred-site.xml
Edit mapred-site.xml:
<configuration>
  <!-- Run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

2.2.5 Edit yarn-site.xml
<configuration>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Cluster ID -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <!-- The two ResourceManager IDs -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Hosts of rm1 and rm2 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>h24</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>h25</value>
  </property>
  <!-- ZooKeeper quorum for ResourceManager state -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>h21:2181,h22:2181,h23:2181</value>
  </property>
  <!-- Shuffle service for MapReduce -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

2.2.6 Edit slaves (slaves lists the worker nodes):
h21
h22
h23
—————————————————————————————————————————————————————————————-------------------------------------------
********************************************************************************************************
2.2.7 Configure passwordless login
# First configure passwordless login from h24 to h21, h22, h23, and h25
# Generate a key pair on h24
ssh-keygen -t rsa
# Copy the public key to every node, including h24 itself
ssh-copy-id h21
ssh-copy-id h22
ssh-copy-id h23
ssh-copy-id h24
ssh-copy-id h25
# Note: the ResourceManager must be able to log in to the NodeManagers without a password
# Note: the two NameNodes need passwordless SSH to each other; don't forget h25 -> h24
Generate a key pair on h25:
ssh-keygen -t rsa
ssh-copy-id h24
**********************************************************************************************************
____________________________________________________________________________________________________________
2.4 Copy the configured Hadoop to the other nodes
[hadoop@h24 ~]$ scp -r /usr/local/hadoop-2.6.0/ h21:/usr/local/
[hadoop@h24 ~]$ scp -r /usr/local/hadoop-2.6.0/ h22:/usr/local/
[hadoop@h24 ~]$ scp -r /usr/local/hadoop-2.6.0/ h23:/usr/local/
[hadoop@h24 ~]$ scp -r /usr/local/hadoop-2.6.0/ h25:/usr/local/
Fix ownership (as root, on every node):
[root@h21 ~]# chown hadoop.hadoop /usr/local/hadoop-2.6.0/ -R
[root@h22 ~]# chown hadoop.hadoop /usr/local/hadoop-2.6.0/ -R
[root@h23 ~]# chown hadoop.hadoop /usr/local/hadoop-2.6.0/ -R
[root@h24 ~]# chown hadoop.hadoop /usr/local/hadoop-2.6.0/ -R
[root@h25 ~]# chown hadoop.hadoop /usr/local/hadoop-2.6.0/ -R
Configure the environment variables (repeat on h21, h22, h23, and h25; h24 was configured in 2.2):
[root@h21 ~]# su - hadoop
[hadoop@h21 ~]$ vi ~/.bash_profile   (or /etc/profile)
export JAVA_HOME=/usr/local/jdk1.7.0_25
export JAVA_BIN=/usr/local/jdk1.7.0_25/bin
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME JAVA_BIN PATH CLASSPATH
HADOOP_HOME=/usr/local/hadoop-2.6.0
HADOOP_SBIN=/usr/local/hadoop-2.6.0/sbin
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME HADOOP_CONF_DIR PATH
The same profile goes onto the other nodes, e.g. on h22:
[root@h22 ~]# su - hadoop
[hadoop@h22 ~]$ vi ~/.bash_profile   (or /etc/profile)
(same contents as above)
Reload the profile on each node:
[hadoop@h21 ~]$ source ~/.bash_profile   (or log out and back in with su - root; su - hadoop; or reboot)
### Note: follow the steps below in strict order
2.5 Start the ZooKeeper cluster (start zk on h21, h22, and h23)
[hadoop@h21 ~]$ cd /usr/local/zookeeper-3.4.5-cdh5.5.2/bin/
[hadoop@h21 bin]$ ./zkServer.sh start
[hadoop@h21 bin]$ ./zkServer.sh status
[hadoop@h22 ~]$ cd /usr/local/zookeeper-3.4.5-cdh5.5.2/bin/
[hadoop@h22 bin]$ ./zkServer.sh start
[hadoop@h22 bin]$ ./zkServer.sh status
[hadoop@h23 ~]$ cd /usr/local/zookeeper-3.4.5-cdh5.5.2/bin/
[hadoop@h23 bin]$ ./zkServer.sh start
[hadoop@h23 bin]$ ./zkServer.sh status
# Check the status: there should be one leader and two followers
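A hedged alternative liveness probe (a sketch; it assumes nc is installed and the client port is the default 2181 configured above):
# A healthy server answers "imok" to the ZooKeeper four-letter command ruok.
for h in h21 h22 h23; do
  printf '%s: ' "$h"; echo ruok | nc "$h" 2181; echo
done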
2.6 Start the JournalNodes (run on h21, h22, and h23)
[hadoop@h21 ~]$ cd /usr/local/hadoop-2.6.0
[hadoop@h21 hadoop-2.6.0]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@h21 hadoop-2.6.0]$ jps
12744 JournalNode
4133 QuorumPeerMain
12790 Jps
[hadoop@h22 ~]$ cd /usr/local/hadoop-2.6.0
[hadoop@h22 hadoop-2.6.0]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@h22 hadoop-2.6.0]$ jps
12744 JournalNode
4133 QuorumPeerMain
12790 Jps
[hadoop@h23 ~]$ cd /usr/local/hadoop-2.6.0
[hadoop@h23 hadoop-2.6.0]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@h23 hadoop-2.6.0]$ jps
12744 JournalNode
4133 QuorumPeerMain
12790 Jps
# jps now shows an extra JournalNode process on h21, h22, and h23
2.7 Format HDFS
# On h24, run:
[hadoop@h24 hadoop]$ cd /usr/local/hadoop-2.6.0/
[hadoop@h24 hadoop-2.6.0]$ bin/hdfs namenode -format
# Formatting generates files under the hadoop.tmp.dir configured in core-site.xml
# (here /usr/local/hadoop-2.6.0/tmp); copy that directory into /usr/local/hadoop-2.6.0/ on h25:
[hadoop@h24 hadoop-2.6.0]$ scp -r tmp/ h25:/usr/local/hadoop-2.6.0/
## Alternatively (recommended), run on h25: hdfs namenode -bootstrapStandby
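A hedged sketch of the bootstrapStandby alternative (the JournalNodes must already be running, and it assumes h25 already has the same configuration files):
# On h25, as hadoop: pull the freshly formatted namespace from nn1
# instead of copying tmp/ by hand.
cd /usr/local/hadoop-2.6.0
bin/hdfs namenode -bootstrapStandby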
2.8 Format ZKFC (run once, on h24)
[hadoop@h24 hadoop-2.6.0]$ bin/hdfs zkfc -formatZK
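To confirm the formatZK step worked, a hedged look into ZooKeeper (a sketch; it uses the zkCli.sh shipped with the ZooKeeper installed above):
# A /hadoop-ha/ns1 znode should now exist.
/usr/local/zookeeper-3.4.5-cdh5.5.2/bin/zkCli.sh -server h21:2181 ls /hadoop-ha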
2.9 Start HDFS (run on h24; you will be asked for a password twice)
[hadoop@h24 hadoop-2.6.0]$ sbin/start-dfs.sh
___________________________________________________________________________________________________________________________________________________
***************************************************************************************************************************************************
[hadoop@h24 hadoop-2.6.0]$ sbin/start-dfs.sh
18/06/21 02:01:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [h24 h25]
hadoop@h24's password: h24: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-namenode-h24.out
h25: Connection closed by 192.168.1.25
h21: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-datanode-h21.out
h22: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-datanode-h22.out
h23: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-datanode-h23.out
Starting journal nodes [h21 h22 h23]
h22: journalnode running as process 12589. Stop it first.
h23: journalnode running as process 12709. Stop it first.
h21: journalnode running as process 12744. Stop it first.
18/06/21 02:06:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting ZK Failover Controllers on NN hosts [h24 h25]
hadoop@h24's password: h24: starting zkfc, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-zkfc-h24.out
***************************************************************************************************************************************************
____________________________________________________________________________________________________________________________________________________
The first attempt could not reach h25 ("Connection closed"); once the SSH connection to h25 was fixed, a second run brought everything up:
[hadoop@h24 hadoop-2.6.0]$ sbin/start-dfs.sh
18/06/21 02:09:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [h24 h25]
hadoop@h24's password: h24: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-namenode-h24.out
h25: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-namenode-h25.out
h22: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-datanode-h22.out
h21: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-datanode-h21.out
h23: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-datanode-h23.out
Starting journal nodes [h21 h22 h23]
h21: starting journalnode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-h21.out
h23: starting journalnode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-h23.out
h22: starting journalnode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-h22.out
18/06/21 02:09:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting ZK Failover Controllers on NN hosts [h24 h25]
hadoop@h24's password: h24: starting zkfc, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-zkfc-h24.out
h25: starting zkfc, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-zkfc-h25.out
2.10 Start YARN (##### Note ##### run start-yarn.sh on h24)
[hadoop@h24 hadoop-2.6.0]$ sbin/start-yarn.sh
On h25, start the second ResourceManager with yarn-daemon.sh:
[hadoop@h25 sbin]$ ./yarn-daemon.sh start resourcemanager
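A hedged check that both ResourceManagers registered their HA roles (uses the stock yarn CLI; rm1 and rm2 are the IDs from yarn-site.xml above):
yarn rmadmin -getServiceState rm1    # expected: active
yarn rmadmin -getServiceState rm2    # expected: standby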
At this point hadoop-2.6.0 is fully configured; open the web UIs in a browser:
http://192.168.1.24:50070
NameNode 'h24:9000' (active)
http://192.168.1.25:50070
NameNode 'h25:9000' (standby)
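The NameNode HA state can also be read without a browser; a hedged sketch against the standard NameNode JMX endpoint (the NameNodeStatus bean should exist in this Hadoop version, but treat the bean name as an assumption):
# "State" in the JSON reply reports active or standby.
curl -s 'http://h24:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'
curl -s 'http://h25:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'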
Verify HDFS HA
First upload a file to HDFS from h24:
[hadoop@h24 hadoop]$ hadoop fs -mkdir /profile
hadoop fs -put /etc/profile /profile
hadoop fs -ls /
Then kill the active NameNode:
——————————————————————————————————————————————————————
******************************************************
[hadoop@h24 hadoop-2.6.0]$ sbin/hadoop-daemon.sh start namenode
namenode running as process 17838. Stop it first.
This message is also a handy way to find the process ID.
****************************************************
————————————————————————————————————————————————————————
[hadoop@h24 hadoop]$ ps -ef | grep 'NameNode'
The output shows:
hadoop 18783 16379 0 02:40 pts/1 00:00:00 grep NameNode
(only the grep itself matched here; take the real NameNode PID from jps or from the "running as process" message above, e.g. 17838)
kill -9 17838
Browse to http://192.168.1.25:50070
NameNode 'h25:9000' (active)
The NameNode on h25 has now become active,
and refreshing the h24 page shows nothing, since that NameNode is down.
Then run:
hadoop fs -ls /
-rw-r--r-- 3 root supergroup 1926 2015-06-24 15:36 /profile
The file uploaded earlier is still there!
Manually restart the NameNode that was killed:
sbin/hadoop-daemon.sh start namenode
Browse to http://192.168.1.24:50070
NameNode 'h24:9000' (standby)
Verify YARN:
Run the WordCount demo that ships with Hadoop:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.5.2.jar wordcount /profile /out
OK, all done!
Handy commands for checking the cluster state:
bin/hdfs dfsadmin -report                 # status of all HDFS nodes
bin/hdfs haadmin -getServiceState nn1     # HA state of one NameNode
sbin/hadoop-daemon.sh start namenode      # start a single NameNode process
sbin/hadoop-daemon.sh start zkfc          # start a single zkfc process
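These commands combine into a small failover smoke test; a hedged sketch (run as hadoop on h24; nn1 and nn2 are the IDs from hdfs-site.xml):
# Print both NameNode HA states; run before and after killing the active one.
cd /usr/local/hadoop-2.6.0
for nn in nn1 nn2; do
  printf '%s: ' "$nn"; bin/hdfs haadmin -getServiceState "$nn"
done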
_____________________________________________________________________________
*****************************************************************************
Error:
[hadoop@h21 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper-3.4.5-cdh5.5.2/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
Fix:
[hadoop@h21 bin]$ vim zkServer.sh
Add the JDK environment variables near the top of the file, so they take effect before the script launches Java:
export JAVA_HOME=/usr/local/jdk1.7.0_25
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
[hadoop@h21 bin]$ ./zkServer.sh stop
JMX enabled by default
Using config: /usr/local/zookeeper-3.4.5-cdh5.5.2/bin/../conf/zoo.cfg
[hadoop@h21 bin]$ ./zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper-3.4.5-cdh5.5.2/bin/../conf/zoo.cfg
The other nodes must be started as well; then check each node:
[hadoop@h21 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper-3.4.5-cdh5.5.2/bin/../conf/zoo.cfg
Mode: follower
[hadoop@h22 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper-3.4.5-cdh5.5.2/bin/../conf/zoo.cfg
Mode: leader
[hadoop@h23 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper-3.4.5-cdh5.5.2/bin/../conf/zoo.cfg
Mode: follower
Warning:
[hadoop@h24 ~]$ hadoop fs -put WordCount.txt /profile/
18/06/21 02:56:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Workaround:
[hadoop@h24 hadoop]$ cd /usr/local/hadoop-2.6.0/etc/hadoop
[hadoop@h24 hadoop]$ vim log4j.properties
Append at the end of the file:
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
Then make sure all the jars your job needs are on the classpath.
WordCount fails at run time:
[hadoop@h24 ~]$ hadoop jar wc.jar WordCount /profile/WordCount.txt /outt
18/06/21 04:18:56 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/06/21 04:18:56 INFO retry.RetryInvocationHandler: Exception while invoking getNewApplication of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 25441ms.
java.net.ConnectException: Call From h24/192.168.1.24 to h25:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1470)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy17.getNewApplication(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:217)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy18.getNewApplication(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:206)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:214)
at org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:187)
at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:231)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:156)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
at WordCount.main(WordCount.java:83)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:708)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1519)
at org.apache.hadoop.ipc.Client.call(Client.java:1442)
... 30 more
Fix: the client is being refused on h25:8032, the rm2 ResourceManager address. Check that the ResourceManager ports in the configuration files under etc/hadoop/ are correct and that the ResourceManager on h25 is actually running, then restart the cluster; the exception clears.
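A hedged recovery sketch for this specific symptom ("Connection refused" on h25:8032 usually means no ResourceManager is listening there; paths as installed above):
# On h25: start the standby ResourceManager if it is not running.
cd /usr/local/hadoop-2.6.0
sbin/yarn-daemon.sh start resourcemanager
# Then verify both RMs from any node:
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2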
**************************************************************************************************
————————————————————————————————

Original post: http://blog.51cto.com/13749369/2132250

