1、HRegionServer启动不正常
在namenode上执行jps,则可看到hbase启动是否正常,进程如下:
[[email protected] bin]# jps
26341 HMaster
26642 Jps
7840 ResourceManager
7524 NameNode
7699 SecondaryNameNode
由上可见,hadoop启动正常。HBase少了一个进程,猜测应该是有个节点regionserver没有启动成功。
进入节点slave1 ,执行jps查看启动进程:
[[email protected] bin]# ssh slave1
Last login: Thu Jul 17 17:29:11 2014 from master
[[email protected] ~]# jps
4296 DataNode
11261 HRegionServer
11512 Jps
11184 QuorumPeerMain
由此可见Slave1节点正常。
进入节点slave2节点,执行jps查看启动进程:
[[email protected] ~]# jps
3795 DataNode
11339 Jps
11080 QuorumPeerMain
OK,问题找到了 HRegionServer没有启动成功。进入HBase日志:
2014-07-17 09:28:19,392 INFO [regionserver60020] regionserver.HRegionServer: STOPPED: Unhandled: org.apache.hadoop.hbase.ClockOutOfSyncException: Server slave2,60020,1405560498057 has been rejected; Reported time is too far out of sync with master. Time difference of 28804194ms > max allowed of 30000ms
at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:314)
at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:215)
at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1292)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2185)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1889)
根据错误日志,可得到slave2和maste机器时间差太多,查看各个系统的时间,果真如此,同步即可。另外一种方法就是配置hbase的配置文件:
配置:hbase.master.maxclockske
<property>
<name>hbase.master.maxclockskew</name>
<value>200000</value>
<description>Time difference of regionserver from master</description>
</property>
(这种方法不推荐)
2、Zookeeper启动不正常。
在启动hbase时,总是报错,提示zookeeper连接不上,查看zookeeper日志,发现:
[email protected]] - Opening socket connection to server slave1. Will not attempt to authenticate using SASL (无法定位登录配置)。经过百度可得
由于hosts文件的问题,于是vi /etc/hosts 发现 ip slave1配置中ip错误。汗!幸亏hbase和zookeeper都有日志。于是重启zookeeper和hbase,上述问题解决。
HBase集群安装过程中的问题集锦