Setting up multiple HMasters in HBase

To keep an HBase cluster highly available, HBase supports configuring multiple Backup Masters. When the Active Master goes down, a Backup Master automatically takes over the whole cluster.

The configuration is very simple:

Create a file named backup-masters under $HBASE_HOME/conf/ and add the hostname of each node that should act as a Backup Master, one per line. For example:

[hbase@master conf]$ cat backup-masters
node1
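
The article only shows the file content, so here is a minimal sketch of pushing it to the other nodes and starting the cluster. The hbase user and the node1/node2 hostnames are taken from this cluster and are assumptions for any other setup:

# strictly, only the node where start-hbase.sh runs reads backup-masters,
# but keeping conf/ identical on every node is the usual practice
scp $HBASE_HOME/conf/backup-masters hbase@node1:$HBASE_HOME/conf/
scp $HBASE_HOME/conf/backup-masters hbase@node2:$HBASE_HOME/conf/
# start-hbase.sh launches the local HMaster, the region servers from
# conf/regionservers, and one extra HMaster on every host in conf/backup-masters
$HBASE_HOME/bin/start-hbase.sh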

Start the whole cluster; you will then find that an HMaster process is running on both master and node1:

[hbase@master conf]$ jps
25188 NameNode
3319 QuorumPeerMain
31725 Jps
25595 ResourceManager
31077 HMaster
25711 NodeManager
25303 DataNode
31617 Main
31220 HRegionServer
[hbase@node1 root]$ jps
11560 DataNode
11762 NodeManager
20769 Jps
415 QuorumPeerMain
11675 SecondaryNameNode
20394 HRegionServer
20507 HMaster
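
Besides jps, you can confirm which master is active and which are standing by directly in ZooKeeper. A hedged sketch (the $ZOOKEEPER_HOME path is an assumption; the znode paths are the same ones that appear in the logs below):

$ZOOKEEPER_HOME/bin/zkCli.sh -server master:2181
ls /hbase/backup-masters    # one ephemeral child per backup master, e.g. node1,60000,...
get /hbase/master           # the serialized ServerName of the current active master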

Now look at the HMaster log on node1; you will see messages like the following:

[hbase@node1 logs]$ tail -f hbase-hbase-master-node1.log
2015-10-10 05:35:09,609 INFO  [main] mortbay.log: Started [email protected]0.0.0.0:60010
2015-10-10 05:35:09,613 INFO  [main] master.HMaster: hbase.rootdir=hdfs://master:9000/hbase, hbase.cluster.distributed=true
2015-10-10 05:35:09,631 INFO  [main] master.HMaster: Adding backup master ZNode /hbase/backup-masters/node1,60000,1444455307700
2015-10-10 05:35:09,806 INFO  [node1:60000.activeMasterManager] master.ActiveMasterManager: Another master is the active master, master,60000,1444455305852; waiting to become the next active master
2015-10-10 05:35:09,858 INFO  [master/node1/10.0.52.145:60000] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x10135dbc connecting to ZooKeeper ensemble=master:2181,node1:2181,node2:2181
2015-10-10 05:35:09,858 INFO  [master/node1/10.0.52.145:60000] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node1:2181,node2:2181 sessionTimeout=90000 watcher=hconnection-0x10135dbc0x0, quorum=master:2181,node1:2181,node2:2181, baseZNode=/hbase
2015-10-10 05:35:09,859 INFO  [master/node1/10.0.52.145:60000-SendThread(node2:2181)] zookeeper.ClientCnxn: Opening socket connection to server node2/10.0.52.146:2181. Will not attempt to authenticate using SASL (unknown error)
2015-10-10 05:35:09,860 INFO  [master/node1/10.0.52.145:60000-SendThread(node2:2181)] zookeeper.ClientCnxn: Socket connection established to node2/10.0.52.146:2181, initiating session
2015-10-10 05:35:09,885 INFO  [master/node1/10.0.52.145:60000-SendThread(node2:2181)] zookeeper.ClientCnxn: Session establishment complete on server node2/10.0.52.146:2181, sessionid = 0x350463058c10017, negotiated timeout = 40000
2015-10-10 05:35:09,920 INFO  [master/node1/10.0.52.145:60000] regionserver.HRegionServer: ClusterId : c309a039-eb35-400c-bb13-0b6ed939cc5e

These messages show that the cluster already has an active master, running on the master node, so node1 simply waits. Once the HMaster on master goes down, node1 will become the new Active Master.
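
The next step is to take the active master down. A minimal sketch, assuming you run it on the master node; 31077 is the HMaster PID that jps reported there, and hbase-daemon.sh is the same control script used later in this article:

kill 31077                                   # SIGTERM the HMaster PID reported by jps
# or stop just the local master daemon via the control script:
$HBASE_HOME/bin/hbase-daemon.sh stop master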

Now kill the HMaster process on the master node (as sketched above) and watch the master log on node1:

2015-10-10 05:42:17,173 INFO  [node1:60000.activeMasterManager] master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/node1,60000,1444455307700 from backup master directory
2015-10-10 05:42:17,194 INFO  [node1:60000.activeMasterManager] master.ActiveMasterManager: Registered Active Master=node1,60000,1444455307700
2015-10-10 05:42:17,758 INFO  [node1:60000.activeMasterManager] fs.HFileSystem: Added intercepting call to namenode#getBlockLocations so can do block reordering using class class org.apache.hadoop.hbase.fs.HFileSystem$ReorderWALBlocks
2015-10-10 05:42:17,776 INFO  [node1:60000.activeMasterManager] coordination.SplitLogManagerCoordination: Found 0 orphan tasks and 0 rescan nodes
2015-10-10 05:42:17,880 INFO  [node1:60000.activeMasterManager] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x29d405f7 connecting to ZooKeeper ensemble=master:2181,node1:2181,node2:2181
2015-10-10 05:42:17,880 INFO  [node1:60000.activeMasterManager] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node1:2181,node2:2181 sessionTimeout=90000 watcher=hconnection-0x29d405f70x0, quorum=master:2181,node1:2181,node2:2181, baseZNode=/hbase
2015-10-10 05:42:17,883 INFO  [node1:60000.activeMasterManager-SendThread(node2:2181)] zookeeper.ClientCnxn: Opening socket connection to server node2/10.0.52.146:2181. Will not attempt to authenticate using SASL (unknown error)
2015-10-10 05:42:17,884 INFO  [node1:60000.activeMasterManager-SendThread(node2:2181)] zookeeper.ClientCnxn: Socket connection established to node2/10.0.52.146:2181, initiating session
2015-10-10 05:42:17,904 INFO  [node1:60000.activeMasterManager-SendThread(node2:2181)] zookeeper.ClientCnxn: Session establishment complete on server node2/10.0.52.146:2181, sessionid = 0x350463058c1001b, negotiated timeout = 40000
2015-10-10 05:42:17,942 INFO  [node1:60000.activeMasterManager] balancer.StochasticLoadBalancer: loading config
2015-10-10 05:42:18,061 INFO  [node1:60000.activeMasterManager] master.HMaster: Server active/primary master=node1,60000,1444455307700, sessionid=0x150463058ac001a, setting cluster-up flag (Was=true)
2015-10-10 05:42:18,154 INFO  [node1:60000.activeMasterManager] procedure.ZKProcedureUtil: Clearing all procedure znodes: /hbase/online-snapshot/acquired /hbase/online-snapshot/reached /hbase/online-snapshot/abort
2015-10-10 05:42:18,184 INFO  [node1:60000.activeMasterManager] procedure.ZKProcedureUtil: Clearing all procedure znodes: /hbase/flush-table-proc/acquired /hbase/flush-table-proc/reached /hbase/flush-table-proc/abort
2015-10-10 05:42:18,256 INFO  [node1:60000.activeMasterManager] master.MasterCoprocessorHost: System coprocessor loading is enabled
2015-10-10 05:42:18,286 INFO  [node1:60000.activeMasterManager] procedure2.ProcedureExecutor: Starting procedure executor threads=5
2015-10-10 05:42:18,288 INFO  [node1:60000.activeMasterManager] wal.WALProcedureStore: Starting WAL Procedure Store lease recovery
2015-10-10 05:42:18,296 INFO  [node1:60000.activeMasterManager] util.FSHDFSUtils: Recovering lease on dfs file hdfs://master:9000/hbase/MasterProcWALs/state-00000000000000000027.log
2015-10-10 05:42:18,307 INFO  [node1:60000.activeMasterManager] util.FSHDFSUtils: recoverLease=true, attempt=0 on file=hdfs://master:9000/hbase/MasterProcWALs/state-00000000000000000027.log after 9ms
2015-10-10 05:42:18,324 WARN  [node1:60000.activeMasterManager] wal.WALProcedureStore: Unable to read tracker for hdfs://master:9000/hbase/MasterProcWALs/state-00000000000000000027.log - Missing trailer: size=9 startPos=9
2015-10-10 05:42:18,373 INFO  [node1:60000.activeMasterManager] wal.WALProcedureStore: Lease acquired for flushLogId: 28
2015-10-10 05:42:18,383 WARN  [node1:60000.activeMasterManager] wal.ProcedureWALFormatReader: nothing left to decode. exiting with missing EOF
2015-10-10 05:42:18,383 INFO  [node1:60000.activeMasterManager] wal.ProcedureWALFormatReader: No active entry found in state log hdfs://master:9000/hbase/MasterProcWALs/state-00000000000000000027.log. removing it
2015-10-10 05:42:18,405 INFO  [node1:60000.activeMasterManager] zookeeper.RecoverableZooKeeper: Process identifier=replicationLogCleaner connecting to ZooKeeper ensemble=master:2181,node1:2181,node2:2181
2015-10-10 05:42:18,405 INFO  [node1:60000.activeMasterManager] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node1:2181,node2:2181 sessionTimeout=90000 watcher=replicationLogCleaner0x0, quorum=master:2181,node1:2181,node2:2181, baseZNode=/hbase
2015-10-10 05:42:18,407 INFO  [node1:60000.activeMasterManager-SendThread(node1:2181)] zookeeper.ClientCnxn: Opening socket connection to server node1/10.0.52.145:2181. Will not attempt to authenticate using SASL (unknown error)
2015-10-10 05:42:18,408 INFO  [node1:60000.activeMasterManager-SendThread(node1:2181)] zookeeper.ClientCnxn: Socket connection established to node1/10.0.52.145:2181, initiating session
2015-10-10 05:42:18,426 INFO  [node1:60000.activeMasterManager-SendThread(node1:2181)] zookeeper.ClientCnxn: Session establishment complete on server node1/10.0.52.145:2181, sessionid = 0x250463058780018, negotiated timeout = 40000
2015-10-10 05:42:18,464 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2015-10-10 05:42:19,970 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 1506 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2015-10-10 05:42:21,475 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 3011 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2015-10-10 05:42:22,980 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 4516 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2015-10-10 05:42:23,058 INFO  [PriorityRpcServer.handler=3,queue=1,port=60000] master.ServerManager: Registering server=node1,16020,1444455306545
2015-10-10 05:42:23,059 INFO  [PriorityRpcServer.handler=5,queue=1,port=60000] master.ServerManager: Registering server=master,16020,1444455306763
2015-10-10 05:42:23,060 INFO  [PriorityRpcServer.handler=1,queue=1,port=60000] master.ServerManager: Registering server=node2,16020,1444455305886
2015-10-10 05:42:23,081 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 3, slept for 4617 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2015-10-10 05:42:24,586 INFO  [node1:60000.activeMasterManager] master.ServerManager: Finished waiting for region servers count to settle; checked in 3, slept for 6122 ms, expecting minimum of 1, maximum of 2147483647, master is running
2015-10-10 05:42:24,610 INFO  [node1:60000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://master:9000/hbase/WALs/master,16020,1444455306763 belongs to an existing region server
2015-10-10 05:42:24,619 INFO  [node1:60000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://master:9000/hbase/WALs/node1,16020,1444455306545 belongs to an existing region server
2015-10-10 05:42:24,625 INFO  [node1:60000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://master:9000/hbase/WALs/node2,16020,1444455305886 belongs to an existing region server
2015-10-10 05:42:24,757 INFO  [node1:60000.activeMasterManager] master.RegionStates: Transition {1588230740 state=OFFLINE, ts=1444455744651, server=null} to {1588230740 state=OPEN, ts=1444455744756, server=node2,16020,1444455305886}
2015-10-10 05:42:24,757 INFO  [node1:60000.activeMasterManager] master.ServerManager: AssignmentManager hasn't finished failover cleanup; waiting
2015-10-10 05:42:24,760 INFO  [node1:60000.activeMasterManager] master.HMaster: hbase:meta with replicaId 0 assigned=0, rit=false, location=node2,16020,1444455305886
2015-10-10 05:42:24,895 INFO  [node1:60000.activeMasterManager] hbase.MetaMigrationConvertingToPB: META already up-to date with PB serialization
2015-10-10 05:42:24,985 INFO  [node1:60000.activeMasterManager] master.AssignmentManager: Found regions out on cluster or in RIT; presuming failover
2015-10-10 05:42:25,000 INFO  [node1:60000.activeMasterManager] master.AssignmentManager: Joined the cluster in 104ms, failover=true
2015-10-10 05:42:25,216 INFO  [node1:60000.activeMasterManager] master.HMaster: Master has completed initialization
2015-10-10 05:42:25,234 INFO  [node1:60000.activeMasterManager] quotas.MasterQuotaManager: Quota support disabled

As you can see, the Backup Master on node1 has taken over and become the Active HMaster.
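
Re-running the earlier ZooKeeper check should now reflect the switch; per the log above, /hbase/master is expected to contain node1,60000,1444455307700, and node1's znode has been removed from /hbase/backup-masters:

get /hbase/master
ls /hbase/backup-masters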

Now restart the HMaster on the master node:

[hbase@master bin]$ ./hbase-daemon.sh start master
starting master, logging to /usr/local/hbase//logs/hbase-hbase-master-master.out
[hbase@master bin]$ jps
25188 NameNode
32351 Jps
3319 QuorumPeerMain
32265 HMaster
25595 ResourceManager
25711 NodeManager
25303 DataNode
31220 HRegionServer

Checking the log on the master node shows that it has come back as a backup master:

[hbase@master logs]$ tail -f hbase-hbase-master-master.log
2015-10-10 05:53:15,329 INFO  [main] mortbay.log: Started [email protected]0.0.0.0:60010
2015-10-10 05:53:15,333 INFO  [main] master.HMaster: hbase.rootdir=hdfs://master:9000/hbase, hbase.cluster.distributed=true
2015-10-10 05:53:15,348 INFO  [main] master.HMaster: Adding backup master ZNode /hbase/backup-masters/master,60000,1444456393819
2015-10-10 05:53:15,488 INFO  [master:60000.activeMasterManager] master.ActiveMasterManager: Another master is the active master, node1,60000,1444455307700; waiting to become the next active master
2015-10-10 05:53:15,522 INFO  [master/master/10.0.52.144:60000] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x323b7deb connecting to ZooKeeper ensemble=master:2181,node1:2181,node2:2181
2015-10-10 05:53:15,522 INFO  [master/master/10.0.52.144:60000] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node1:2181,node2:2181 sessionTimeout=90000 watcher=hconnection-0x323b7deb0x0, quorum=master:2181,node1:2181,node2:2181, baseZNode=/hbase
2015-10-10 05:53:15,524 INFO  [master/master/10.0.52.144:60000-SendThread(master:2181)] zookeeper.ClientCnxn: Opening socket connection to server master/10.0.52.144:2181. Will not attempt to authenticate using SASL (unknown error)
2015-10-10 05:53:15,525 INFO  [master/master/10.0.52.144:60000-SendThread(master:2181)] zookeeper.ClientCnxn: Socket connection established to master/10.0.52.144:2181, initiating session
2015-10-10 05:53:15,536 INFO  [master/master/10.0.52.144:60000-SendThread(master:2181)] zookeeper.ClientCnxn: Session establishment complete on server master/10.0.52.144:2181, sessionid = 0x150463058ac001c, negotiated timeout = 40000
2015-10-10 05:53:15,567 INFO  [master/master/10.0.52.144:60000] regionserver.HRegionServer: ClusterId : c309a039-eb35-400c-bb13-0b6ed939cc5e
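
At this point the roles have simply swapped. If you want master to become active again, the same failover can be triggered in the other direction: stop the now-active HMaster on node1 and the backup on master should take over (a sketch, to be run on node1):

$HBASE_HOME/bin/hbase-daemon.sh stop master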