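Some details of the HBase region split operation. The individual split steps are covered in plenty of other documents, so this post focuses on how the RegionServer chooses the split point.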
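First, a recommendation: Hannibal is an open-source tool with a web UI for viewing how HBase regions are distributed. I suggest running Hannibal under daemontools so that it is restarted automatically if it exits unexpectedly; an earlier post of mine describes how to manage a process with daemontools.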
Suppose we have an HBase table like the one shown below, in which one region is considerably larger than the others. That region can be split manually.
HBase's physical storage hierarchy, drawn as a tree, is as follows:
Table                    (HBase table)
    Region               (Regions for the table)
        Store            (Store per ColumnFamily for each Region for the table)
            MemStore     (MemStore for each Store for each Region for the table)
            StoreFile    (StoreFiles for each Store for each Region for the table)
                Block    (Blocks within a StoreFile within a Store for each Region for the table)
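One common split policy is ConstantSizeRegionSplitPolicy. Its threshold, hbase.hregion.max.filesize, refers to the size of a single store (which corresponds to one column family), not to the size of the whole region. On HDFS, a store is the directory:
/<hdfs-dir>/<hbasetable>/<xxx(part of region-id)>/<column-family>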
When a memstore is flushed to store files, or when several store files are compacted, the RegionServer checks whether the region needs to be split.
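The size check can be sketched in isolation as follows. This is a minimal illustration, not the HBase source: `StoreInfo` and `ConstantSizeCheckSketch` are hypothetical stand-ins, sizes are plain byte counts, and the rule that a region still holding reference files is not split is an assumption based on the assertion message in getSplitPoint().

```java
import java.util.Arrays;
import java.util.List;

class ConstantSizeCheckSketch {
    // Returns true when at least one store is larger than maxFileSize
    // and no store still contains reference files.
    static boolean shouldSplit(List<StoreInfo> stores, long maxFileSize) {
        boolean foundBigStore = false;
        for (StoreInfo s : stores) {
            if (s.hasReferences) {
                return false; // assumption: such a region cannot split yet
            }
            if (s.sizeBytes > maxFileSize) {
                foundBigStore = true;
            }
        }
        return foundBigStore;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long maxFileSize = 10 * gib; // illustrative value for hbase.hregion.max.filesize
        List<StoreInfo> stores = Arrays.asList(
                new StoreInfo("cf1", 12 * gib, false),
                new StoreInfo("cf2", 1 * gib, false));
        System.out.println(shouldSplit(stores, maxFileSize));
    }
}

// Hypothetical stand-in for one store (one column family of one region).
class StoreInfo {
    final String family;         // column family name
    final long sizeBytes;        // total size of the store's files
    final boolean hasReferences; // true if any store file is a reference file

    StoreInfo(String family, long sizeBytes, boolean hasReferences) {
        this.family = family;
        this.sizeBytes = sizeBytes;
        this.hasReferences = hasReferences;
    }
}
```

The point of the sketch is that the comparison is made per store: a region with many medium-sized column families may never trigger a split, while one oversized column family is enough to do so.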
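It finds the largest store that contains no reference files, finds the largest store file inside that store, and uses the middle row key (midkey) of that store file as the split point.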
RegionSplitPolicy.java

    Iterator i$ = stores.values().iterator();
    while (i$.hasNext()) {
        Store s = (Store) i$.next();
        byte[] splitPoint = s.getSplitPoint();
        long storeSize = s.getSize();
        if (splitPoint != null && largestStoreSize < storeSize) {
            splitPointFromLargestStore = splitPoint;
            largestStoreSize = storeSize;
        }
    }
Store.java

    public byte[] getSplitPoint() {
        long e = 0L;                  // length of the largest store file seen so far
        StoreFile largestSf = null;
        Iterator r = this.storefiles.iterator();
        StoreFile midkey;             // despite the name, this is the current store file
        while (r.hasNext()) {
            midkey = (StoreFile) r.next();
            org.apache.hadoop.hbase.regionserver.StoreFile.Reader mk;
            if (midkey.isReference()) {
                // A store that still holds a reference file yields no split point.
                assert false : "getSplitPoint() called on a region that can't split!";
                return null;
            }
            mk = midkey.getReader();
            if (mk == null) {
                LOG.warn("Storefile " + midkey + " Reader is null");
            } else {
                long fk = mk.length();
                if (fk > e) {
                    e = fk;
                    largestSf = midkey;
                }
            }
        }
        org.apache.hadoop.hbase.regionserver.StoreFile.Reader r1 = largestSf.getReader();
        if (r1 == null) {
            LOG.warn("Storefile " + largestSf + " Reader is null");
            return null;
        }
        // The split point is the midkey of the largest store file.
        byte[] midkey1 = r1.midkey();
        // ... (rest omitted)
    }
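Stripped of HBase internals, the two excerpts above amount to roughly the following. This is a simplified sketch rather than the real implementation: `StoreSketch` and `FileSketch` are hypothetical classes, row keys are modelled as String instead of byte[], and each file's midkey is supplied directly instead of being read from an HFile.

```java
import java.util.Arrays;
import java.util.List;

class SplitPointSketch {
    // Per store: midkey of the largest file, or null if the store
    // holds a reference file or has no files at all.
    static String storeSplitPoint(StoreSketch store) {
        FileSketch largest = null;
        for (FileSketch f : store.files) {
            if (f.isReference) {
                return null;
            }
            if (largest == null || f.lengthBytes > largest.lengthBytes) {
                largest = f;
            }
        }
        return largest == null ? null : largest.midKey;
    }

    // Per region: the split point offered by the largest store that has one.
    static String regionSplitPoint(List<StoreSketch> stores) {
        String splitPointFromLargestStore = null;
        long largestStoreSize = 0L;
        for (StoreSketch s : stores) {
            String splitPoint = storeSplitPoint(s);
            long storeSize = s.size();
            if (splitPoint != null && largestStoreSize < storeSize) {
                splitPointFromLargestStore = splitPoint;
                largestStoreSize = storeSize;
            }
        }
        return splitPointFromLargestStore;
    }

    public static void main(String[] args) {
        StoreSketch cf1 = new StoreSketch(Arrays.asList(
                new FileSketch(100L, "row-300", false),
                new FileSketch(400L, "row-700", false)));
        StoreSketch cf2 = new StoreSketch(Arrays.asList(
                new FileSketch(50L, "row-100", false)));
        System.out.println(regionSplitPoint(Arrays.asList(cf1, cf2)));
    }
}

// Hypothetical stand-in for a store: just a list of files.
class StoreSketch {
    final List<FileSketch> files;

    StoreSketch(List<FileSketch> files) {
        this.files = files;
    }

    // Store size is the sum of its file lengths.
    long size() {
        long total = 0L;
        for (FileSketch f : files) {
            total += f.lengthBytes;
        }
        return total;
    }
}

// Hypothetical stand-in for a store file.
class FileSketch {
    final long lengthBytes;
    final String midKey;       // middle row key of this file
    final boolean isReference; // true for a reference file left by an earlier split

    FileSketch(long lengthBytes, String midKey, boolean isReference) {
        this.lengthBytes = lengthBytes;
        this.midKey = midKey;
        this.isReference = isReference;
    }
}
```

Only one file in one store decides the split point; the other files and column families have no say in where the region is cut.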
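So a split does not actually divide the region into two exactly equal halves: the split point is the midkey of a single store file, which is not necessarily the median of the region's overall data distribution.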
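A small worked example makes the imbalance concrete. It is a toy model under stated assumptions: row keys are integers, a file's midkey is taken to be the middle element of its sorted keys, file size is proportional to row count, and rows with keys below the split point go to the first daughter region. `SkewedSplitSketch` and its helpers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkewedSplitSketch {
    // Sorted keys from..toExclusive-1, standing in for one store file.
    static List<Integer> range(int from, int toExclusive) {
        List<Integer> keys = new ArrayList<Integer>();
        for (int k = from; k < toExclusive; k++) {
            keys.add(k);
        }
        return keys;
    }

    // Toy midkey: the middle element of the file's sorted keys.
    static int midKey(List<Integer> sortedKeys) {
        return sortedKeys.get(sortedKeys.size() / 2);
    }

    // Number of rows, across all files, whose key is below the split point.
    static int rowsBelow(List<List<Integer>> files, int splitPoint) {
        int count = 0;
        for (List<Integer> file : files) {
            for (int key : file) {
                if (key < splitPoint) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> largestFile = range(0, 100);   // 100 rows, keys 0..99
        List<Integer> smallerFile = range(100, 180); // 80 rows, keys 100..179
        List<List<Integer>> files = Arrays.asList(largestFile, smallerFile);

        int splitPoint = midKey(largestFile);
        int left = rowsBelow(files, splitPoint);
        int right = largestFile.size() + smallerFile.size() - left;
        System.out.println("split point=" + splitPoint + " left=" + left + " right=" + right);
    }
}
```

With these inputs the split point is 50, leaving 50 rows in the first daughter and 130 in the second, whereas an even split of the 180 rows would have cut at key 90.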
References:
http://blog.javachen.com/2014/01/16/hbase-region-split-policy.html
http://www.cnblogs.com/niurougan/articles/3975463.html
http://hbase.group.iteye.com/group/topic/40359