HBase Quick Start

What is HBase?

Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

HBase use cases

Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.

HBase features

- Linear and modular scalability.
- Strictly consistent reads and writes.
- Automatic and configurable sharding of tables.
- Automatic failover support between RegionServers.
- Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
- Easy to use Java API for client access.
- Block cache and Bloom Filters for real-time queries.
- Query predicate push down via server side Filters.
- Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options.
- Extensible jruby-based (JIRB) shell.
- Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX.

What operations does HBase support?

Let's start with the basic operations: create a table, delete a table, add a record, delete a record, and scan records.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTest {

    private static Configuration conf = null;
    /**
     * Initialization
     */
    static {
        conf = HBaseConfiguration.create();
    }

    /**
     * Create a table
     */
    public static void createTable(String tableName, String[] familys)
            throws Exception {
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (admin.tableExists(tableName)) {
            System.out.println("table already exists!");
        } else {
            HTableDescriptor tableDesc = new HTableDescriptor(tableName);
            for (int i = 0; i < familys.length; i++) {
                tableDesc.addFamily(new HColumnDescriptor(familys[i]));
            }
            admin.createTable(tableDesc);
            System.out.println("create table " + tableName + " ok.");
        }
    }

    /**
     * Delete a table
     */
    public static void deleteTable(String tableName) throws Exception {
        try {
            HBaseAdmin admin = new HBaseAdmin(conf);
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
            admin.close();
            System.out.println("deleted table " + tableName + " ok.");
        } catch (MasterNotRunningException e) {
            e.printStackTrace();
        } catch (ZooKeeperConnectionException e) {
            e.printStackTrace();
        }
    }

    /**
     * Put (or insert) a row
     */
    public static void addRecord(String tableName, String rowKey,
            String family, String qualifier, String value) throws Exception {
        try {
            HTable table = new HTable(conf, tableName);
            Put put = new Put(Bytes.toBytes(rowKey));
            put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier), Bytes
                    .toBytes(value));
            table.put(put);
            table.close();
            System.out.println("inserted record " + rowKey + " into table "
                    + tableName + " ok.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Delete a row
     */
    public static void delRecord(String tableName, String rowKey)
            throws IOException {
        HTable table = new HTable(conf, tableName);
        List<Delete> list = new ArrayList<Delete>();
        Delete del = new Delete(rowKey.getBytes());
        list.add(del);
        table.delete(list);
        table.close();
        System.out.println("deleted record " + rowKey + " ok.");
    }

    /**
     * Get a row
     */
    public static void getOneRecord (String tableName, String rowKey) throws IOException{
        HTable table = new HTable(conf, tableName);
        Get get = new Get(rowKey.getBytes());
        Result rs = table.get(get);
        for(KeyValue kv : rs.raw()){
            System.out.print(new String(kv.getRow()) + " " );
            System.out.print(new String(kv.getFamily()) + ":" );
            System.out.print(new String(kv.getQualifier()) + " " );
            System.out.print(kv.getTimestamp() + " " );
            System.out.println(new String(kv.getValue()));
        }
        table.close();
    }
    /**
     * Scan (or list) a table
     */
    public static void getAllRecord (String tableName) {
        try{
             HTable table = new HTable(conf, tableName);
             Scan s = new Scan();
             ResultScanner ss = table.getScanner(s);
             for(Result r:ss){
                 for(KeyValue kv : r.raw()){
                    System.out.print(new String(kv.getRow()) + " ");
                    System.out.print(new String(kv.getFamily()) + ":");
                    System.out.print(new String(kv.getQualifier()) + " ");
                    System.out.print(kv.getTimestamp() + " ");
                    System.out.println(new String(kv.getValue()));
                 }
             }
             ss.close();
             table.close();
        } catch (IOException e){
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        try {
            String tablename = "scores";
            String[] familys = { "grade", "course" };
            HBaseTest.createTable(tablename, familys);

            // add record zkb
            HBaseTest.addRecord(tablename, "zkb", "grade", "", "5");
            HBaseTest.addRecord(tablename, "zkb", "course", "", "90");
            HBaseTest.addRecord(tablename, "zkb", "course", "math", "97");
            HBaseTest.addRecord(tablename, "zkb", "course", "art", "87");
            // add record baoniu
            HBaseTest.addRecord(tablename, "baoniu", "grade", "", "4");
            HBaseTest.addRecord(tablename, "baoniu", "course", "math", "89");

            System.out.println("===========get one record========");
            HBaseTest.getOneRecord(tablename, "zkb");

            System.out.println("===========show all record========");
            HBaseTest.getAllRecord(tablename);

            System.out.println("===========del one record========");
            HBaseTest.delRecord(tablename, "baoniu");
            HBaseTest.getAllRecord(tablename);

            System.out.println("===========show all record========");
            HBaseTest.getAllRecord(tablename);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Basic Concepts

HTable is the core class that implements Table and is used to communicate with a single HBase table. It is lightweight: get an instance as needed and close it when done. Instances of this class should not be built with a constructor; obtain one through a Connection, created via the ConnectionFactory class:
Configuration config = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(config);
Table table = connection.getTable(TableName.valueOf("table1"));
try {
  // Use the table as needed, for a single operation and a single thread
} finally {
  table.close();
  connection.close();
}

What fields does HTable have?

public class HTable implements HTableInterface, RegionLocator {
  private static final Log LOG = LogFactory.getLog(HTable.class);
  protected ClusterConnection connection;
  private final TableName tableName;
  private volatile Configuration configuration;
  private TableConfiguration tableConfiguration;
  protected BufferedMutatorImpl mutator;
  private boolean autoFlush = true;
  private boolean closed = false;
  protected int scannerCaching;
  private ExecutorService pool;  // For Multi & Scan
  private int operationTimeout;
  private final boolean cleanupPoolOnClose; // shutdown the pool in close()
  private final boolean cleanupConnectionOnClose; // close the connection in close()
  private Consistency defaultConsistency = Consistency.STRONG;

  /** The Async process for batch */
  protected AsyncProcess multiAp;
  private RpcRetryingCallerFactory rpcCallerFactory;
  private RpcControllerFactory rpcControllerFactory;
}

HTableDescriptor contains the details of an HBase table, such as the descriptors of all its column families, whether the table is a catalog table (ROOT or hbase:meta), whether it is read-only, the maximum size of the memstore, when region splits should occur, the coprocessors associated with the table, and so on.
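
As a minimal sketch, here is how a descriptor might be assembled and handed to an Admin for table creation. It assumes an open Connection named connection, as in the snippet above; the flush size and version count are illustrative, and Admin comes from org.apache.hadoop.hbase.client:

HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("scores"));
HColumnDescriptor course = new HColumnDescriptor("course");
course.setMaxVersions(3);                       // keep up to three versions per cell
desc.addFamily(course);
desc.setMemStoreFlushSize(128L * 1024 * 1024);  // flush the memstore at 128 MB
Admin admin = connection.getAdmin();
admin.createTable(desc);
admin.close();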

HTable implements Table, which is used to communicate with a single HBase table: to get, put, delete, or scan data from it. Obtain a Table instance from a Connection and call close() when finished.

HTable also implements RegionLocator, which locates the regions of a single HBase table. An instance can likewise be obtained from a Connection; RegionLocator's getRegionLocation method returns an HRegionLocation, as sketched below.
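
A short sketch of locating the region that serves a given row, again assuming the same open Connection; the table and row key are the ones used in the example program above:

RegionLocator locator = connection.getRegionLocator(TableName.valueOf("scores"));
HRegionLocation location = locator.getRegionLocation(Bytes.toBytes("zkb"));
System.out.println("region: " + location.getRegionInfo().getRegionNameAsString());
System.out.println("server: " + location.getServerName());
locator.close();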

HRegionLocation

A data structure that pairs an HRegionInfo with the host address of the HRegionServer serving the region.

Constructors:

  public HRegionLocation(HRegionInfo regionInfo, ServerName serverName) {
    this(regionInfo, serverName, HConstants.NO_SEQNUM);
  }

  public HRegionLocation(HRegionInfo regionInfo, ServerName serverName, long seqNum) {
    this.regionInfo = regionInfo;
    this.serverName = serverName;
    this.seqNum = seqNum;
  }
HRegionInfo

Information about a region. A region is a range of keys in the whole key space of a table, plus an identifier (a timestamp) for differentiating between subset ranges (after a region split) and a replica ID for differentiating instances of the same range and some status information about the region.

A region has a unique name, which consists of the following fields:

    tableName: the name of the table.

    startKey: the start key of the key range.

    regionId: a timestamp, taken when the region is created.

    replicaId: an id starting from 0, used to differentiate replicas of the same region range; the same range can be served from multiple locations.

    encodedName: an MD5 hash of the region name, in hex.

Besides the region name, a region also records:

    endKey: the end key of the region (exclusive).

    split: whether the region is split.

    offline: whether the region is offline.

In HBase 0.98 and earlier, the regions of a table completely cover the whole key space: at any point in time a row key belongs to exactly one region, which in turn is served by exactly one server. In 0.99+, a region can have multiple instances (called replicas), so a single row can map to multiple HRegionInfo objects. These HRIs share all fields except the replica ID; if the replica ID is not set, it defaults to 0.
  /**
   * The new format for a region name contains its encodedName at the end.
   * The encoded name also serves as the directory name for the region
   * in the filesystem.
   *
   * New region name format:
   *    <tablename>,<startkey>,<regionIdTimestamp>.<encodedName>.
   * where,
   *    <encodedName> is a hex version of the MD5 hash of
   *    <tablename>,<startkey>,<regionIdTimestamp>
   *
   * The old region name format:
   *    <tablename>,<startkey>,<regionIdTimestamp>
   * For region names in the old format, the encoded name is a 32-bit
   * JenkinsHash integer value (in its decimal notation, string form).
   *
   * **NOTE**
   *
   * The first hbase:meta region, and regions created by an older
   * version of HBase (0.20 or prior) will continue to use the
   * old region name format.
   */

  /** Separator used to demarcate the encodedName in a region name
   * in the new format. See description on new format above.
   */
  private static final int ENC_SEPARATOR = '.';
  public  static final int MD5_HEX_LENGTH   = 32;

  /** A non-capture group so that this can be embedded. */
  public static final String ENCODED_REGION_NAME_REGEX = "(?:[a-f0-9]+)";

  // to keep appended int's sorted in string format. Only allows 2 bytes to be
  // sorted for replicaId
  public static final String REPLICA_ID_FORMAT = "%04X";

  public static final byte REPLICA_ID_DELIMITER = (byte)'_';

  private static final int MAX_REPLICA_ID = 0xFFFF;
  static final int DEFAULT_REPLICA_ID = 0;

  private byte [] endKey = HConstants.EMPTY_BYTE_ARRAY;
  // This flag is in the parent of a split while the parent is still referenced
  // by daughter regions.  We USED to set this flag when we disabled a table
  // but now table state is kept up in zookeeper as of 0.90.0 HBase.
  private boolean offLine = false;
  private long regionId = -1;
  private transient byte [] regionName = HConstants.EMPTY_BYTE_ARRAY;
  private boolean split = false;
  private byte [] startKey = HConstants.EMPTY_BYTE_ARRAY;
  private int hashCode = -1;
  //TODO: Move NO_HASH to HStoreFile which is really the only place it is used.
  public static final String NO_HASH = null;
  private String encodedName = null;
  private byte [] encodedNameAsBytes = null;
  private int replicaId = DEFAULT_REPLICA_ID;

  // Current TableName
  private TableName tableName = null;

  /** HRegionInfo for first meta region */
  public static final HRegionInfo FIRST_META_REGIONINFO =
      new HRegionInfo(1L, TableName.META_TABLE_NAME);
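
To make the name format above concrete, a small sketch; the start and end keys are illustrative, and HRegionInfo is constructed client-side only for inspection (regions are normally created by the master):

HRegionInfo info = new HRegionInfo(TableName.valueOf("scores"),
        Bytes.toBytes("a"), Bytes.toBytes("m"));
System.out.println(info.getRegionNameAsString()); // scores,a,<regionIdTimestamp>.<encodedName>.
System.out.println(info.getEncodedName());        // 32-character hex MD5 string
System.out.println(info.getReplicaId());          // 0, the default replica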

Before going further, a quick look at HBase's data model:

Its building blocks:

Row Key:       the table's primary key; records in a table are sorted by row key.
Timestamp:     the timestamp of each data operation, which also serves as the data's version number (a sketch follows this list).
Column Family: a table has one or more column families in the horizontal direction; a family can consist of any number of columns. Column families support dynamic extension, so there is no need to predefine the number or types of columns. Everything is stored as binary, and users must do type conversion themselves.
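
Because every cell carries a timestamp, one cell can hold several versions; how many are retained is governed by the column family's VERSIONS setting. A sketch of writing and reading two versions of the same cell, using the scores table and old-style HTable API from the example above (values are illustrative; Cell and CellUtil come from org.apache.hadoop.hbase, and the snippet assumes a method that throws IOException like those above):

HTable table = new HTable(conf, "scores");
Put put = new Put(Bytes.toBytes("zkb"));
put.add(Bytes.toBytes("course"), Bytes.toBytes("math"), Bytes.toBytes("97"));
table.put(put);
put = new Put(Bytes.toBytes("zkb"));
put.add(Bytes.toBytes("course"), Bytes.toBytes("math"), Bytes.toBytes("99"));
table.put(put);                                   // same cell, newer timestamp

Get get = new Get(Bytes.toBytes("zkb"));
get.setMaxVersions(3);                            // ask for up to three versions
Result rs = table.get(get);
for (Cell cell : rs.getColumnCells(Bytes.toBytes("course"), Bytes.toBytes("math"))) {
    System.out.println(cell.getTimestamp() + " -> "
            + new String(CellUtil.cloneValue(cell)));
}
table.close();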

Row operations

Get performs a Get operation on a single row. To get everything for a row, instantiate a Get object for that row. Then:

To get all columns from a specific family, use addFamily(byte[]).
To get a specific column, use addColumn(byte[], byte[]).
To get columns within a specific timestamp range, use setTimeRange(long, long).
To get columns at a specific timestamp, use setTimeStamp(long).
To limit the number of versions of each column returned, use setMaxVersions(int).
To add a filter, use setFilter(Filter).

These calls are combined in the sketch below.
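
A sketch combining these calls; the filter here keeps only cells whose value equals "97" (ValueFilter, CompareFilter, and BinaryComparator live in org.apache.hadoop.hbase.filter; table is an HTable as in the methods above, inside a method that throws IOException):

Get get = new Get(Bytes.toBytes("zkb"));
get.addFamily(Bytes.toBytes("grade"));                          // every column of one family
get.addColumn(Bytes.toBytes("course"), Bytes.toBytes("math")); // plus one specific column
get.setTimeRange(0L, System.currentTimeMillis());              // only cells in this time range
get.setMaxVersions(2);                                         // at most two versions per column
get.setFilter(new ValueFilter(CompareFilter.CompareOp.EQUAL,
        new BinaryComparator(Bytes.toBytes("97"))));
Result result = table.get(get);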

Put performs a Put operation on a single row. Instantiate a Put object for the row to insert into; to add a cell, use add(byte[], byte[], byte[]), or add(byte[], byte[], long, byte[]) to set the timestamp yourself.
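
A sketch of the timestamped variant, using the old-style add(family, qualifier, ts, value) seen in the example code above (the qualifier and value are illustrative; table is an HTable as above):

Put put = new Put(Bytes.toBytes("zkb"));
long ts = System.currentTimeMillis();
put.add(Bytes.toBytes("course"), Bytes.toBytes("art"), ts, Bytes.toBytes("88")); // explicit timestamp
table.put(put);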

Append appends values to one or more columns within a single row; call add(byte[], byte[], byte[]) for each column to append to.
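
A sketch of an append; the remark qualifier and the appended bytes are illustrative, Append comes from org.apache.hadoop.hbase.client, and the operation is atomic on the server side:

Append append = new Append(Bytes.toBytes("zkb"));
append.add(Bytes.toBytes("course"), Bytes.toBytes("remark"), Bytes.toBytes("pass"));
Result r = table.append(append);   // returns the cell values after the append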

References:

[1] http://hbase.apache.org/

[2] https://autofei.wordpress.com/2012/04/02/java-example-code-using-hbase-data-model-operations/

[3] http://www.cnblogs.com/shitouer/archive/2012/06/04/2533518.html
