The Apache HBase™ Reference Guide

The following content is excerpted and adapted from http://hbase.apache.org/book.html#getting_started.

Environment: hadoop-1.0.4, hbase-0.94.22, jdk1.7.0_65

Chapter 1. Getting Started

This chapter shows you how to:

create a table in HBase using the hbase shell CLI,

insert rows into the table,

perform put and scan operations against the table,

enable or disable the table,

start and stop HBase

Local Filesystem and Durability 

Using HBase 0.98.2 and earlier releases with a local filesystem does not guarantee durability. The HDFS local filesystem implementation will lose edits if files are not properly closed. This is very likely to happen when you are experimenting with new software, starting and stopping the daemons often and not always cleanly. You need to run HBase on HDFS to ensure all writes are preserved. Despite this bug, we still proceed this way here, because the goal is to get familiar with HBase quickly and conveniently.

Loopback IP - HBase 0.94.x and earlier

Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1 and this will cause problems for you. See Why does HBase care about /etc/hosts? for detail.

The following /etc/hosts file works correctly for HBase 0.94.x and earlier, on Ubuntu. Use this as a template if you run into trouble.

127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
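
A quick way to confirm which address your hostname resolves to (a minimal sketch; the hostname ubuntu is only the example from the template above):

    $ hostname                     # prints the machine's hostname, e.g. ubuntu
    $ getent hosts $(hostname)     # should show 127.0.0.1 on HBase 0.94.x and earlier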
        

HBase requires that a JDK be installed. See Table 2.1, “Java” for information about supported JDK versions.

1.2.2. Get Started with HBase

  1. For HBase 0.98.5 and later, you are required to set the JAVA_HOME environment variable before starting HBase. Prior to 0.98.5, HBase attempted to detect the location of Java if the variable was not set. You can set the variable via your operating system's usual mechanism, but HBase provides a central mechanism, conf/hbase-env.sh. Edit this file, uncomment the line starting with JAVA_HOME, and set it to the appropriate location for your operating system. The JAVA_HOME variable should be set to a directory which contains the executable file bin/java. Most modern Linux operating systems provide a mechanism, such as /usr/bin/alternatives on RHEL or CentOS, for transparently switching between versions of executables such as Java. In this case, you can set JAVA_HOME to the directory containing the symbolic link to bin/java, which is usually /usr.

    JAVA_HOME=/usr
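
    To see which JDK the /usr symlink chain actually points to, a check like the following can help (a sketch; the exact paths depend on your distribution and JDK install):

    $ readlink -f /usr/bin/java     # e.g. /usr/lib/jvm/jdk1.7.0_65/jre/bin/java (path is illustrative)
    $ ls /usr/bin/java              # confirms that /usr (the JAVA_HOME above) contains bin/java
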
  2. Edit the main HBase configuration file, conf/hbase-site.xml. At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data. By default, a new directory is created under /tmp. Many servers are configured to delete the contents of /tmp upon reboot, so you should store the data elsewhere. The following configuration will store HBase's data in the hbase directory, in the home directory of the user called testuser. Paste the <property> tags beneath the <configuration> tags, which should be empty in a new HBase install.
  3.  Example hbase-site.xml for Standalone HBase
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>file:///home/testuser/hbase</value>              <!-- hbase is a directory created by the user to store data -->
      </property>
      <property>                                                <!-- I have not installed ZooKeeper yet, so my own configuration does not include this part -->
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/home/testuser/zookeeper</value>
      </property>
    </configuration>
      
  4. The bin/start-hbase.sh script is provided as a convenient way to start HBase. Issue the command, and if all goes well, a message is logged to standard output showing that HBase started successfully. You can use the jps command to verify that you have one running process called HMaster. In standalone mode HBase runs all daemons within this single JVM, i.e. the HMaster, a single HRegionServer, and the ZooKeeper daemon.
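
    As a quick sanity check, the start and verification commands might look like this (a sketch; log paths and PIDs will differ on your machine):

    $ ./bin/start-hbase.sh      # should log a "starting master, logging to ..." message
    $ jps                       # in standalone mode, expect a single HMaster process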

Procedure 1.2. Use HBase For the First Time

  1. Connect to HBase.

    $ ./bin/hbase shell
    hbase(main):001:0> 
  2. Display HBase Shell Help Text.

    Type help and press Enter to display some basic usage information for HBase Shell, as well as several example commands.

  3. Create a table.

    You must specify the table name and the ColumnFamily name.

    hbase> create 'test', 'cf'
    0 row(s) in 1.2200 seconds        
  4. List Information About your Table
    hbase> list 'test'
    TABLE
    test
    1 row(s) in 0.0350 seconds
    
    => ["test"]
  5. Put data into your table.
    hbase> put 'test', 'row1', 'cf:a', 'value1'
    0 row(s) in 0.1770 seconds
    
    hbase> put 'test', 'row2', 'cf:b', 'value2'
    0 row(s) in 0.0160 seconds
    
    hbase> put 'test', 'row3', 'cf:c', 'value3'
    0 row(s) in 0.0260 seconds          
  6. Scan the table for all data at once.
    hbase> scan 'test'
    ROW                   COLUMN+CELL
     row1                 column=cf:a, timestamp=1403759475114, value=value1
     row2                 column=cf:b, timestamp=1403759492807, value=value2
     row3                 column=cf:c, timestamp=1403759503155, value=value3
    3 row(s) in 0.0440 seconds          
  7. Get a single row of data.
    hbase> get 'test', 'row1'
    COLUMN                CELL
     cf:a                 timestamp=1403759475114, value=value1
    1 row(s) in 0.0230 seconds                
  8. Disable a table.

    If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the disable command. You can re-enable it using the enable command.

    hbase> disable 'test'
    0 row(s) in 1.6270 seconds
    
    hbase> enable 'test'
    0 row(s) in 0.4500 seconds

    Disable the table again before dropping it in the next step:

    hbase> disable 'test'
    0 row(s) in 1.6270 seconds
              
  9. Drop the table.
    hbase> drop 'test'
    0 row(s) in 0.2900 seconds
              
  10. Exit the HBase Shell.

    To exit the HBase Shell and disconnect from your cluster, use the quit command.
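
    For example:

    hbase> quit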

  11. Stop HBase.

    $ ./bin/stop-hbase.sh
    stopping hbase....................
    $

1.2.3. Intermediate - Pseudo-Distributed Local Install

You can re-configure HBase to run in pseudo-distributed mode. Pseudo-distributed mode means that HBase still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and Zookeeper) runs as a separate process. By default, unless you configure the hbase.rootdir property as described in Section 1.2, “Quick Start - Standalone HBase”, your data is still stored in /tmp/. In this walk-through, we store your data in HDFS instead, assuming you have HDFS available.

After I start HBase, only HMaster is running and there is no HRegionServer. Why? I need to switch to Hadoop 2.

Hadoop Configuration

This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a remote system, and that they are running and available. It also assumes you are using Hadoop 2. Currently, the documentation on the Hadoop website does not include a quick start for Hadoop 2, but the guide at http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide is a good starting point.

  1. Stop HBase if it is running.

    If you have just finished Section 1.2, “Quick Start - Standalone HBase” and HBase is still running, stop it. This procedure will create a totally new directory where HBase will store its data, so any databases you created before will be lost.

  2. Configure HBase.

    Edit the hbase-site.xml configuration. First, add the following property, which directs HBase to run in distributed mode, with one JVM instance per daemon.

    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
                

    Next, change the hbase.rootdir from the local filesystem to the address of your HDFS instance, using the hdfs:// URI syntax. In this example, HDFS is running on the localhost at port 8020.

    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://localhost:8020/hbase</value>
    </property>
    

    You do not need to create the directory in HDFS. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.
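
    The host and port in hbase.rootdir must match your HDFS configuration. On Hadoop 1.x (such as the hadoop-1.0.4 setup used here) the NameNode address is defined by fs.default.name in core-site.xml; a quick check might look like this (a sketch, assuming HADOOP_HOME points at your Hadoop install):

    $ grep -A 1 fs.default.name $HADOOP_HOME/conf/core-site.xml
    # if fs.default.name is hdfs://localhost:9000, use hdfs://localhost:9000/hbase as hbase.rootdir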

  3. Start HBase.

    Use the bin/start-hbase.sh command to start HBase. If your system is configured correctly, the jps command should show the HMaster and HRegionServer processes running.

  4. Check the HBase directory in HDFS.

    If everything worked correctly, HBase created its directory in HDFS. In the configuration above, it is stored in /hbase/ on HDFS. You can use the hadoop fs command in Hadoop's bin/ directory to list this directory.

    $ ./bin/hadoop fs -ls /hbase
    Found 7 items
    drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/.tmp
    drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/WALs
    drwxr-xr-x   - hbase users          0 2014-06-25 18:48 /hbase/corrupt
    drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/data
    -rw-r--r--   3 hbase users         42 2014-06-25 18:41 /hbase/hbase.id
    -rw-r--r--   3 hbase users          7 2014-06-25 18:41 /hbase/hbase.version
    drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/oldWALs
              
  5. Create a table and populate it with data.

    You can use the HBase Shell to create a table, populate it with data, scan and get values from it, using the same procedure as in Procedure 1.2, “Use HBase For the First Time”.

  6. Start and stop a backup HBase Master (HMaster) server.

    Note

    Running multiple HMaster instances on the same hardware does not make sense in a production environment, in the same way that running a pseudo-distributed cluster does not make sense for production. This step is offered for testing and learning purposes only.

    The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster, use the local-master-backup.sh. For each backup master you want to start, add a parameter representing the port offset for that master. Each HMaster uses three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so using an offset of 2, the backup HMaster would use ports 16012, 16022, and 16032. The following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.

    $ ./bin/local-master-backup.sh 2 3 5
                

    To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like /tmp/hbase-USER-X-master.pid. The only contents of the file are the PID. You can use the kill -9 command to kill that PID. The following command will kill the master with port offset 1, but leave the cluster running:

    $ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
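
    The same command can be written with the user name and offset as variables (a sketch; set OFFSET to the backup master you want to stop):

    $ OFFSET=1
    $ kill -9 "$(cat /tmp/hbase-${USER}-${OFFSET}-master.pid)"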
              
  7. Start and stop additional RegionServers

    The HRegionServer manages the data in its StoreFiles as directed by the HMaster. Generally, one HRegionServer runs per node in the cluster. Running multiple HRegionServers on the same system can be useful for testing in pseudo-distributed mode. The local-regionservers.sh command allows you to run multiple RegionServers. It works in a similar way to the local-master-backup.sh command, in that each parameter you provide represents the port offset for an instance. Each RegionServer requires two ports, and the default ports are 16020 and 16030. However, the base ports for additional RegionServers are not the default ports since the default ports are used by the HMaster, which is also a RegionServer since HBase version 1.0.0. The base ports are 16200 and 16300 instead. You can run 99 additional RegionServers that are not an HMaster or backup HMaster, on a server. The following command starts four additional RegionServers, running on sequential ports starting at 16202/16302 (base ports 16200/16300 plus 2).

    $ ./bin/local-regionservers.sh start 2 3 4 5
              

    To stop a RegionServer manually, use the local-regionservers.sh command with the stop parameter and the offset of the server to stop.

    $ ./bin/local-regionservers.sh stop 3
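
    To stop all four RegionServers started above in one pass, a simple loop works (a sketch using the same offsets):

    $ for i in 2 3 4 5; do ./bin/local-regionservers.sh stop $i; done
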
  8. Stop HBase.

    You can stop HBase the same way as in the Section 1.2, “Quick Start - Standalone HBase” procedure, using the bin/stop-hbase.sh command.

1.2.4. Advanced - Fully Distributed

In reality, you need a fully-distributed configuration to fully test HBase and to use it in real-world scenarios. In a distributed configuration, the cluster contains multiple nodes, each of which runs one or more HBase daemon. These include primary and backup Master instances, multiple Zookeeper nodes, and multiple RegionServer nodes.

This advanced quickstart adds two more nodes to your cluster. The architecture will be as follows:

Table 1.1. Distributed Cluster Demo Architecture

Node Name            Master   ZooKeeper   RegionServer
node-a.example.com   yes      yes         no
node-b.example.com   backup   yes         yes
node-c.example.com   no       yes         yes

This quickstart assumes that each node is a virtual machine and that they are all on the same network. It builds upon the previous quickstart, Section 1.2.3, “Intermediate - Pseudo-Distributed Local Install”, assuming that the system you configured in that procedure is now node-a. Stop HBase on node-a before continuing.

Note

Be sure that all the nodes have full access to communicate, and that no firewall rules are in place which could prevent them from talking to each other. If you see any errors like no route to host, check your firewall.
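
A quick reachability check between the nodes can rule out basic network problems early (a sketch, using the example hostnames from the table above):

    $ ping -c 1 node-b.example.com
    $ ping -c 1 node-c.example.com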

Procedure 1.4. Configure Password-Less SSH Access

node-a needs to be able to log into node-b and node-c (and to itself) in order to start the daemons. The easiest way to accomplish this is to use the same username on all hosts, and configure password-less SSH login from node-a to each of the others.

  1. On node-a, generate a key pair.

    While logged in as the user who will run HBase, generate an SSH key pair, using the following command:

    $ ssh-keygen -t rsa

    If the command succeeds, the location of the key pair is printed to standard output. The default name of the public key is id_rsa.pub.

  2. Create the directory that will hold the shared keys on the other nodes.

    On node-b and node-c, log in as the HBase user and create a .ssh/ directory in the user's home directory, if it does not already exist. If it already exists, be aware that it may already contain other keys.

  3. Copy the public key to the other nodes.

    Securely copy the public key from node-a to each of the nodes, using scp or some other secure means. On each of the other nodes, create a new file called .ssh/authorized_keys if it does not already exist, and append the contents of the id_rsa.pub file to the end of it. Note that you also need to do this for node-a itself.

    $ cat id_rsa.pub >> ~/.ssh/authorized_keys
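
    SSH is strict about permissions on these files; if password-less login still fails, tightening them usually helps (a general sketch, not specific to HBase):

    $ chmod 700 ~/.ssh
    $ chmod 600 ~/.ssh/authorized_keys
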
  4. Test password-less login.

    If you performed the procedure correctly, if you SSH from node-a to either of the other nodes, using the same username, you should not be prompted for a password.
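
    For example, from node-a (a sketch; the hostname is the example used throughout this quickstart):

    $ ssh node-b.example.com      # should log in without prompting for a password
    $ exit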

  5. Since node-b will run a backup Master, repeat the procedure above, substituting node-b everywhere you see node-a. Be sure not to overwrite your existing .ssh/authorized_keys files, but concatenate the new key onto the existing file using the >> operator rather than the > operator.

Procedure 1.5. Prepare node-a

node-a will run your primary master and ZooKeeper processes, but no RegionServers.

  1. Stop the RegionServer from starting on node-a.

    Edit conf/regionservers and remove the line which contains localhost. Add lines with the hostnames or IP addresses for node-b and node-c. Even if you did want to run a RegionServer on node-a, you should refer to it by the hostname the other servers would use to communicate with it. In this case, that would be node-a.example.com. This enables you to distribute the configuration to each node of your cluster without any hostname conflicts. Save the file.
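
    After the edit, conf/regionservers should contain just the two worker hostnames, for example:

    node-b.example.com
    node-c.example.com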

  2. Configure HBase to use node-b as a backup master.

    Create a new file in conf/ called backup-masters, and add a new line to it with the hostname for node-b. In this demonstration, the hostname is node-b.example.com.
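
    One way to create the file (a sketch, run from the HBase install directory on node-a):

    $ echo "node-b.example.com" > conf/backup-masters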

  3. Configure ZooKeeper

    In reality, you should carefully consider your ZooKeeper configuration. You can find out more about configuring ZooKeeper in Chapter 20, ZooKeeper. This configuration will direct HBase to start and manage a ZooKeeper instance on each node of the cluster.

    On node-a, edit conf/hbase-site.xml and add the following properties.

    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/usr/local/zookeeper</value>
    </property>
                
  4. Everywhere in your configuration that you have referred to node-a as localhost, change the reference to point to the hostname that the other nodes will use to refer to node-a. In these examples, the hostname is node-a.example.com.

Procedure 1.6. Prepare node-b and node-c

node-b will run a backup master server and a ZooKeeper instance.

  1. Download and unpack HBase.

    Download and unpack HBase to node-b, just as you did for the standalone and pseudo-distributed quickstarts.

  2. Copy the configuration files from node-a to node-b and node-c.

    Each node of your cluster needs to have the same configuration information. Copy the contents of the conf/ directory to the conf/ directory on node-b and node-c.
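
    For example, using scp from node-a (a sketch; the remote HBase install path is an assumption and must match your own layout):

    $ scp conf/* node-b.example.com:/path/to/hbase/conf/
    $ scp conf/* node-c.example.com:/path/to/hbase/conf/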

Procedure 1.7. Start and Test Your Cluster

  1. Be sure HBase is not running on any node.

    If you forgot to stop HBase from previous testing, you will have errors. Check to see whether HBase is running on any of your nodes by using the jps command. Look for the processes HMaster, HRegionServer, and HQuorumPeer. If they exist, kill them.
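
    A quick way to look for leftover processes on each node (a sketch):

    $ jps | egrep 'HMaster|HRegionServer|HQuorumPeer'
    # if anything is listed, stop it with bin/stop-hbase.sh or kill the listed PIDs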

  2. Start the cluster.

    On node-a, issue the start-hbase.sh command. Your output will be similar to that below.

    $ bin/start-hbase.sh
    node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
    node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
    node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
    starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
    node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
    node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
    node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out
              

    ZooKeeper starts first, followed by the master, then the RegionServers, and finally the backup masters.

  3. Verify that the processes are running.

    On each node of the cluster, run the jps command and verify that the correct processes are running on each server. You may see additional Java processes running on your servers as well, if they are used for other purposes.

    Example 1.3. node-a jps Output

    $ jps
    20355 Jps
    20071 HQuorumPeer
    20137 HMaster
                

    Example 1.4. node-b jps Output

    $ jps
    15930 HRegionServer
    16194 Jps
    15838 HQuorumPeer
    16010 HMaster
                

    Example 1.5. node-c jps Output

    $ jps
    13901 Jps
    13639 HQuorumPeer
    13737 HRegionServer
                

    ZooKeeper Process Name

    The HQuorumPeer process is a ZooKeeper instance which is controlled and started by HBase. If you use ZooKeeper this way, it is limited to one instance per cluster node, and is appropriate for testing only. If ZooKeeper is run outside of HBase, the process is called QuorumPeer. For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see Chapter 20, ZooKeeper.

  4. Browse to the Web UI.

    Web UI Port Changes

    In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from 60010 for the Master and 60030 for each RegionServer to 16010 for the Master and 16030 for the RegionServer.

    If everything is set up correctly, you should be able to connect to the web UI for the Master at http://node-a.example.com:60110/, or for the secondary master at http://node-b.example.com:60110/, using a web browser. If you can connect via localhost but not from another host, check your firewall rules. You can see the web UI for each of the RegionServers at port 60130 of their IP addresses, or by clicking their links in the web UI for the Master.

  5. Test what happens when nodes or services disappear.

    With a three-node cluster like you have configured, things will not be very resilient. Still, you can test what happens when the primary Master or a RegionServer disappears, by killing the processes and watching the logs.

1.2.5. Where to go next

The next chapter, Chapter 2, Apache HBase Configuration, gives more information about the different HBase run modes, system requirements for running HBase, and critical configuration areas for setting up a distributed HBase cluster.

Chapter 2. Apache HBase Configuration
