Using HBase on the Microsoft Azure Cloud Platform

What is HBase?

HBase is a low-latency NoSQL database that allows online transactional processing of big data. HBase is offered as a managed cluster integrated into the Azure environment. The clusters are configured to store data directly in Azure Blob storage, which provides low latency and increased elasticity in performance/cost choices. This enables customers to build interactive websites that work with large datasets, to build services that store sensor and telemetry data from millions of end points, and to analyze this data with Hadoop jobs. For more information on HBase and the scenarios it can be used for, see HDInsight HBase overview.

NOTE:

HBase (version 0.98.0) is available only with HDInsight 3.1 clusters (based on Apache Hadoop and YARN 2.4.0). For version information, see What's new in the Hadoop cluster versions provided by HDInsight?

Prerequisites

Before you begin this tutorial, you must have the following:

Provision an HBase cluster on the Azure portal

This section describes how to provision an HBase cluster using the Azure Management portal.

NOTE:

The steps in this article create an HDInsight cluster using basic configuration settings. For information on other cluster configuration settings, such as using Azure Virtual Network or a metastore for Hive and Oozie, see Provision an HDInsight cluster.

To provision an HDInsight cluster in the Azure Management portal

  1. Sign in to the Azure Management Portal.
  2. Click NEW on the lower left, and then click DATA SERVICES > HDINSIGHT > HBASE.
  3. Enter CLUSTER NAME, CLUSTER SIZE, CLUSTER USER PASSWORD, and STORAGE ACCOUNT.

  4. Click on the check icon on the lower left to create the HBase cluster.

Create an HBase sample table from the HBase shell

This section describes how to enable and use the Remote Desktop Protocol (RDP) to access the HBase shell and then use it to create an HBase sample table, add rows, and then list the rows in the table.

It assumes you have completed the procedure outlined in the first section, and so have already successfully created an HBase cluster.

To enable the RDP connection to the HBase cluster

  1. From the Management portal, click HDINSIGHT from the left to view the list of the existing clusters.
  2. Click the HBase cluster where you want to open HBase Shell.
  3. Click CONFIGURATION from the top.
  4. Click ENABLE REMOTE from the bottom.
  5. Enter the RDP user name and password. The user name must be different from the cluster user name you used when provisioning the cluster. The EXPIRES ON date can be up to seven days from today.
  6. Click the check on the lower right to enable remote desktop.
  7. After RDP is enabled, click CONNECT at the bottom of the CONFIGURATION tab, and follow the instructions.

To open the HBase Shell

  1. Within your RDP session, click on the Hadoop Command Line shortcut located on the desktop.
  2. Change the folder to the HBase home directory:
    cd %HBASE_HOME%\bin
  3. Open the HBase shell:
    hbase shell

To create a sample table, add data, and retrieve the data

  1. Create a sample table:

    create 'sampletable', 'cf1'
  2. Add a row to the sample table:
    put 'sampletable', 'row1', 'cf1:col1', 'value1'
  3. List the rows in the sample table:
    scan 'sampletable'
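The shell commands above can be pictured with a small sketch. This is an illustration of HBase's logical data model only (not a real client): a table maps a row key to columns named "family:qualifier", and a scan walks rows in sorted key order.

```python
# Illustrative sketch, not an HBase client: models table -> row key ->
# "family:qualifier" -> value, mirroring the shell commands above.
table = {}

def put(table, row, column, value):
    """Mimics: put 'sampletable', 'row1', 'cf1:col1', 'value1'"""
    table.setdefault(row, {})[column] = value

def scan(table):
    """Mimics: scan 'sampletable' -- yields cells sorted by row key."""
    for row in sorted(table):
        for column, value in sorted(table[row].items()):
            yield row, column, value

put(table, "row1", "cf1:col1", "value1")
for row, column, value in scan(table):
    print(f"{row}  column={column}, value={value}")
```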

Check cluster status in the HBase WebUI

HBase also ships with a web UI that helps you monitor your cluster, for example by providing request statistics and information about regions. On the HBase cluster, you can reach the web UI at the ZooKeeper node address:

http://zookeepernode:60010/master-status

In a high-availability (HA) cluster, you will find a link there to the currently active HBase master node, which hosts the web UI.

Bulk load a sample table

  1. Create samplefile1.txt containing the following tab-separated data, and upload it to Azure Blob storage as /tmp/samplefile1.txt:

    row1    c1  c2
    row2    c1  c2
    row3    c1  c2
    row4    c1  c2
    row5    c1  c2
    row6    c1  c2
    row7    c1  c2
    row8    c1  c2
    row9    c1  c2
    row10   c1  c2
  2. Change the folder to the HBase home directory:
    cd %HBASE_HOME%\bin
  3. Execute ImportTsv:
    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,a:b,a:c" -Dimporttsv.bulk.output=/tmpOutput sampletable2 /tmp/samplefile1.txt
  4. Load the output from the prior command into HBase:
    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmpOutput sampletable2
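ImportTsv splits each input line on a tab character by default (a different separator can be supplied with -Dimporttsv.separator), so samplefile1.txt must use real tabs between fields. The following sketch generates the file programmatically; the filename matches the tutorial, everything else is illustrative.

```python
# Sketch: generate the tab-separated input file expected by ImportTsv.
# Each line is: <row key> TAB <value for a:b> TAB <value for a:c>
lines = ["\t".join([f"row{i}", "c1", "c2"]) for i in range(1, 11)]
content = "\n".join(lines) + "\n"

with open("samplefile1.txt", "w") as f:
    f.write(content)

print(content)
```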

Use Hive to query an HBase table

Now that you have an HBase cluster provisioned and have created an HBase table, you can query it using Hive. This section creates a Hive table that maps to the HBase table, and then uses it to query the data in your HBase table.

To open the cluster dashboard

  1. Sign in to the Azure Management Portal.
  2. Click HDINSIGHT from the left pane. You will see a list of clusters, including the one you created in the last section.
  3. Click the cluster name where you want to run the Hive job.
  4. Click QUERY CONSOLE from the bottom of the page to open the cluster dashboard. It opens in a new browser tab.
  5. Enter the Hadoop user account username and password. The default username is admin; the password is the one you entered during provisioning.
  6. Click Hive Editor from the top.

To run Hive queries

  1. Enter the HiveQL script below into the Hive Editor and click SUBMIT to create a Hive table that maps to the HBase table. Make sure that you have created the sampletable table referenced here in HBase using the HBase shell before executing this statement.

    CREATE EXTERNAL TABLE hbasesampletable(rowkey STRING, col1 STRING, col2 STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf1:col1,cf1:col2')
    TBLPROPERTIES ('hbase.table.name' = 'sampletable');

    Wait until the Status is updated to Completed.

  2. Enter the HiveQL script below into the Hive Editor, and then click SUBMIT. The query counts the rows in the HBase table:
    SELECT count(*) FROM hbasesampletable;
  3. To retrieve the results of the Hive query, click the View Details link in the Job Session window when the job finishes executing. The job output should be 1, because you have put only one record into the HBase table.

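The 'hbase.columns.mapping' property pairs each Hive column, in declaration order, with an HBase column; the special token ':key' binds a Hive column to the HBase row key. The hypothetical helper below (not part of Hive) just makes that pairing explicit:

```python
# Sketch (hypothetical helper, not a Hive API): shows how
# 'hbase.columns.mapping' pairs Hive columns with HBase columns.
def parse_mapping(mapping, hive_columns):
    pairs = {}
    for hive_col, hbase_col in zip(hive_columns, mapping.split(",")):
        # ':key' is the special token that maps a Hive column to the row key.
        pairs[hive_col] = "row key" if hbase_col == ":key" else hbase_col
    return pairs

mapping = parse_mapping(":key,cf1:col1,cf1:col2", ["rowkey", "col1", "col2"])
print(mapping)
```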
To browse the output file

  1. From Query Console, click File Browser from the top.
  2. Click the Azure Storage account used as the default file system for the HBase cluster.
  3. Click the HBase cluster name. The default Azure storage account container uses the cluster name.
  4. Click user.
  5. Click admin. This is the Hadoop user name.
  6. Click the job name with the Last Modified time matching the time when the SELECT Hive query ran.
  7. Click stdout. Save the file and open it with Notepad. The output should be 1.

Use the HBase REST Client Library for .NET (C#) to create an HBase table and retrieve data from the table

The Microsoft HBase REST Client Library for .NET provides the HBase .NET SDK. The source is available on GitHub; the following procedure installs the library as a NuGet package and uses it from a C# console application.

  1. Create a new C# Visual Studio Windows Desktop Console application.
  2. Open the NuGet Package Manager Console by clicking TOOLS > NuGet Package Manager > Package Manager Console.
  3. Run the following NuGet command in the console:

    Install-Package Microsoft.HBase.Client

  4. Add the following using statements at the top of the file:
    using System;
    using System.Text;
    using Microsoft.HBase.Client;
    using org.apache.hadoop.hbase.rest.protobuf.generated;
  5. Replace the Main function with the following:
    static void Main(string[] args)
    {
        string clusterURL = "https://<yourHBaseClusterName>.azurehdinsight.net";
        string hadoopUsername = "<yourHadoopUsername>";
        string hadoopUserPassword = "<yourHadoopUserPassword>";
    
        string hbaseTableName = "sampleHbaseTable";
    
        // Create a new instance of an HBase client.
        ClusterCredentials creds = new ClusterCredentials(new Uri(clusterURL), hadoopUsername, hadoopUserPassword);
        HBaseClient hbaseClient = new HBaseClient(creds);
    
        // Retrieve the cluster version
        var version = hbaseClient.GetVersion();
        Console.WriteLine("The HBase cluster version is " + version);
    
        // Create a new HBase table.
        TableSchema testTableSchema = new TableSchema();
        testTableSchema.name = hbaseTableName;
        testTableSchema.columns.Add(new ColumnSchema() { name = "d" });
        testTableSchema.columns.Add(new ColumnSchema() { name = "f" });
        hbaseClient.CreateTable(testTableSchema);
    
        // Insert data into the HBase table.
        string testKey = "content";
        string testValue = "the force is strong in this column";
        CellSet cellSet = new CellSet();
        CellSet.Row cellSetRow = new CellSet.Row { key = Encoding.UTF8.GetBytes(testKey) };
        cellSet.rows.Add(cellSetRow);
    
        Cell value = new Cell { column = Encoding.UTF8.GetBytes("d:starwars"), data = Encoding.UTF8.GetBytes(testValue) };
        cellSetRow.values.Add(value);
        hbaseClient.StoreCells(hbaseTableName, cellSet);
    
        // Retrieve a cell by its key.
        cellSet = hbaseClient.GetCells(hbaseTableName, testKey);
        Console.WriteLine("The data with the key '" + testKey + "' is: " + Encoding.UTF8.GetString(cellSet.rows[0].values[0].data));
        // with the previous insert, it should yield: "the force is strong in this column"
    
        //Scan over rows in a table. Assume the table has integer keys and you want data between keys 25 and 35.
        Scanner scanSettings = new Scanner()
        {
            batch = 10,
            startRow = BitConverter.GetBytes(25),
            endRow = BitConverter.GetBytes(35)
        };
    
        ScannerInformation scannerInfo = hbaseClient.CreateScanner(hbaseTableName, scanSettings);
        CellSet next = null;
        Console.WriteLine("Scan results");
    
        while ((next = hbaseClient.ScannerGetNext(scannerInfo)) != null)
        {
            foreach (CellSet.Row row in next.rows)
            {
                // row.key is a byte array; decode the integer key before printing.
                Console.WriteLine(BitConverter.ToInt32(row.key, 0) + " : " + Encoding.UTF8.GetString(row.values[0].data));
            }
        }
    
        Console.WriteLine("Press ENTER to continue ...");
        Console.ReadLine();
    }
  6. Set the first three variables in the Main function.
  7. Press F5 to run the application.
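Under the hood, the client library talks to the cluster's HBase REST gateway (Stargate), which exchanges cells as JSON in which row keys, column names, and values are base64-encoded. The sketch below builds such a payload for the StoreCells call above; the field names follow the commonly documented Stargate schema and should be treated as an assumption, not a contract.

```python
import base64
import json

# Sketch of the wire format behind StoreCells: Stargate-style JSON in which
# row key, column name ("family:qualifier"), and cell value are base64.
def encode(s):
    return base64.b64encode(s.encode("utf-8")).decode("ascii")

payload = {
    "Row": [{
        "key": encode("content"),
        "Cell": [{
            "column": encode("d:starwars"),
            "$": encode("the force is strong in this column"),
        }],
    }]
}
body = json.dumps(payload)
print(body)
```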

What's Next?

In this tutorial, you have learned how to provision an HBase cluster, how to create tables, and how to view the data in those tables from the HBase shell. You also learned how to use Hive to query the data in HBase tables, and how to use the HBase C# APIs to create an HBase table and retrieve data from it.

To learn more, see:
