HBase Data Import and Export

1. Export:

hbase org.apache.hadoop.hbase.mapreduce.Driver export 表名 导出存放路径

The data file location can be either a local directory or a path on the HDFS distributed file system.

For a local directory, the path can be given directly, or prefixed with file:///.

For HDFS, the path must be an explicit HDFS URI, e.g. hdfs://192.168.1.200:9000/path.
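As a reference sketch (not taken from the run below), the Export tool also accepts optional arguments after the output path for the number of cell versions and a start/end timestamp; run the tool without arguments to confirm the exact usage on your release:

bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]

bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export waln_log file:///tmp/waln_log_backup 1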

2. Import:

hbase org.apache.hadoop.hbase.mapreduce.Driver import 表名 要导入的文件路径

As with export, the data file location can be a local directory or an HDFS path.

In addition, this driver class exposes several other tools, such as copying data between tables and importing TSV files; run it without arguments to see the full list.
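For example, invoking the Driver with no arguments prints the available programs (the exact listing depends on the HBase version, but it typically includes copytable, export, import and importtsv):

bin/hbase org.apache.hadoop.hbase.mapreduce.Driver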

The Import operation only works on data produced by Export; otherwise it fails with an error such as "not a SequenceFile".
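If the source data is a plain TSV file rather than Export output, use the ImportTsv tool instead. A minimal sketch (the column mapping and input path here are hypothetical):

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:name,cf:age waln_log /path/to/data.tsv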

3. Examples used in this article:

bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export waln_log /usr/local/waln_log

bin/hbase org.apache.hadoop.hbase.mapreduce.Driver import waln_log1 hdfs://192.168.1.200:9000/usr/local/waln_log/part-m-00000

3.1. First, look at the data in the HBase table waln_log; this is the table we will export:

hbase(main):009:0> scan 'waln_log'

ROW                                COLUMN+CELL

row1                              column=cf:age, timestamp=1432740300560, value=22

row1                              column=cf:city, timestamp=1432740308281, value=shanghai

row1                              column=cf:name, timestamp=1432740263412, value=zhangsan

row2                              column=cf:name, timestamp=1432740296373, value=lisi

2 row(s) in 12.0670 seconds

hbase(main):010:0> desc 'waln_log'

Table waln_log is ENABLED

COLUMN FAMILIES DESCRIPTION

{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',

MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'

}

1 row(s) in 1.1220 seconds

hbase(main):011:0>
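For reference, a table with this layout and content could have been created and populated roughly like this (a sketch reconstructed from the scan output above, not the original commands):

create 'waln_log', 'cf'
put 'waln_log', 'row1', 'cf:name', 'zhangsan'
put 'waln_log', 'row1', 'cf:age', '22'
put 'waln_log', 'row1', 'cf:city', 'shanghai'
put 'waln_log', 'row2', 'cf:name', 'lisi'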

3.2. Run the export:

bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export waln_log /usr/local/waln_log

After the job completes, the generated HDFS output directory contains:

[root@baozi hbase-0.99.2]# hdfs dfs -ls /usr/local/waln_log

Found 2 items

-rw-r--r--   1 root supergroup          0 2015-05-27 23:34 /usr/local/waln_log/_SUCCESS

-rw-r--r--   1 root supergroup        289 2015-05-27 23:33 /usr/local/waln_log/part-m-00000
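The part-m-00000 file is a Hadoop SequenceFile holding the serialized HBase Result objects. As a quick sanity check (a SequenceFile should begin with the magic bytes "SEQ"):

hdfs dfs -cat /usr/local/waln_log/part-m-00000 | head -c 3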

Part of the execution output; a MapReduce job is launched:

2015-05-27 23:27:56,261 INFO  [main] zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/hadoop/lib/native

2015-05-27 23:27:56,261 INFO  [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp

2015-05-27 23:27:56,261 INFO  [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>

2015-05-27 23:27:56,262 INFO  [main] zookeeper.ZooKeeper: Client environment:os.name=Linux

2015-05-27 23:27:56,262 INFO  [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64

2015-05-27 23:27:56,262 INFO  [main] zookeeper.ZooKeeper: Client environment:os.version=2.6.32-431.el6.x86_64

2015-05-27 23:27:56,262 INFO  [main] zookeeper.ZooKeeper: Client environment:user.name=root

2015-05-27 23:27:56,262 INFO  [main] zookeeper.ZooKeeper: Client environment:user.home=/root

2015-05-27 23:27:56,262 INFO  [main] zookeeper.ZooKeeper: Client environment:user.dir=/usr/local/hbase-0.99.2

2015-05-27 23:27:56,264 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=192.168.1.200:2181 sessionTimeout=90000 watcher=hconnection-0x5629409d, quorum=192.168.1.200:2181, baseZNode=/hbase

2015-05-27 23:27:56,542 INFO  [main-SendThread(192.168.1.200:2181)] zookeeper.ClientCnxn: Opening socket connection to server 192.168.1.200/192.168.1.200:2181. Will not attempt to authenticate using SASL (unknown error)

2015-05-27 23:27:56,544 INFO  [main-SendThread(192.168.1.200:2181)] zookeeper.ClientCnxn: Socket connection established to 192.168.1.200/192.168.1.200:2181, initiating session

2015-05-27 23:27:56,989 INFO  [main-SendThread(192.168.1.200:2181)] zookeeper.ClientCnxn: Session establishment complete on server 192.168.1.200/192.168.1.200:2181, sessionid = 0x14d95f3ad100012, negotiated timeout = 90000

2015-05-27 23:27:57,415 INFO  [main] util.RegionSizeCalculator: Calculating region sizes for table "waln_log".

2015-05-27 23:28:13,737 INFO  [main] mapreduce.JobSubmitter: number of splits:1

2015-05-27 23:28:13,920 INFO  [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum

2015-05-27 23:28:15,913 INFO  [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1432735450462_0004

2015-05-27 23:28:24,928 INFO  [main] impl.YarnClientImpl: Submitted application application_1432735450462_0004

2015-05-27 23:28:25,448 INFO  [main] mapreduce.Job: The url to track the job: http://baozi:8088/proxy/application_1432735450462_0004/

2015-05-27 23:28:25,449 INFO  [main] mapreduce.Job: Running job: job_1432735450462_0004

2015-05-27 23:31:26,790 INFO  [main] mapreduce.Job: Job job_1432735450462_0004 running in uber mode : false

2015-05-27 23:31:32,444 INFO  [main] mapreduce.Job:  map 0% reduce 0%

2015-05-27 23:34:02,361 INFO  [main] mapreduce.Job:  map 100% reduce 0%

2015-05-27 23:34:53,248 INFO  [main] mapreduce.Job: Job job_1432735450462_0004 completed successfully

2015-05-27 23:35:22,330 INFO  [main] mapreduce.Job: Counters: 41

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=133771

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=65

HDFS: Number of bytes written=289

HDFS: Number of read operations=4

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=1

Data-local map tasks=1

Total time spent by all maps in occupied slots (ms)=169690

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=169690

Total vcore-seconds taken by all map tasks=169690

Total megabyte-seconds taken by all map tasks=173762560

Map-Reduce Framework

Map input records=2

Map output records=2

Input split bytes=65

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=2879

CPU time spent (ms)=3350

Physical memory (bytes) snapshot=85295104

Virtual memory (bytes) snapshot=848232448

Total committed heap usage (bytes)=15859712

HBase Counters

BYTES_IN_REMOTE_RESULTS=0

BYTES_IN_RESULTS=157

MILLIS_BETWEEN_NEXTS=13240

NOT_SERVING_REGION_EXCEPTION=0

NUM_SCANNER_RESTARTS=0

NUM_SCAN_RESULTS_STALE=0

REGIONS_SCANNED=1

REMOTE_RPC_CALLS=0

REMOTE_RPC_RETRIES=0

RPC_CALLS=3

RPC_RETRIES=0

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=289

2015-05-27 23:35:25,049 INFO  [main] ipc.Client: Retrying connect to server: baozi/192.168.1.200:36055. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)

2015-05-27 23:35:26,181 INFO  [main] ipc.Client: Retrying connect to server: baozi/192.168.1.200:36055. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)

2015-05-27 23:35:27,182 INFO  [main] ipc.Client: Retrying connect to server: baozi/192.168.1.200:36055. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)

2015-05-27 23:35:30,592 INFO  [main] mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

3.3. Create a table to import the data into:

hbase(main):011:0> create 'waln_log1','cf'

0 row(s) in 30.2070 seconds

=> Hbase::Table - waln_log1

hbase(main):012:0> scan 'waln_log'

ROW                                COLUMN+CELL

row1                              column=cf:age, timestamp=1432740300560, value=22

row1                              column=cf:city, timestamp=1432740308281, value=shanghai

row1                              column=cf:name, timestamp=1432740263412, value=zhangsan

row2                              column=cf:name, timestamp=1432740296373, value=lisi

2 row(s) in 1.3960 seconds

hbase(main):013:0> scan 'waln_log1'

ROW                                COLUMN+CELL

0 row(s) in 0.2750 seconds

hbase(main):014:0> desc 'waln_log1'

Table waln_log1 is ENABLED

COLUMN FAMILIES DESCRIPTION

{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',

MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'

}

1 row(s) in 0.1500 seconds

hbase(main):015:0>

3.4. Run the import command:

[root@baozi hbase-0.99.2]# bin/hbase org.apache.hadoop.hbase.mapreduce.Driver import waln_log1 hdfs://192.168.1.200:9000/usr/local/waln_log/part-m-00000
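Note that Import takes <tablename> <inputdir>, so the export output directory as a whole can also be passed instead of a single part file (a sketch, not from the original run):

bin/hbase org.apache.hadoop.hbase.mapreduce.Driver import waln_log1 hdfs://192.168.1.200:9000/usr/local/waln_log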

This also runs a MapReduce job; part of the output:

2015-05-27 23:46:27,426 INFO  [main] mapreduce.TableOutputFormat: Created table instance for waln_log1

2015-05-27 23:46:43,436 INFO  [main] input.FileInputFormat: Total input paths to process : 1

2015-05-27 23:46:45,387 INFO  [main] mapreduce.JobSubmitter: number of splits:1

2015-05-27 23:46:47,327 INFO  [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1432735450462_0005

2015-05-27 23:46:52,916 INFO  [main] impl.YarnClientImpl: Submitted application application_1432735450462_0005

2015-05-27 23:46:53,233 INFO  [main] mapreduce.Job: The url to track the job: http://baozi:8088/proxy/application_1432735450462_0005/

2015-05-27 23:46:53,251 INFO  [main] mapreduce.Job: Running job: job_1432735450462_0005

2015-05-27 23:48:52,937 INFO  [main] mapreduce.Job: Job job_1432735450462_0005 running in uber mode : false

2015-05-27 23:48:54,258 INFO  [main] mapreduce.Job:  map 0% reduce 0%

2015-05-27 23:51:32,098 INFO  [main] mapreduce.Job:  map 100% reduce 0%

2015-05-27 23:52:50,001 INFO  [main] mapreduce.Job: Job job_1432735450462_0005 completed successfully

2015-05-27 23:52:54,965 INFO  [main] mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=133322

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=411

HDFS: Number of bytes written=0

HDFS: Number of read operations=3

HDFS: Number of large read operations=0

HDFS: Number of write operations=0

Job Counters

Launched map tasks=1

Data-local map tasks=1

Total time spent by all maps in occupied slots (ms)=217016

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=217016

Total vcore-seconds taken by all map tasks=217016

Total megabyte-seconds taken by all map tasks=222224384

Map-Reduce Framework

Map input records=2

Map output records=2

Input split bytes=122

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=2050

CPU time spent (ms)=2140

Physical memory (bytes) snapshot=80756736

Virtual memory (bytes) snapshot=845209600

Total committed heap usage (bytes)=15859712

File Input Format Counters

Bytes Read=289

File Output Format Counters

Bytes Written=0

The data was imported successfully:

hbase(main):015:0> scan 'waln_log1'

ROW                                COLUMN+CELL

row1                              column=cf:age, timestamp=1432740300560, value=22

row1                              column=cf:city, timestamp=1432740308281, value=shanghai

row1                              column=cf:name, timestamp=1432740263412, value=zhangsan

row2                              column=cf:name, timestamp=1432740296373, value=lisi

2 row(s) in 2.5040 seconds

hbase(main):016:0>
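Note that the timestamps in waln_log1 are identical to those in the original waln_log, i.e. Import preserves the exported cell timestamps. Beyond a scan, the row counts of the two tables can also be compared with the standard HBase shell count command:

count 'waln_log'
count 'waln_log1'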

3.5. If the target table's column families differ from those of the exported table, the import fails:

hbase(main):017:0> create 'waln_log2','cf1','cf2'

0 row(s) in 25.7630 seconds

=> Hbase::Table - waln_log2

hbase(main):018:0> desc 'waln_log2'

Table waln_log2 is ENABLED

COLUMN FAMILIES DESCRIPTION

{NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',

MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

{NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',

MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

2 row(s) in 0.2650 seconds

hbase(main):019:0> scan 'waln_log2'

ROW                                COLUMN+CELL

0 row(s) in 0.1870 seconds

hbase(main):020:0>

The error message:

2015-05-28 00:02:31,224 INFO  [main] mapreduce.Job:  map 0% reduce 0%

2015-05-28 00:08:20,162 INFO  [main] mapreduce.Job:  map 100% reduce 0%

2015-05-28 00:08:55,603 INFO  [main] mapreduce.Job:  map 0% reduce 0%

2015-05-28 00:09:30,247 INFO  [main] mapreduce.Job: Task Id : attempt_1432735450462_0006_m_000000_0, Status : FAILED

Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family cf does not exist in region waln_log2,,1432742282179.6cd6a2a4d5ae585bd425ffbce92783c4.
in table 'waln_log2', {NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY
=> 'false', BLOCKCACHE => 'true'}, {NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}
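To make this import succeed, the target table must contain the same column family (cf) as the exported data. One way to fix it (a sketch using standard HBase shell commands; depending on the version the table may need to be disabled before altering) is to add the missing family and rerun the import:

disable 'waln_log2'
alter 'waln_log2', NAME => 'cf'
enable 'waln_log2'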
