1. Export:
hbase org.apache.hadoop.hbase.mapreduce.Driver export <table name> <output path>
The output path may be a local directory or a path on HDFS.
For a local directory, you can specify it directly, or prefix it with file:///.
For HDFS, you must give an explicit HDFS URI, e.g. hdfs://192.168.1.200:9000/path.
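The path rule above can be captured in a small sketch. This is a hypothetical helper, not part of HBase; the namenode address is the one used in this article's examples:

```python
# Hypothetical helper: normalize a user-supplied path into the explicit URI
# form the Export/Import tools accept. Bare absolute paths are treated as
# HDFS paths here; that is a design choice of this sketch, not an HBase rule.
def to_uri(path, hdfs_authority="192.168.1.200:9000"):
    """Return an explicit file:// or hdfs:// URI for a given path."""
    if path.startswith(("file://", "hdfs://")):
        return path  # already explicit, pass through unchanged
    if path.startswith("/"):
        return f"hdfs://{hdfs_authority}{path}"
    raise ValueError("expected an absolute path or an explicit URI")

print(to_uri("/usr/local/waln_log"))
# hdfs://192.168.1.200:9000/usr/local/waln_log
```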
2. Import:
hbase org.apache.hadoop.hbase.mapreduce.Driver import <table name> <input path>
As above, the data path may be a local directory or an HDFS path.
The Driver class also exposes other tools, such as table-to-table copy and TSV import; run it without arguments to list them.
Import only accepts data produced by Export; otherwise it fails with "not a SequenceFile".
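The "not a SequenceFile" error comes from the input format: Export writes Hadoop SequenceFiles, whose header begins with the three magic bytes "SEQ". A minimal sketch of a pre-flight check (hypothetical helper, demonstrated on a fabricated header rather than a real export file):

```python
import tempfile

def looks_like_sequencefile(path):
    """True if the file starts with the SequenceFile magic header 'SEQ'."""
    with open(path, "rb") as f:
        return f.read(3) == b"SEQ"

# Demo on a fake header; a real part-m-00000 produced by Export would also
# begin with these bytes (followed by a format version byte).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"SEQ\x06fake-body")
print(looks_like_sequencefile(f.name))  # True
```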
3. Commands used in this article:
bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export waln_log /usr/local/waln_log
bin/hbase org.apache.hadoop.hbase.mapreduce.Driver import waln_log1 hdfs://192.168.1.200:9000/usr/local/waln_log/part-m-00000
3.1 First, scan the HBase table waln_log; this is the data we are about to export:
hbase(main):009:0> scan 'waln_log'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1432740300560, value=22
row1 column=cf:city, timestamp=1432740308281, value=shanghai
row1 column=cf:name, timestamp=1432740263412, value=zhangsan
row2 column=cf:name, timestamp=1432740296373, value=lisi
2 row(s) in 12.0670 seconds
hbase(main):010:0> desc 'waln_log'
Table waln_log is ENABLED
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'
}
1 row(s) in 1.1220 seconds
hbase(main):011:0>
3.2 Run the export:
bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export waln_log /usr/local/waln_log
After it finishes, the generated HDFS directory:
[root@baozi hbase-0.99.2]# hdfs dfs -ls /usr/local/waln_log
Found 2 items
-rw-r--r-- 1 root supergroup 0 2015-05-27 23:34 /usr/local/waln_log/_SUCCESS
-rw-r--r-- 1 root supergroup 289 2015-05-27 23:33 /usr/local/waln_log/part-m-00000
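The listing above is the standard shape of MapReduce output: an empty _SUCCESS marker next to the part-m-* data files. A minimal sketch of a completeness check (hypothetical helper; it runs on a local directory, e.g. after hdfs dfs -get, rather than talking to HDFS directly):

```python
import os
import tempfile

def export_output_complete(outdir):
    """True if the directory holds a _SUCCESS marker and at least one part file."""
    names = os.listdir(outdir)
    return "_SUCCESS" in names and any(n.startswith("part-") for n in names)

# Demo on a fabricated local copy of the export directory.
demo = tempfile.mkdtemp()
open(os.path.join(demo, "_SUCCESS"), "w").close()
open(os.path.join(demo, "part-m-00000"), "w").close()
print(export_output_complete(demo))  # True
```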
The export launches a MapReduce job; partial output:
2015-05-27 23:27:56,261 INFO [main] zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/hadoop/lib/native
2015-05-27 23:27:56,261 INFO [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2015-05-27 23:27:56,261 INFO [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2015-05-27 23:27:56,262 INFO [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2015-05-27 23:27:56,262 INFO [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2015-05-27 23:27:56,262 INFO [main] zookeeper.ZooKeeper: Client environment:os.version=2.6.32-431.el6.x86_64
2015-05-27 23:27:56,262 INFO [main] zookeeper.ZooKeeper: Client environment:user.name=root
2015-05-27 23:27:56,262 INFO [main] zookeeper.ZooKeeper: Client environment:user.home=/root
2015-05-27 23:27:56,262 INFO [main] zookeeper.ZooKeeper: Client environment:user.dir=/usr/local/hbase-0.99.2
2015-05-27 23:27:56,264 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=192.168.1.200:2181 sessionTimeout=90000 watcher=hconnection-0x5629409d, quorum=192.168.1.200:2181, baseZNode=/hbase
2015-05-27 23:27:56,542 INFO [main-SendThread(192.168.1.200:2181)] zookeeper.ClientCnxn: Opening socket connection to server 192.168.1.200/192.168.1.200:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-27 23:27:56,544 INFO [main-SendThread(192.168.1.200:2181)] zookeeper.ClientCnxn: Socket connection established to 192.168.1.200/192.168.1.200:2181, initiating session
2015-05-27 23:27:56,989 INFO [main-SendThread(192.168.1.200:2181)] zookeeper.ClientCnxn: Session establishment complete on server 192.168.1.200/192.168.1.200:2181, sessionid = 0x14d95f3ad100012, negotiated timeout = 90000
2015-05-27 23:27:57,415 INFO [main] util.RegionSizeCalculator: Calculating region sizes for table "waln_log".
2015-05-27 23:28:13,737 INFO [main] mapreduce.JobSubmitter: number of splits:1
2015-05-27 23:28:13,920 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-05-27 23:28:15,913 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1432735450462_0004
2015-05-27 23:28:24,928 INFO [main] impl.YarnClientImpl: Submitted application application_1432735450462_0004
2015-05-27 23:28:25,448 INFO [main] mapreduce.Job: The url to track the job: http://baozi:8088/proxy/application_1432735450462_0004/
2015-05-27 23:28:25,449 INFO [main] mapreduce.Job: Running job: job_1432735450462_0004
2015-05-27 23:31:26,790 INFO [main] mapreduce.Job: Job job_1432735450462_0004 running in uber mode : false
2015-05-27 23:31:32,444 INFO [main] mapreduce.Job: map 0% reduce 0%
2015-05-27 23:34:02,361 INFO [main] mapreduce.Job: map 100% reduce 0%
2015-05-27 23:34:53,248 INFO [main] mapreduce.Job: Job job_1432735450462_0004 completed successfully
2015-05-27 23:35:22,330 INFO [main] mapreduce.Job: Counters: 41
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=133771
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=65
HDFS: Number of bytes written=289
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=169690
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=169690
Total vcore-seconds taken by all map tasks=169690
Total megabyte-seconds taken by all map tasks=173762560
Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=65
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=2879
CPU time spent (ms)=3350
Physical memory (bytes) snapshot=85295104
Virtual memory (bytes) snapshot=848232448
Total committed heap usage (bytes)=15859712
HBase Counters
BYTES_IN_REMOTE_RESULTS=0
BYTES_IN_RESULTS=157
MILLIS_BETWEEN_NEXTS=13240
NOT_SERVING_REGION_EXCEPTION=0
NUM_SCANNER_RESTARTS=0
NUM_SCAN_RESULTS_STALE=0
REGIONS_SCANNED=1
REMOTE_RPC_CALLS=0
REMOTE_RPC_RETRIES=0
RPC_CALLS=3
RPC_RETRIES=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=289
2015-05-27 23:35:25,049 INFO [main] ipc.Client: Retrying connect to server: baozi/192.168.1.200:36055. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2015-05-27 23:35:26,181 INFO [main] ipc.Client: Retrying connect to server: baozi/192.168.1.200:36055. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2015-05-27 23:35:27,182 INFO [main] ipc.Client: Retrying connect to server: baozi/192.168.1.200:36055. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2015-05-27 23:35:30,592 INFO [main] mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
3.3 Create a table to receive the imported data:
hbase(main):011:0> create 'waln_log1','cf'
0 row(s) in 30.2070 seconds
=> Hbase::Table - waln_log1
hbase(main):012:0> scan 'waln_log'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1432740300560, value=22
row1 column=cf:city, timestamp=1432740308281, value=shanghai
row1 column=cf:name, timestamp=1432740263412, value=zhangsan
row2 column=cf:name, timestamp=1432740296373, value=lisi
2 row(s) in 1.3960 seconds
hbase(main):013:0> scan 'waln_log1'
ROW COLUMN+CELL
0 row(s) in 0.2750 seconds
hbase(main):014:0> desc 'waln_log1'
Table waln_log1 is ENABLED
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'
}
1 row(s) in 0.1500 seconds
hbase(main):015:0>
3.4 Run the import command:
[root@baozi hbase-0.99.2]# bin/hbase org.apache.hadoop.hbase.mapreduce.Driver import waln_log1 hdfs://192.168.1.200:9000/usr/local/waln_log/part-m-00000
This also launches a MapReduce job; partial output:
2015-05-27 23:46:27,426 INFO [main] mapreduce.TableOutputFormat: Created table instance for waln_log1
2015-05-27 23:46:43,436 INFO [main] input.FileInputFormat: Total input paths to process : 1
2015-05-27 23:46:45,387 INFO [main] mapreduce.JobSubmitter: number of splits:1
2015-05-27 23:46:47,327 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1432735450462_0005
2015-05-27 23:46:52,916 INFO [main] impl.YarnClientImpl: Submitted application application_1432735450462_0005
2015-05-27 23:46:53,233 INFO [main] mapreduce.Job: The url to track the job: http://baozi:8088/proxy/application_1432735450462_0005/
2015-05-27 23:46:53,251 INFO [main] mapreduce.Job: Running job: job_1432735450462_0005
2015-05-27 23:48:52,937 INFO [main] mapreduce.Job: Job job_1432735450462_0005 running in uber mode : false
2015-05-27 23:48:54,258 INFO [main] mapreduce.Job: map 0% reduce 0%
2015-05-27 23:51:32,098 INFO [main] mapreduce.Job: map 100% reduce 0%
2015-05-27 23:52:50,001 INFO [main] mapreduce.Job: Job job_1432735450462_0005 completed successfully
2015-05-27 23:52:54,965 INFO [main] mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=133322
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=411
HDFS: Number of bytes written=0
HDFS: Number of read operations=3
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=217016
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=217016
Total vcore-seconds taken by all map tasks=217016
Total megabyte-seconds taken by all map tasks=222224384
Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=122
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=2050
CPU time spent (ms)=2140
Physical memory (bytes) snapshot=80756736
Virtual memory (bytes) snapshot=845209600
Total committed heap usage (bytes)=15859712
File Input Format Counters
Bytes Read=289
File Output Format Counters
Bytes Written=0
The data was imported successfully:
hbase(main):015:0> scan 'waln_log1'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1432740300560, value=22
row1 column=cf:city, timestamp=1432740308281, value=shanghai
row1 column=cf:name, timestamp=1432740263412, value=zhangsan
row2 column=cf:name, timestamp=1432740296373, value=lisi
2 row(s) in 2.5040 seconds
hbase(main):016:0>
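The two scans above were compared by eye. For larger tables, the shell output can be diffed programmatically. A minimal sketch (hypothetical helper) that parses scan lines into (row, column, timestamp, value) tuples:

```python
import re

# Matches lines of the form:
#  row1 column=cf:name, timestamp=1432740263412, value=zhangsan
SCAN_LINE = re.compile(r"^\s*(\S+)\s+column=(\S+), timestamp=(\d+), value=(.*)$")

def parse_scan(text):
    """Parse HBase shell scan output into a set of cell tuples."""
    return {m.groups() for line in text.splitlines()
            if (m := SCAN_LINE.match(line))}

src = " row1 column=cf:name, timestamp=1432740263412, value=zhangsan"
dst = " row1 column=cf:name, timestamp=1432740263412, value=zhangsan"
print(parse_scan(src) == parse_scan(dst))  # True
```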
3.5 If the target table's schema differs from the exported table's, the import fails:
hbase(main):017:0> create 'waln_log2','cf1','cf2'
0 row(s) in 25.7630 seconds
=> Hbase::Table - waln_log2
hbase(main):018:0> desc 'waln_log2'
Table waln_log2 is ENABLED
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true
'}
{NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true
'}
2 row(s) in 0.2650 seconds
hbase(main):019:0> scan 'waln_log2'
ROW COLUMN+CELL
0 row(s) in 0.1870 seconds
hbase(main):020:0>
The error output:
2015-05-28 00:02:31,224 INFO [main] mapreduce.Job: map 0% reduce 0%
2015-05-28 00:08:20,162 INFO [main] mapreduce.Job: map 100% reduce 0%
2015-05-28 00:08:55,603 INFO [main] mapreduce.Job: map 0% reduce 0%
2015-05-28 00:09:30,247 INFO [main] mapreduce.Job: Task Id : attempt_1432735450462_0006_m_000000_0, Status : FAILED
Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family cf does not exist in region waln_log2,,1432742282179.6cd6a2a4d5ae585bd425ffbce92783c4.
in table 'waln_log2', {NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY
=> 'false', BLOCKCACHE => 'true'}, {NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}
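The failure above is a schema mismatch: the exported cells belong to family 'cf', while waln_log2 only defines 'cf1' and 'cf2'. Comparing the family sets before running Import catches this up front. A minimal sketch (hypothetical helper; the family names would come from desc output or the HBase API):

```python
def missing_families(source_families, target_families):
    """Families the exported data uses that the target table does not define."""
    return set(source_families) - set(target_families)

# The situation from this section: import fails because 'cf' is missing.
print(missing_families({"cf"}, {"cf1", "cf2"}))  # {'cf'}
# The working case from section 3.4: same family on both sides.
print(missing_families({"cf"}, {"cf"}))          # set()
```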