1. Export:
hbase org.apache.hadoop.hbase.mapreduce.Driver export <table name> <output path>
The output path may be a local directory or a path on HDFS.
For a local directory, you can specify it directly, or prefix it with file:///.
For HDFS, you must give an explicit HDFS URI, e.g. hdfs://192.168.1.200:9000/path.
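The path rule above can be captured in a small sketch. This is a hypothetical helper, not part of HBase; the namenode address is the one used in this article's examples:

```python
# Hypothetical helper: normalize a user-supplied path into the explicit URI
# form the Export/Import tools accept. Bare absolute paths are treated as
# HDFS paths here; that is a design choice of this sketch, not an HBase rule.
def to_uri(path, hdfs_authority="192.168.1.200:9000"):
    """Return an explicit file:// or hdfs:// URI for a given path."""
    if path.startswith(("file://", "hdfs://")):
        return path  # already explicit, pass through unchanged
    if path.startswith("/"):
        return f"hdfs://{hdfs_authority}{path}"
    raise ValueError("expected an absolute path or an explicit URI")

print(to_uri("/usr/local/waln_log"))
# hdfs://192.168.1.200:9000/usr/local/waln_log
```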
2. Import:
hbase org.apache.hadoop.hbase.mapreduce.Driver import <table name> <input path>
As above, the data path may be a local directory or an HDFS path.
The Driver class also exposes other tools, such as table-to-table copy and TSV import; run it without arguments to list them.
Import only accepts data produced by Export; otherwise it fails with "not a SequenceFile".
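The "not a SequenceFile" error comes from the input format: Export writes Hadoop SequenceFiles, whose header begins with the three magic bytes "SEQ". A minimal sketch of a pre-flight check (hypothetical helper, demonstrated on a fabricated header rather than a real export file):

```python
import tempfile

def looks_like_sequencefile(path):
    """True if the file starts with the SequenceFile magic header 'SEQ'."""
    with open(path, "rb") as f:
        return f.read(3) == b"SEQ"

# Demo on a fake header; a real part-m-00000 produced by Export would also
# begin with these bytes (followed by a format version byte).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"SEQ\x06fake-body")
print(looks_like_sequencefile(f.name))  # True
```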
3. Commands used in this article:
bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export waln_log /usr/local/waln_log
bin/hbase org.apache.hadoop.hbase.mapreduce.Driver import waln_log1 hdfs://192.168.1.200:9000/usr/local/waln_log/part-m-00000
3.1 First, scan the HBase table waln_log; this is the data we are about to export:
hbase(main):009:0> scan 'waln_log'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1432740300560, value=22
row1 column=cf:city, timestamp=1432740308281, value=shanghai
row1 column=cf:name, timestamp=1432740263412, value=zhangsan
row2 column=cf:name, timestamp=1432740296373, value=lisi
2 row(s) in 12.0670 seconds
hbase(main):010:0> desc 'waln_log'
Table waln_log is ENABLED
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'
}
1 row(s) in 1.1220 seconds
hbase(main):011:0>
3.2 Run the export:
bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export waln_log /usr/local/waln_log
After it finishes, the generated HDFS directory:
[root@baozi hbase-0.99.2]# hdfs dfs -ls /usr/local/waln_log
Found 2 items
-rw-r--r-- 1 root supergroup 0 2015-05-27 23:34 /usr/local/waln_log/_SUCCESS
-rw-r--r-- 1 root supergroup 289 2015-05-27 23:33 /usr/local/waln_log/part-m-00000
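The listing above is the standard shape of MapReduce output: an empty _SUCCESS marker next to the part-m-* data files. A minimal sketch of a completeness check (hypothetical helper; it runs on a local directory, e.g. after hdfs dfs -get, rather than talking to HDFS directly):

```python
import os
import tempfile

def export_output_complete(outdir):
    """True if the directory holds a _SUCCESS marker and at least one part file."""
    names = os.listdir(outdir)
    return "_SUCCESS" in names and any(n.startswith("part-") for n in names)

# Demo on a fabricated local copy of the export directory.
demo = tempfile.mkdtemp()
open(os.path.join(demo, "_SUCCESS"), "w").close()
open(os.path.join(demo, "part-m-00000"), "w").close()
print(export_output_complete(demo))  # True
```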
The export launches a MapReduce job; partial output:
2015-05-27 23:27:56,261 INFO [main] zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/hadoop/lib/native
2015-05-27 23:27:56,261 INFO [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2015-05-27 23:27:56,261 INFO [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2015-05-27 23:27:56,262 INFO [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2015-05-27 23:27:56,262 INFO [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2015-05-27 23:27:56,262 INFO [main] zookeeper.ZooKeeper: Client environment:os.version=2.6.32-431.el6.x86_64
2015-05-27 23:27:56,262 INFO [main] zookeeper.ZooKeeper: Client environment:user.name=root
2015-05-27 23:27:56,262 INFO [main] zookeeper.ZooKeeper: Client environment:user.home=/root
2015-05-27 23:27:56,262 INFO [main] zookeeper.ZooKeeper: Client environment:user.dir=/usr/local/hbase-0.99.2
2015-05-27 23:27:56,264 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=192.168.1.200:2181 sessionTimeout=90000 watcher=hconnection-0x5629409d, quorum=192.168.1.200:2181, baseZNode=/hbase
2015-05-27 23:27:56,542 INFO [main-SendThread(192.168.1.200:2181)] zookeeper.ClientCnxn: Opening socket connection to server 192.168.1.200/192.168.1.200:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-27 23:27:56,544 INFO [main-SendThread(192.168.1.200:2181)] zookeeper.ClientCnxn: Socket connection established to 192.168.1.200/192.168.1.200:2181, initiating session
2015-05-27 23:27:56,989 INFO [main-SendThread(192.168.1.200:2181)] zookeeper.ClientCnxn: Session establishment complete on server 192.168.1.200/192.168.1.200:2181, sessionid = 0x14d95f3ad100012, negotiated timeout = 90000
2015-05-27 23:27:57,415 INFO [main] util.RegionSizeCalculator: Calculating region sizes for table "waln_log".
2015-05-27 23:28:13,737 INFO [main] mapreduce.JobSubmitter: number of splits:1
2015-05-27 23:28:13,920 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-05-27 23:28:15,913 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1432735450462_0004
2015-05-27 23:28:24,928 INFO [main] impl.YarnClientImpl: Submitted application application_1432735450462_0004
2015-05-27 23:28:25,448 INFO [main] mapreduce.Job: The url to track the job: http://baozi:8088/proxy/application_1432735450462_0004/
2015-05-27 23:28:25,449 INFO [main] mapreduce.Job: Running job: job_1432735450462_0004
2015-05-27 23:31:26,790 INFO [main] mapreduce.Job: Job job_1432735450462_0004 running in uber mode : false
2015-05-27 23:31:32,444 INFO [main] mapreduce.Job: map 0% reduce 0%
2015-05-27 23:34:02,361 INFO [main] mapreduce.Job: map 100% reduce 0%
2015-05-27 23:34:53,248 INFO [main] mapreduce.Job: Job job_1432735450462_0004 completed successfully
2015-05-27 23:35:22,330 INFO [main] mapreduce.Job: Counters: 41
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=133771
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=65
HDFS: Number of bytes written=289
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=169690
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=169690
Total vcore-seconds taken by all map tasks=169690
Total megabyte-seconds taken by all map tasks=173762560
Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=65
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=2879
CPU time spent (ms)=3350
Physical memory (bytes) snapshot=85295104
Virtual memory (bytes) snapshot=848232448
Total committed heap usage (bytes)=15859712
HBase Counters
BYTES_IN_REMOTE_RESULTS=0
BYTES_IN_RESULTS=157
MILLIS_BETWEEN_NEXTS=13240
NOT_SERVING_REGION_EXCEPTION=0
NUM_SCANNER_RESTARTS=0
NUM_SCAN_RESULTS_STALE=0
REGIONS_SCANNED=1
REMOTE_RPC_CALLS=0
REMOTE_RPC_RETRIES=0
RPC_CALLS=3
RPC_RETRIES=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=289
2015-05-27 23:35:25,049 INFO [main] ipc.Client: Retrying connect to server: baozi/192.168.1.200:36055. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2015-05-27 23:35:26,181 INFO [main] ipc.Client: Retrying connect to server: baozi/192.168.1.200:36055. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2015-05-27 23:35:27,182 INFO [main] ipc.Client: Retrying connect to server: baozi/192.168.1.200:36055. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2015-05-27 23:35:30,592 INFO [main] mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
3.3 Create a table to receive the imported data:
hbase(main):011:0> create 'waln_log1','cf'
0 row(s) in 30.2070 seconds
=> Hbase::Table - waln_log1
hbase(main):012:0> scan 'waln_log'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1432740300560, value=22
row1 column=cf:city, timestamp=1432740308281, value=shanghai
row1 column=cf:name, timestamp=1432740263412, value=zhangsan
row2 column=cf:name, timestamp=1432740296373, value=lisi
2 row(s) in 1.3960 seconds
hbase(main):013:0> scan 'waln_log1'
ROW COLUMN+CELL
0 row(s) in 0.2750 seconds
hbase(main):014:0> desc 'waln_log1'
Table waln_log1 is ENABLED
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'
}
1 row(s) in 0.1500 seconds
hbase(main):015:0>
3.4 Run the import command:
[root@baozi hbase-0.99.2]# bin/hbase org.apache.hadoop.hbase.mapreduce.Driver import waln_log1 hdfs://192.168.1.200:9000/usr/local/waln_log/part-m-00000
This also launches a MapReduce job; partial output:
2015-05-27 23:46:27,426 INFO [main] mapreduce.TableOutputFormat: Created table instance for waln_log1
2015-05-27 23:46:43,436 INFO [main] input.FileInputFormat: Total input paths to process : 1
2015-05-27 23:46:45,387 INFO [main] mapreduce.JobSubmitter: number of splits:1
2015-05-27 23:46:47,327 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1432735450462_0005
2015-05-27 23:46:52,916 INFO [main] impl.YarnClientImpl: Submitted application application_1432735450462_0005
2015-05-27 23:46:53,233 INFO [main] mapreduce.Job: The url to track the job: http://baozi:8088/proxy/application_1432735450462_0005/
2015-05-27 23:46:53,251 INFO [main] mapreduce.Job: Running job: job_1432735450462_0005
2015-05-27 23:48:52,937 INFO [main] mapreduce.Job: Job job_1432735450462_0005 running in uber mode : false
2015-05-27 23:48:54,258 INFO [main] mapreduce.Job: map 0% reduce 0%
2015-05-27 23:51:32,098 INFO [main] mapreduce.Job: map 100% reduce 0%
2015-05-27 23:52:50,001 INFO [main] mapreduce.Job: Job job_1432735450462_0005 completed successfully
2015-05-27 23:52:54,965 INFO [main] mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=133322
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=411
HDFS: Number of bytes written=0
HDFS: Number of read operations=3
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=217016
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=217016
Total vcore-seconds taken by all map tasks=217016
Total megabyte-seconds taken by all map tasks=222224384
Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=122
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=2050
CPU time spent (ms)=2140
Physical memory (bytes) snapshot=80756736
Virtual memory (bytes) snapshot=845209600
Total committed heap usage (bytes)=15859712
File Input Format Counters
Bytes Read=289
File Output Format Counters
Bytes Written=0
The data was imported successfully:
hbase(main):015:0> scan 'waln_log1'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1432740300560, value=22
row1 column=cf:city, timestamp=1432740308281, value=shanghai
row1 column=cf:name, timestamp=1432740263412, value=zhangsan
row2 column=cf:name, timestamp=1432740296373, value=lisi
2 row(s) in 2.5040 seconds
hbase(main):016:0>
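The two scans above were compared by eye. For larger tables, the shell output can be diffed programmatically. A minimal sketch (hypothetical helper) that parses scan lines into (row, column, timestamp, value) tuples:

```python
import re

# Matches lines of the form:
#  row1 column=cf:name, timestamp=1432740263412, value=zhangsan
SCAN_LINE = re.compile(r"^\s*(\S+)\s+column=(\S+), timestamp=(\d+), value=(.*)$")

def parse_scan(text):
    """Parse HBase shell scan output into a set of cell tuples."""
    return {m.groups() for line in text.splitlines()
            if (m := SCAN_LINE.match(line))}

src = " row1 column=cf:name, timestamp=1432740263412, value=zhangsan"
dst = " row1 column=cf:name, timestamp=1432740263412, value=zhangsan"
print(parse_scan(src) == parse_scan(dst))  # True
```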
3.5 If the target table's schema differs from the exported table's, the import fails:
hbase(main):017:0> create 'waln_log2','cf1','cf2'
0 row(s) in 25.7630 seconds
=> Hbase::Table - waln_log2
hbase(main):018:0> desc 'waln_log2'
Table waln_log2 is ENABLED
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true
'}
{NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true
'}
2 row(s) in 0.2650 seconds
hbase(main):019:0> scan 'waln_log2'
ROW COLUMN+CELL
0 row(s) in 0.1870 seconds
hbase(main):020:0>
The error output:
2015-05-28 00:02:31,224 INFO [main] mapreduce.Job: map 0% reduce 0%
2015-05-28 00:08:20,162 INFO [main] mapreduce.Job: map 100% reduce 0%
2015-05-28 00:08:55,603 INFO [main] mapreduce.Job: map 0% reduce 0%
2015-05-28 00:09:30,247 INFO [main] mapreduce.Job: Task Id : attempt_1432735450462_0006_m_000000_0, Status : FAILED
Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family cf does not exist in region waln_log2,,1432742282179.6cd6a2a4d5ae585bd425ffbce92783c4.
in table 'waln_log2', {NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY
=> 'false', BLOCKCACHE => 'true'}, {NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}
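The failure above is a schema mismatch: the exported cells belong to family 'cf', while waln_log2 only defines 'cf1' and 'cf2'. Comparing the family sets before running Import catches this up front. A minimal sketch (hypothetical helper; the family names would come from desc output or the HBase API):

```python
def missing_families(source_families, target_families):
    """Families the exported data uses that the target table does not define."""
    return set(source_families) - set(target_families)

# The situation from this section: import fails because 'cf' is missing.
print(missing_families({"cf"}, {"cf1", "cf2"}))  # {'cf'}
# The working case from section 3.4: same family on both sides.
print(missing_families({"cf"}, {"cf"}))          # set()
```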