I. Snapshot mechanism (snapshots)
Create a simple test table in HBase:
hbase(main):044:0> scan 'student'
ROW COLUMN+CELL
num1 column=shuxing:name, timestamp=1412189531346, value=jaybing
num2 column=shuxing:name, timestamp=1412189623682, value=jaychou
num3 column=shuxing:like, timestamp=1412189669404, value=game
3 row(s) in 0.0260 seconds
Take a snapshot of this table:
hbase(main):045:0> snapshot 'student','snapshot_student'
0 row(s) in 1.2620 seconds
[[email protected] ~]# hadoop fs -ls /tmpdir/
Found 9 items
drwxr-xr-x - root supergroup 0 2014-10-02 02:58 /tmpdir/.hbase-snapshot
drwxr-xr-x - root supergroup 0 2014-10-01 21:48 /tmpdir/.tmp
drwxr-xr-x - root supergroup 0 2014-10-01 21:37 /tmpdir/WALs
drwxr-xr-x - root supergroup 0 2014-10-02 02:42 /tmpdir/archive
drwxr-xr-x - root supergroup 0 2014-09-28 00:42 /tmpdir/corrupt
drwxr-xr-x - root supergroup 0 2014-09-26 11:20 /tmpdir/data
-rw-r--r-- 2 root supergroup 42 2014-09-26 11:20 /tmpdir/hbase.id
-rw-r--r-- 2 root supergroup 7 2014-09-26 11:20 /tmpdir/hbase.version
drwxr-xr-x - root supergroup 0 2014-10-02 02:48 /tmpdir/oldWALs
[[email protected] ~]# hadoop fs -ls /tmpdir/.hbase-snapshot
Found 2 items
drwxr-xr-x - root supergroup 0 2014-10-02 02:58 /tmpdir/.hbase-snapshot/.tmp
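drwxr-xr-x - root supergroup 0 2014-10-02 02:58 /tmpdir/.hbase-snapshot/snapshot_student
The snapshot_student directory is where the snapshot is stored. It holds the snapshot's metadata and references to the table's data files rather than a separate copy of the data.
Disable and drop the student table to simulate data loss:
hbase(main):061:0> disable 'student'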
0 row(s) in 2.0310 seconds
hbase(main):062:0> is_
is_a? is_disabled is_enabled
hbase(main):062:0> is_enabled 'student'
false
0 row(s) in 0.0800 seconds
hbase(main):063:0> drop
drop drop_all drop_namespace
hbase(main):063:0> drop 'student'
0 row(s) in 0.1940 seconds
hbase(main):064:0> list
TABLE
0 row(s) in 0.0200 seconds
=> []
Restore the table from the snapshot:
hbase(main):070:0> restore_snapshot 'snapshot_student'
0 row(s) in 6.4950 seconds
hbase(main):071:0> scan 'student'
ROW COLUMN+CELL
num1 column=shuxing:name, timestamp=1412189531346, value=jaybing
num2 column=shuxing:name, timestamp=1412189623682, value=jaychou
num3 column=shuxing:like, timestamp=1412189669404, value=game
3 row(s) in 0.2190 seconds
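Note: a snapshot only captures the table's data as it was at the moment the snapshot was taken. Data written after that point is not included; snapshots do not track incremental changes.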
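Because restore_snapshot puts the table back into the state recorded by the snapshot, one way to inspect a snapshot without touching a live table is to clone it into a separate table. The following is a minimal sketch and not part of the original walkthrough: student_clone is a hypothetical table name, and the script only prints the HBase shell commands unless RUN=1 is set and the hbase launcher is on the PATH.

```shell
#!/bin/sh
# Sketch: list snapshots and clone one into a new table.
SNAPSHOT="snapshot_student"     # snapshot created earlier in this post
CLONE_TABLE="student_clone"     # hypothetical name for the cloned table

# HBase shell commands, one per line.
SHELL_COMMANDS="list_snapshots
clone_snapshot '$SNAPSHOT', '$CLONE_TABLE'
scan '$CLONE_TABLE'"

# Dry run: show what would be sent to the HBase shell.
printf '%s\n' "$SHELL_COMMANDS"

# Execute only when explicitly requested and hbase is available.
if [ "${RUN:-0}" = "1" ] && command -v hbase >/dev/null 2>&1; then
    printf '%s\n' "$SHELL_COMMANDS" | hbase shell
fi
```

Once a snapshot is no longer needed, it can be removed in the HBase shell with delete_snapshot 'snapshot_student'.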
II. Exporting a table (Export) and copying a table (CopyTable)
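HBase's Export tool is built in and makes it easy to dump data from HBase into SequenceFiles under an HDFS directory. It launches a MapReduce job that reads every row of the specified table through the HBase API and writes the data to the specified HDFS directory.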
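As an illustration of the Export tool described above, here is a minimal sketch that exports the student table from this post. The output path /backup/student_export is a hypothetical example, and the script only prints the command unless RUN=1 is set and the hbase launcher is on the PATH.

```shell
#!/bin/sh
# Sketch: export an HBase table to SequenceFiles on HDFS.
TABLE="student"
OUTPUT_DIR="/backup/student_export"   # hypothetical HDFS path; should not exist yet

# Export takes the table name followed by the output directory.
EXPORT_CMD="hbase org.apache.hadoop.hbase.mapreduce.Export $TABLE $OUTPUT_DIR"

# Dry run: show the command that would start the MapReduce job.
echo "$EXPORT_CMD"

# Execute only when explicitly requested and hbase is available.
if [ "${RUN:-0}" = "1" ] && command -v hbase >/dev/null 2>&1; then
    $EXPORT_CMD
fi
```

The exported files can later be loaded back into an existing table with the companion Import tool, org.apache.hadoop.hbase.mapreduce.Import, which takes the table name and the input directory.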
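HBase's CopyTable tool works much like Export: it also uses the HBase API to create a MapReduce job that reads from the source table. The difference is that CopyTable writes its output to another HBase table, which can be in the local cluster or in a remote cluster.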
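The two CopyTable cases, local and remote, can be sketched as follows. The destination table name student_copy and the ZooKeeper address zk1,zk2,zk3:2181:/hbase are hypothetical placeholders, and the sketch assumes the destination table has already been created with the same column families. The script only prints the commands unless RUN=1 is set and the hbase launcher is on the PATH.

```shell
#!/bin/sh
# Sketch: copy an HBase table within a cluster or to a remote cluster.
SRC_TABLE="student"
DST_TABLE="student_copy"              # hypothetical; assumed to exist already
PEER_ZK="zk1,zk2,zk3:2181:/hbase"     # hypothetical quorum:port:znode of the remote cluster

COPYTABLE="hbase org.apache.hadoop.hbase.mapreduce.CopyTable"

# Same cluster: write the rows into a table with a different name.
LOCAL_COPY_CMD="$COPYTABLE --new.name=$DST_TABLE $SRC_TABLE"

# Remote cluster: write the rows into the same-named table on the peer.
REMOTE_COPY_CMD="$COPYTABLE --peer.adr=$PEER_ZK $SRC_TABLE"

# Dry run: show both commands.
echo "$LOCAL_COPY_CMD"
echo "$REMOTE_COPY_CMD"

# Execute the local copy only when explicitly requested and hbase is available.
if [ "${RUN:-0}" = "1" ] && command -v hbase >/dev/null 2>&1; then
    $LOCAL_COPY_CMD
fi
```

CopyTable also accepts --starttime and --endtime options to restrict the copy to a time range of cell timestamps.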