Exporting HBase Data to HDFS

1. Purpose

Export a copy of one HBase table's data to HDFS.

Two approaches are covered here: writing your own MapReduce (MR) program, and using a class that ships with HBase.

2. Exporting HBase data to HDFS with a custom MR program

2.1 First, look at the data in HBase table t1 (column family f1 with the columns name, age, gender, and birthday):

2.2 The MR code is as follows.

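The key statements are:

job.setNumReduceTasks(0); // why set the number of reduce tasks to 0? Think it over before reading on
TableMapReduceUtil.initTableMapperJob(args[0], new Scan(), HBaseToHdfsMapper.class, Text.class, Text.class, job); // names the HBase table that is the job's input; the Scan can be used to filter that table

Zero reducers make this a map-only job. Each row is only reformatted, with nothing to group or aggregate, so the mapper output is written straight to HDFS and the shuffle and sort phases are skipped.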

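package com.lanyun.hadoop2;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Usage: HBaseToHdfs <hbase table> <hdfs output dir>
public class HBaseToHdfs {
	public static void main(String[] args) throws Exception {
		Configuration conf = HBaseConfiguration.create();
		Job job = Job.getInstance(conf, HBaseToHdfs.class.getSimpleName());
		job.setJarByClass(HBaseToHdfs.class);

		// Map-only job: mapper output goes directly to the output files
		job.setNumReduceTasks(0);

		Scan scan = new Scan();
		scan.setCaching(500);       // rows fetched per scanner RPC (the HBase reference guide's MR examples use 500)
		scan.setCacheBlocks(false); // a one-off full scan should not churn the RegionServer block cache

		// Input: the HBase table named in args[0]. This call also sets the mapper class and the
		// map output key/value classes, and ships the HBase dependency jars with the job.
		TableMapReduceUtil.initTableMapperJob(args[0], scan, HBaseToHdfsMapper.class, Text.class, Text.class, job);

		// Output: text files under args[1]; TextOutputFormat puts a tab between key and value
		job.setOutputFormatClass(TextOutputFormat.class);
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}

	public static class HBaseToHdfsMapper extends TableMapper<Text, Text> {
		private static final byte[] FAMILY = Bytes.toBytes("f1");
		private static final byte[][] QUALIFIERS = {
			Bytes.toBytes("name"), Bytes.toBytes("age"), Bytes.toBytes("gender"), Bytes.toBytes("birthday")
		};
		private final Text outKey = new Text();
		private final Text outValue = new Text();

		@Override
		protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
			// key is the HBase rowkey; respect its offset/length instead of copying the whole backing array
			outKey.set(key.get(), key.getOffset(), key.getLength());

			// Join the latest value of each column with tabs; a missing or empty cell becomes "NULL"
			StringBuilder sb = new StringBuilder();
			for (int i = 0; i < QUALIFIERS.length; i++) {
				byte[] cell = value.getValue(FAMILY, QUALIFIERS[i]); // null if the row has no such column
				if (i > 0) {
					sb.append('\t');
				}
				sb.append(cell == null || cell.length == 0 ? "NULL" : Bytes.toString(cell));
			}
			outValue.set(sb.toString());
			context.write(outKey, outValue);
		}
	}
}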
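The mapper's per-row formatting can be tried out without a cluster. The sketch below is a plain-Java approximation, not the job itself: it assumes a Map<String, byte[]> stands in for HBase's Result (qualifier to latest cell value, absent when the cell does not exist), and it prepends the rowkey plus a tab to mimic the line TextOutputFormat writes by default. The class name RowLineFormatter is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowLineFormatter {
    // Qualifiers of family f1, in the order they appear in each output line
    static final List<String> QUALIFIERS = Arrays.asList("name", "age", "gender", "birthday");

    // Builds "rowkey<TAB>name<TAB>age<TAB>gender<TAB>birthday".
    // A missing or empty cell is written as the literal string "NULL".
    static String formatRow(String rowKey, Map<String, byte[]> cells) {
        StringBuilder sb = new StringBuilder(rowKey);
        for (String q : QUALIFIERS) {
            byte[] v = cells.get(q);
            sb.append('\t');
            sb.append(v == null || v.length == 0 ? "NULL" : new String(v, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, byte[]> row1 = new HashMap<>();
        row1.put("name", "zhangsan".getBytes(StandardCharsets.UTF_8));
        row1.put("age", "10".getBytes(StandardCharsets.UTF_8));
        row1.put("gender", "male".getBytes(StandardCharsets.UTF_8));
        // prints: 1<TAB>zhangsan<TAB>10<TAB>male<TAB>NULL
        System.out.println(formatRow("1", row1));
    }
}
```

Keeping the fixed qualifier order in one list is what guarantees every line has the same number of fields, which makes the output easy to load into tools that expect a fixed-width TSV.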

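2.3 Package and run

Package the class as hbaseToDfs.jar and run it with the table name (t1) and the HDFS output directory (/t1) as arguments. The output directory must not already exist, or the job will refuse to start.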

hadoop jar hbaseToDfs.jar com.lanyun.hadoop2.HBaseToHdfs t1 /t1

2.4 Inspect the files on HDFS

# hadoop fs -cat /t1/part*
1    zhangsan    10    male    NULL
2    lisi    NULL    NULL    NULL
3    wangwu    NULL    NULL    NULL
4    zhaoliu    NULL    NULL    1993

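The export succeeded: each row became one tab-separated line of rowkey, name, age, gender, and birthday, with NULL for columns the row does not have.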

3. Exporting with HBase's built-in tool

The tool that ships with HBase is org.apache.hadoop.hbase.mapreduce.Export.
3.1 How is it used? Run it without arguments to print the help message:

# hbase org.apache.hadoop.hbase.mapreduce.Export
ERROR: Wrong number of arguments: 0
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]
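The optional arguments in this usage line are positional: starttime can only be given after versions, and endtime only after starttime. As we read the tool, versions is the number of cell versions to export per column and starttime/endtime bound the cell timestamps (milliseconds since the epoch by default); treat that reading as an assumption. The hypothetical helper below (ExportCommand is not part of HBase) assembles the command line while enforcing that ordering. It deliberately leaves out the trailing filter argument and the -D options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExportCommand {
    static final String EXPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Export";

    // Builds: hbase Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
    // Pass null for any optional argument that should be omitted.
    static List<String> build(String table, String outputDir, Integer versions, Long startTime, Long endTime) {
        if (startTime != null && versions == null) {
            throw new IllegalArgumentException("starttime is positional and requires versions");
        }
        if (endTime != null && startTime == null) {
            throw new IllegalArgumentException("endtime is positional and requires starttime");
        }
        List<String> cmd = new ArrayList<>(Arrays.asList("hbase", EXPORT_CLASS, table, outputDir));
        if (versions != null) cmd.add(String.valueOf(versions));
        if (startTime != null) cmd.add(String.valueOf(startTime));
        if (endTime != null) cmd.add(String.valueOf(endTime));
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = build("t1", "/t2", null, null, null);
        System.out.println(String.join(" ", cmd));
        // To actually launch the export (requires the hbase script on PATH):
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
        // System.exit(p.waitFor());
    }
}
```

Building the argument list as a List<String> rather than one shell string avoids quoting problems when the output path or a timestamp comes from user input.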

3.2 Export table t1 to the HDFS directory /t2

hbase org.apache.hadoop.hbase.mapreduce.Export t1 /t2

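This completes the export. Unlike the custom job in section 2, Export writes Hadoop SequenceFiles (ImmutableBytesWritable keys, Result values) rather than plain text, so the files under /t2 are not meant to be read with hadoop fs -cat; they are intended to be loaded back into HBase with the companion org.apache.hadoop.hbase.mapreduce.Import tool.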

Time: 2024-10-12 02:30:07
