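I. Introduction to Sqoop
Sqoop is a tool for transferring data between Hadoop (including Hive and HBase) and relational databases. It can import data from a relational database (for example MySQL, Oracle, or PostgreSQL) into HDFS, and it can export data from HDFS back into a relational database.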
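Sqoop is an Apache top-level project. When this article was first written the current releases were 1.4.4 (Sqoop 1) and 1.99.3 (Sqoop2); the article covers basic installation, configuration, and simple usage of the Sqoop 1.4.x line.
The package used here is:
sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz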
1. Environment variables
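A minimal sketch of the environment variables, assuming the tarball was unpacked to /usr/local/sqoop (a hypothetical path; substitute your own location). Lines like these would typically go in ~/.bashrc or /etc/profile:

```shell
# Assumed install location; change it to wherever the tarball was extracted.
export SQOOP_HOME=/usr/local/sqoop
# Put the sqoop launcher scripts on the PATH.
export PATH="$PATH:$SQOOP_HOME/bin"
```

After reloading the shell profile, running `sqoop version` is a quick way to check that the launcher is found.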
2. Sqoop configuration parameters
# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available
#export HADOOP_COMMON_HOME=

#Set path to where hadoop-*-core.jar is available
#export HADOOP_MAPRED_HOME=

#set the path to where bin/hbase is available
#export HBASE_HOME=

#Set the path to where bin/hive is available
#export HIVE_HOME=
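These settings ship commented out in conf/sqoop-env-template.sh; copying that file to conf/sqoop-env.sh and filling in the paths gives something like the sketch below. Every path here is a placeholder chosen for illustration, not a value from the original setup:

```shell
# conf/sqoop-env.sh (all paths below are assumptions; use your own install locations)
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
# Only needed when HBase or Hive is installed and used together with Sqoop.
export HBASE_HOME=/usr/local/hbase
export HIVE_HOME=/usr/local/hive
```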
3. JDBC driver jar
The tests below use MySQL as the example, so copy the MySQL JDBC driver jar into the <SQOOP_HOME>/lib directory.
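The copy step can be sketched as a small shell function. `install_driver` is a hypothetical helper written for this article, not something Sqoop provides, and the jar file name depends on the Connector/J version you downloaded:

```shell
# install_driver JAR [DEST_DIR]
# Copy a JDBC driver jar into Sqoop's lib directory (default: $SQOOP_HOME/lib).
# Hypothetical helper, not part of Sqoop.
install_driver() {
  jar=$1
  dest=${2:-$SQOOP_HOME/lib}
  # Fail early with a clear message if the jar path is wrong.
  if [ ! -f "$jar" ]; then
    echo "driver jar not found: $jar" >&2
    return 1
  fi
  cp "$jar" "$dest"/
}

# Example call (replace the path with your actual driver jar):
# install_driver /path/to/mysql-connector-java-<version>-bin.jar
```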
4. Test data in MySQL
CREATE TABLE `demo_blog` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `blog` varchar(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE `demo_log` (
  `operator` varchar(16) NOT NULL,
  `log` varchar(100) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Insert the test data:

insert into demo_blog (id, blog) values (1, 'micmiu.com');
insert into demo_blog (id, blog) values (2, 'ctosun.com');
insert into demo_blog (id, blog) values (3, 'baby.micmiu.com');

insert into demo_log (operator, log) values ('micmiu', 'create');
insert into demo_log (operator, log) values ('micmiu', 'update');
insert into demo_log (operator, log) values ('michael', 'edit');
insert into demo_log (operator, log) values ('michael', 'delete');
II. Sqoop command operations
1. Basic Sqoop commands
(1) List the databases in MySQL
sqoop list-databases --connect jdbc:mysql://Master-Hadoop:3306 --username root --password rootroot
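Passing the password with --password leaves it in the shell history and the process list. A hedged alternative is Sqoop's -P option, which prompts for the password on the console. The wrapper below is only a sketch; `list_databases` is a name made up for this example:

```shell
# list_databases HOST [USER]
# List the MySQL databases on HOST, prompting for the password (-P)
# instead of putting it on the command line. Hypothetical wrapper.
list_databases() {
  host=$1
  user=${2:-root}
  sqoop list-databases \
    --connect "jdbc:mysql://$host:3306" \
    --username "$user" \
    -P
}

# Example call:
# list_databases Master-Hadoop
```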
(2) List all tables in the test database
sqoop list-tables --connect jdbc:mysql://Master-Hadoop:3306/test --username root --password rootroot
(3) Import from MySQL into HDFS files
sqoop import --connect jdbc:mysql://Master-Hadoop:3306/test --username root --password rootroot --table demo_log --split-by operator --target-dir /usr/sqoop/other
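The demo_log table has no primary key, which is why the command above names a column with --split-by. For a table this small, another option is to run a single map task with -m 1, so no split column is needed. The sketch below shows that variant plus a way to inspect the result; both function names and the /usr/sqoop/other_m1 directory in the example are assumptions made for illustration:

```shell
# import_single_mapper TABLE TARGET_DIR
# Import one table from the test database using a single map task (-m 1),
# which avoids having to pick a --split-by column. Hypothetical wrapper.
import_single_mapper() {
  sqoop import \
    --connect jdbc:mysql://Master-Hadoop:3306/test \
    --username root -P \
    --table "$1" \
    --target-dir "$2" \
    -m 1
}

# show_import DIR
# List the files written by the import and print their contents.
# The glob is quoted so that hadoop, not the local shell, expands it.
show_import() {
  hadoop fs -ls "$1"
  hadoop fs -cat "$1/part-m-*"
}

# Example calls (the target directory must not exist before the import):
# import_single_mapper demo_log /usr/sqoop/other_m1
# show_import /usr/sqoop/other_m1
```

The same `show_import` check also works on /usr/sqoop/other after running the original command.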