I. Environment Setup
1. Hadoop
http://my.oschina.net/u/204498/blog/519789
2. Sqoop 2.x
http://my.oschina.net/u/204498/blog/518941
3. MySQL
II. Importing from MySQL into HDFS
1. Create the MySQL database, table, and test data
xxxxxxxx$ mysql -uroot -p
Enter password:
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)

test => the newly created database

mysql> use test;
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| test           |
+----------------+
1 row in set (0.00 sec)

test => the newly created table

mysql> desc test;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(11)     | NO   | PRI | NULL    | auto_increment |
| name  | varchar(45) | YES  |     | NULL    |                |
| age   | int(11)     | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> select * from test;
+----+------+------+
| id | name | age  |
+----+------+------+
|  7 | a    |    1 |
|  8 | b    |    2 |
|  9 | c    |    3 |
+----+------+------+
3 rows in set (0.00 sec)
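The transcript above only shows the finished state. To reproduce it, a minimal set of statements matching the desc test output would look like this; the AUTO_INCREMENT start value of 7 is an assumption so the ids line up with the sample rows:

mysql> create database if not exists test;
mysql> use test;
mysql> create table test (
    ->   id   int(11)     not null auto_increment,
    ->   name varchar(45) default null,
    ->   age  int(11)     default null,
    ->   primary key (id)
    -> ) auto_increment = 7;   -- assumption: start at 7 to match the sample ids
mysql> insert into test (name, age) values ('a', 1), ('b', 2), ('c', 3);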
2. Grant privileges to the relevant users
Note: after Sqoop submits a job, every node accesses the database during the map phase, so the privileges must be granted in advance.
mysql> grant [all | select | ...] on {db}.{table} to {user}@{host} identified by {passwd};
mysql> flush privileges;

# Grant to a specific hostname: username root, password root,
# privilege ALL on any table in the db test
mysql> grant all on test.* to 'root'@{host} identified by 'root';
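Since the map tasks may be scheduled on any node in the cluster, one grant per hostname gets tedious; a common (if permissive) sketch uses MySQL's wildcard host instead. The '%' host and the root/root credentials here are illustrative assumptions:

mysql> grant all on test.* to 'root'@'%' identified by 'root';   -- '%' matches any host; restrict it in production
mysql> flush privileges;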
3. Start the sqoop2-server
[hadoop@xxxxxxxx sqoop-1.99.6-bin-hadoop200]$ pwd
/home/hadoop/sqoop-1.99.6-bin-hadoop200
[hadoop@xxxxxxxx sqoop-1.99.6-bin-hadoop200]$ ./bin/sqoop2-server start
...
...

You can verify it through the web UI, and also check the logs.
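A quick sanity check from the command line (port 12000 is the Sqoop2 default set in conf/sqoop.properties; the log path below is an assumption, check your configuration for the actual location):

xxxxxxxx$ curl http://localhost:12000/sqoop/version   # REST endpoint answers if the server is up
xxxxxxxx$ tail -f /home/hadoop/sqoop-1.99.6-bin-hadoop200/sqoop.log   # assumed log path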
4. Start the sqoop2-shell
[hadoop@xxxxxxxx sqoop-1.99.6-bin-hadoop200]$ pwd
/home/hadoop/sqoop-1.99.6-bin-hadoop200
[hadoop@xxxxxxxx sqoop-1.99.6-bin-hadoop200]$ ./bin/sqoop2-shell
...
...
sqoop:000> show version
...
...
sqoop:000> show connector
+----+------------------------+---------+------------------------------------------------------+----------------------+
| Id | Name                   | Version | Class                                                | Supported Directions |
+----+------------------------+---------+------------------------------------------------------+----------------------+
| 1  | generic-jdbc-connector | 1.99.6  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |
| 2  | kite-connector         | 1.99.6  | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |
| 3  | hdfs-connector         | 1.99.6  | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |
| 4  | kafka-connector        | 1.99.6  | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |
+----+------------------------+---------+------------------------------------------------------+----------------------+

Create a link for each connector you need; a sketch of the interactive prompts follows this transcript.

sqoop:000> create link -c 1   => create the JDBC link first
(you will be asked for name, jdbc-driver, url, username, password, and so on)
sqoop:000> create link -c 3   => then create the HDFS link
(you will be asked for name, HDFS URL, and so on)

sqoop:000> show link
+----+-------------+--------------+------------------------+---------+
| Id | Name        | Connector Id | Connector Name         | Enabled |
+----+-------------+--------------+------------------------+---------+
| 3  | 10-21_jdbc1 | 1            | generic-jdbc-connector | true    |
| 4  | 10-21_hdfs1 | 3            | hdfs-connector         | true    |
+----+-------------+--------------+------------------------+---------+

Create the job: -f => from, -t => to, i.e. where the data comes from and where it goes.

sqoop:000> create job -f 3 -t 4
(you will be asked for the source table information and the HDFS output details)

sqoop:000> show job
+----+---------------+----------------+--------------+---------+
| Id | Name          | From Connector | To Connector | Enabled |
+----+---------------+----------------+--------------+---------+
| 1  | 10-20_sqoopy2 | 1              | 3            | true    |
+----+---------------+----------------+--------------+---------+

# Start the job
sqoop:000> start job -j 2
...
...
...
You can follow the progress in the web UI, or query it from the shell:
sqoop:000> status job -j 2
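For orientation, the interactive dialog behind create link and create job looks roughly like this in 1.99.6. The prompt wording is approximate and abridged, and the driver class, connection string, URI, and paths are placeholder values you must adapt:

sqoop:000> create link -c 1
Creating link for connector with id 1
Please fill following values to create new link object
Name: 10-21_jdbc1
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://{mysql-host}:3306/test
Username: root
Password: ****

sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: 10-21_hdfs1
HDFS URI: hdfs://{namenode-host}:8020

sqoop:000> create job -f 3 -t 4
Creating job for links with from id 3 and to id 4
Please fill following values to create new job object
Name: mysql-to-hdfs
Schema name: test
Table name: test
Output directory: /user/hadoop/test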
See the official Sqoop 1.99.6 guide for the full shell reference.
5. Troubleshooting
Read the logs closely and narrow the problem down step by step; a couple of starting points are sketched below.
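The verbose option is a standard sqoop2-shell setting; the application id for yarn logs is whatever the web UI shows for your job, so the value below is a placeholder:

sqoop:000> set option --name verbose --value true   # print full stack traces in the shell
sqoop:000> status job -j 2                          # see where the job stopped

# The map tasks run inside YARN, so their logs live there too
xxxxxxxx$ yarn logs -applicationId application_xxxxxxxxxx_xxxx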