The database used in this article is the MySQL sample database employees.
Download URL: https://launchpad.net/test-db/employees-db-1/1.0.6
Install it into your own MySQL instance by following the README.
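For reference, the install boils down to extracting the archive and loading employees.sql. A minimal sketch, assuming the full 1.0.6 archive from the page above (the exact file name may differ):

tar -xjf employees_db-full-1.0.6.tar.bz2
cd employees_db
mysql -u root -p -t < employees.sql    # -t prints the verification output in table form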
Installing Sqoop:
Download URL: http://apache.dataguru.cn/sqoop/1.4.6/
sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
sqoop-1.4.6.tar.gz
My Hadoop version is 2.6, so these are the two packages I downloaded.
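For example, the binary package can be fetched and unpacked like this (the install path is illustrative):

wget http://apache.dataguru.cn/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
tar -xzf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
export SQOOP_HOME=$PWD/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
export PATH=$PATH:$SQOOP_HOME/bin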
sqoop-1.4.6.tar.gz (the source package) cannot be used directly; it has to be compiled first. After compiling, copy the generated sqoop-1.4.6.jar into $SQOOP_HOME/lib.
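The source build uses Ant; a minimal sketch, assuming Ant is installed (the hadoopversion value and the jar's output location come from Sqoop's build.xml and may differ per setup):

cd sqoop-1.4.6
ant jar -Dhadoopversion=200            # 200 selects the Hadoop 2.x build profile
cp build/sqoop-1.4.6.jar $SQOOP_HOME/lib/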
Also copy a JDBC connector into $SQOOP_HOME/lib:
mysql-connector-java-5.1.32-bin.jar, or a newer MySQL JDBC connector;
otherwise you will run into driver-related bugs.
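For example, assuming the connector jar was downloaded to the current directory:

cp mysql-connector-java-5.1.32-bin.jar $SQOOP_HOME/lib/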
First verify connectivity by listing the tables in the employees database:

sqoop list-tables --connect jdbc:mysql://namenode01:3306/employees --username hive --password hive

Import a single table into HDFS:

mysqlurl=jdbc:mysql://namenode01:3306/employees
sqoop import --connect $mysqlurl --username hive --password hive --table departments --target-dir /etl/input/departments

Inspect the imported files:

hdfs dfs -cat /etl/input/departments/*

To see what happens when a field value contains the default field delimiter (a comma), insert a test row on the MySQL side and re-run the import:

insert into departments values('d9999','Evan,Test');

Overriding the type mapping: if Sqoop's default column-to-Java-type mapping is not what you want, specify column=type pairs, e.g.

--map-column-java c1=Float,c2=String,c3=String ...

Sqoop by default uses four concurrent map tasks to transfer data to Hadoop.

A free-form query can be imported with --query. In that case the statement must contain the literal WHERE $CONDITIONS placeholder (Sqoop substitutes each map task's range into it), and --split-by must name the column used to partition the work:

mysqlurl=jdbc:mysql://namenode01:3306/employees
sqoop import --connect $mysqlurl --username hive --password hive --query 'select d.dept_no,d.dept_name,de.from_date, de.to_date, e.* from employees e join dept_emp de on e.emp_no=de.emp_no join departments d on de.dept_no=d.dept_no WHERE $CONDITIONS' --split-by d.dept_no --target-dir /etl/input/employees

Compare the result count from MySQL against the HDFS files:

hdfs dfs -cat /etl/input/employees/* | wc -l

The same query can also be imported directly into Hive. If you want to override a column's Hive data type, specify column=type pairs such as --map-column-hive id=STRING,price=DECIMAL:

mysqlurl=jdbc:mysql://namenode01:3306/employees
sqoop import --connect $mysqlurl --username hive --password hive --query 'select d.dept_no,d.dept_name,de.from_date, de.to_date, e.* from employees e join dept_emp de on e.emp_no=de.emp_no join departments d on de.dept_no=d.dept_no WHERE $CONDITIONS' --split-by d.dept_no --hive-import --hive-table test.employees --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N' --target-dir /tmp/employees
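To sanity-check the imports, the row counts can be compared on each side; a sketch assuming the mysql and hive command-line clients are available on this machine (credentials as above):

# count on the MySQL side, using the same join as the --query import
mysql -h namenode01 -u hive -phive employees -e "select count(*) from employees e join dept_emp de on e.emp_no=de.emp_no join departments d on de.dept_no=d.dept_no"
# count the rows Sqoop wrote into the Hive table
hive -e "select count(*) from test.employees"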