Using Sqoop 1.4.4 to Import MySQL Table Data into HDFS

Questions this article addresses:

1. What does the --connect parameter do?

2. Which parameter reads the database password from the console?

3. What are the basic parameters and the command for importing a relational database table into HDFS with Sqoop?

4. What is the default HDFS path that imported data is written to?

5. What does the --columns parameter do?

6. What does the --where parameter do?

I. Key Parameters

Parameter                               Description
--connect <jdbc-uri>                    Specify the JDBC connection string for the relational database
--connection-manager <class-name>       Specify the connection manager class to use
--driver <class-name>                   Manually specify the JDBC driver class to use
--hadoop-mapred-home <dir>              Override $HADOOP_MAPRED_HOME
--help                                  Print usage help
--password-file                         Set the path to a file containing the authentication password
-P                                      Read the database password from the console
--password <password>                   Set the database authentication password
--username <username>                   Set the database username
--verbose                               Print more information while working
--connection-param-file <filename>      Optional properties file that supplies connection parameters
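
Since --password puts the password on the command line (Sqoop itself warns about this in the log below), -P and --password-file are the safer choices. A minimal sketch of both, reusing the hive/hive account and the spice database from the rest of this article; the password file name and HDFS path are made up for illustration:

# Prompt for the password on the console instead of passing it as an argument
sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive -P --table users

# Or read the password from a file; the file should contain only the password,
# with no trailing newline (hence echo -n)
echo -n "hive" > mysql.password
hadoop fs -put mysql.password /user/hadoopUser/mysql.password
sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive \
    --password-file /user/hadoopUser/mysql.password --table users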

II. The MySQL Database to Export From

[hadoopUser@secondmgt ~]$ mysql -uhive -phive spice
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 419
Server version: 5.1.73 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select * from users;
+----+----------+----------+-----+---------+------------+-------+------+
| id | username | password | sex | content | datetime   | vm_id | isad |
+----+----------+----------+-----+---------+------------+-------+------+
| 56 | hua      | hanyun   | 男  | 开通    | 2013-12-02 |     0 |    1 |
| 58 | feng     | 123456   | 男  | 开通    | 2013-11-22 |     0 |    0 |
| 59 | test     | 123456   | 男  | 开通    | 2014-03-05 |    58 |    0 |
| 60 | user1    | 123456   | 男  | 开通    | 2014-06-26 |    66 |    0 |
| 61 | user2    | 123      | 男  | 开通    | 2013-12-13 |    56 |    0 |
| 62 | user3    | 123456   | 男  | 开通    | 2013-12-14 |     0 |    0 |
| 64 | kai.zhou | 123456   | ?   | ??      | 2014-03-05 |    65 |    0 |
+----+----------+----------+-----+---------+------------+-------+------+
7 rows in set (0.00 sec)

III. Importing the users Table Above into HDFS

To run the import, you must at minimum specify the database connection string, the username, the password, and the table to import. By default the data is written to /user/hadoopUser/<table name>/ in HDFS; you can also use the --target-dir parameter to choose a different import directory. For example:

[hadoopUser@secondmgt ~]$ sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive --table users  --target-dir /output/sqoop/
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
15/01/17 20:28:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/01/17 20:28:16 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/01/17 20:28:16 INFO tool.CodeGenTool: Beginning code generation
15/01/17 20:28:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `users` AS t LIMIT 1
15/01/17 20:28:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `users` AS t LIMIT 1
15/01/17 20:28:16 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0
Note: /tmp/sqoop-hadoopUser/compile/c010e7410ec7339ef9b4d9dc2ddaac80/users.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/01/17 20:28:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoopUser/compile/c010e7410ec7339ef9b4d9dc2ddaac80/users.jar
15/01/17 20:28:18 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/01/17 20:28:18 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/01/17 20:28:18 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/01/17 20:28:18 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/01/17 20:28:18 INFO mapreduce.ImportJobBase: Beginning import of users
15/01/17 20:28:18 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoopUser/cloud/hbase/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/01/17 20:28:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/01/17 20:28:19 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/01/17 20:28:19 INFO client.RMProxy: Connecting to ResourceManager at secondmgt/192.168.2.133:8032
15/01/17 20:28:20 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `users`
15/01/17 20:28:20 INFO mapreduce.JobSubmitter: number of splits:4
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files
15/01/17 20:28:20 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
15/01/17 20:28:20 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
15/01/17 20:28:20 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
15/01/17 20:28:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1421373857783_0002
15/01/17 20:28:21 INFO impl.YarnClientImpl: Submitted application application_1421373857783_0002 to ResourceManager at secondmgt/192.168.2.133:8032
15/01/17 20:28:21 INFO mapreduce.Job: The url to track the job: http://secondmgt:8088/proxy/application_1421373857783_0002/
15/01/17 20:28:21 INFO mapreduce.Job: Running job: job_1421373857783_0002
15/01/17 20:28:34 INFO mapreduce.Job: Job job_1421373857783_0002 running in uber mode : false
15/01/17 20:28:34 INFO mapreduce.Job:  map 0% reduce 0%
15/01/17 20:28:44 INFO mapreduce.Job:  map 25% reduce 0%
15/01/17 20:28:49 INFO mapreduce.Job:  map 75% reduce 0%
15/01/17 20:28:54 INFO mapreduce.Job:  map 100% reduce 0%
15/01/17 20:28:54 INFO mapreduce.Job: Job job_1421373857783_0002 completed successfully
15/01/17 20:28:54 INFO mapreduce.Job: Counters: 27
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=368040
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=401
                HDFS: Number of bytes written=288
                HDFS: Number of read operations=16
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
        Job Counters
                Launched map tasks=4
                Other local map tasks=4
                Total time spent by all maps in occupied slots (ms)=174096
                Total time spent by all reduces in occupied slots (ms)=0
        Map-Reduce Framework
                Map input records=7
                Map output records=7
                Input split bytes=401
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=205
                CPU time spent (ms)=10510
                Physical memory (bytes) snapshot=599060480
                Virtual memory (bytes) snapshot=3535720448
                Total committed heap usage (bytes)=335544320
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=288
15/01/17 20:28:54 INFO mapreduce.ImportJobBase: Transferred 288 bytes in 35.2792 seconds (8.1635 bytes/sec)
15/01/17 20:28:54 INFO mapreduce.ImportJobBase: Retrieved 7 records.

IV. Viewing the Imported Data in HDFS

[hadoopUser@secondmgt ~]$ hadoop fs -cat /output/sqoop/*
56,hua,hanyun,男,开通,2013-12-02,0,1
58,feng,123456,男,开通,2013-11-22,0,0
59,test,123456,男,开通,2014-03-05,58,0
60,user1,123456,男,开通,2014-06-26,66,0
61,user2,123,男,开通,2013-12-13,56,0
62,user3,123456,男,开通,2013-12-14,0,0
64,kai.zhou,123456,?,??,2014-03-05,65,0

The output matches the records in the source database, so the import succeeded.
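
Because the job ran with 4 map tasks (number of splits:4 in the log above), the target directory actually holds four part-m-* files, and hadoop fs -cat /output/sqoop/* simply concatenates them. A hedged sketch of checking this and of controlling the parallelism with Sqoop's -m/--num-mappers and --split-by options; these exact commands are not taken from the original run:

# List the output files written by the 4 map tasks (expect part-m-00000 through part-m-00003)
hadoop fs -ls /output/sqoop/

# Re-import with a single map task so only one output file is produced;
# --split-by names the column used to partition the work (id is the primary key here)
sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive -P \
    --table users --split-by id -m 1 --target-dir /output/sqoop_single/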

V. Importing a Subset of the Data

1. Selecting columns with --columns

By default Sqoop imports every column of each record in the table. Sometimes only some of the columns are needed; in that case, use the --columns parameter to list the columns to import, separated by commas. The following imports the username, sex, and datetime columns of the users table into HDFS:

[hadoopUser@secondmgt ~]$ sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive \
 > --table users --columns "username,sex,datetime" --target-dir /output/sqoop/

View the result:

[hadoopUser@secondmgt ~]$ hadoop fs -cat /output/sqoop/*
hua,男,2013-12-02
feng,男,2013-11-22
test,男,2014-03-05
user1,男,2014-06-26
user2,男,2013-12-13
user3,男,2013-12-14
kai.zhou,?,2014-03-05
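
Note that this --columns run (and the --where run below) reuses /output/sqoop/ as the target directory. An import fails if the target directory already exists, so the directory has to be removed between runs; this cleanup step is not shown in the original output and is assumed here (recent Sqoop releases also offer a --delete-target-dir flag that does it automatically):

hadoop fs -rm -r /output/sqoop/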

2. Filtering rows with --where

Another parameter, --where, filters the rows so that only records matching a condition are imported instead of the whole table. The following imports the rows of the users table whose id is greater than 60 into HDFS:

[hadoopUser@secondmgt conf]$ sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive \
 > --table users  --where " id > 60"  --target-dir /output/sqoop/

View the result:

[hadoopUser@secondmgt conf]$ hadoop fs -cat /output/sqoop/*
61,user2,123,男,开通,2013-12-13,56,0
62,user3,123456,男,开通,2013-12-14,0,0
64,kai.zhou,123456,?,??,2014-03-05,65,0
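
--where can also be combined with --columns in a single import. A minimal sketch; this particular combination is not run in the original article:

sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive -P \
    --table users --columns "username,sex,datetime" --where "id > 60" \
    --target-dir /output/sqoop/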

Recommended reading:

Previous: Installing Sqoop 1.4.4 on a Hadoop 2.2.0 Cluster

Next: Using Sqoop to Import MySQL Table Data into HDFS with a SQL Statement
