[Hadoop Reading Notes] Chapter 15: A Small Sqoop 1.4.6 Experiment - Moving Data Between MySQL and HDFS

P573: Importing data from MySQL into HDFS

Step 1: Create the data to import in MySQL

1. Create the database and allow access to it


mysql -h 192.168.200.250 -u root -p

CREATE DATABASE sqoop;

GRANT ALL PRIVILEGES ON *.* TO 'root'@'%';
or: GRANT SELECT, INSERT, DELETE, UPDATE ON *.* TO 'root'@'%';
FLUSH PRIVILEGES;
Check the privileges: SELECT user,host,select_priv,insert_priv,update_priv,delete_priv FROM mysql.user;

2. Create the widgets table

CREATE TABLE widgets(id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
widget_name VARCHAR(64) NOT NULL,
price DECIMAL(10,2),
design_date DATE,
version INT,
design_comment VARCHAR(100));

3. Insert the test data

INSERT INTO widgets VALUES(NULL,'sprocket',0.25,'2010-01-10',1,'connect two gizmos');
INSERT INTO widgets VALUES(NULL,'gizmo',4.00,'2009-01-30',4,NULL);
INSERT INTO widgets VALUES(NULL,'gadget',99.99,'1983-08-13',13,'our flagship product');

Step 2: Run the Sqoop import command

sqoop import --connect jdbc:mysql://192.168.200.250/sqoop --table widgets -m 1

The first attempt fails because the MySQL JDBC connector is missing.

First copy the MySQL connector jar into Sqoop's lib directory.
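A sketch of that step, assuming the connector jar has already been downloaded and that Sqoop lives under /wdcloud/app/sqoop-1.4.6 (the path that appears in the log below); the jar name and version here are only an example:

# example jar name; use the mysql-connector-java version that matches your MySQL server
cp mysql-connector-java-5.1.40-bin.jar /wdcloud/app/sqoop-1.4.6/lib/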

Then run the import again.

It still cannot connect to the remote MySQL database no matter what; the following grants are needed:



GRANT ALL ON *.* TO ''@'192.168.200.123';
GRANT ALL PRIVILEGES ON *.* TO ''@'192.168.200.123' IDENTIFIED BY '<password>';
FLUSH PRIVILEGES;
SELECT user,host,select_priv,insert_priv,update_priv,delete_priv FROM mysql.user;

Run it again.

If it still fails, the only option left is to specify the user name and password explicitly in the Sqoop command with --username and --password:

sqoop import --connect jdbc:mysql://192.168.200.250/sqoop --table widgets -m 1 -username root -password <mysql-password>

The job shows up as RUNNING in the YARN console at http://hadoop-allinone-200-123.wdcloud.locl:8088/cluster

But the job ultimately fails.

Cause of the failure: physical memory usage (156.8 MB) is far below the allocated 1 GB, but virtual memory usage (2.7 GB) exceeds the default limit of 2.1 GB (1 GB times the default vmem-pmem ratio of 2.1). The fix:

In etc/hadoop/yarn-site.xml, set the property that enables the virtual-memory check to false, as follows:

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>  

The next run still fails with another error:

Fix: the directory in the error message lacks the required permissions.

http://www.oschina.net/question/2288283_2134188?sort=time

Make sure the cluster is started as the hadoop user (the cluster is owned by the hadoop user), and give that directory 755 permissions.
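The post does not show the exact directory named in the error, so the commands below use a placeholder; substitute the path reported by the NodeManager:

# <failing-dir> stands for the directory named in the permission error
chown -R hadoop:hadoop <failing-dir>
chmod -R 755 <failing-dir>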

Run it once more (this gal refuses to give up)... and it finally succeeds.

Console log:

[hadoop@hadoop-allinone-200-123 sqoop-1.4.6]$ sqoop import --connect jdbc:mysql://192.168.200.250/sqoop --table widgets -m 1 -username root -password weidong
Warning: /wdcloud/app/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /wdcloud/app/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /wdcloud/app/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /wdcloud/app/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/01/23 23:59:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/01/23 23:59:17 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/01/23 23:59:18 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/01/23 23:59:18 INFO tool.CodeGenTool: Beginning code generation
17/01/23 23:59:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `widgets` AS t LIMIT 1
17/01/23 23:59:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `widgets` AS t LIMIT 1
17/01/23 23:59:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /wdcloud/app/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/591fd797fbbe57ce38b4492a1c9a0300/widgets.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/01/23 23:59:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/591fd797fbbe57ce38b4492a1c9a0300/widgets.jar
17/01/23 23:59:21 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/01/23 23:59:21 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/01/23 23:59:21 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/01/23 23:59:21 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/01/23 23:59:21 INFO mapreduce.ImportJobBase: Beginning import of widgets
17/01/23 23:59:21 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
17/01/23 23:59:22 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/01/23 23:59:23 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/01/23 23:59:24 INFO client.RMProxy: Connecting to ResourceManager at hadoop-allinone-200-123.wdcloud.locl/192.168.200.123:8032
17/01/23 23:59:30 INFO db.DBInputFormat: Using read commited transaction isolation
17/01/23 23:59:30 INFO mapreduce.JobSubmitter: number of splits:1
17/01/23 23:59:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1485230213604_0001
17/01/23 23:59:32 INFO impl.YarnClientImpl: Submitted application application_1485230213604_0001
17/01/23 23:59:32 INFO mapreduce.Job: The url to track the job: http://hadoop-allinone-200-123.wdcloud.locl:8088/proxy/application_1485230213604_0001/
17/01/23 23:59:32 INFO mapreduce.Job: Running job: job_1485230213604_0001
17/01/23 23:59:50 INFO mapreduce.Job: Job job_1485230213604_0001 running in uber mode : false
17/01/23 23:59:50 INFO mapreduce.Job:  map 0% reduce 0%
17/01/24 00:00:00 INFO mapreduce.Job:  map 100% reduce 0%
17/01/24 00:00:01 INFO mapreduce.Job: Job job_1485230213604_0001 completed successfully
17/01/24 00:00:02 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=138186
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=129
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=7933
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=7933
        Total vcore-milliseconds taken by all map tasks=7933
        Total megabyte-milliseconds taken by all map tasks=8123392
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=59
        CPU time spent (ms)=2210
        Physical memory (bytes) snapshot=190287872
        Virtual memory (bytes) snapshot=2924978176
        Total committed heap usage (bytes)=220725248
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=129
17/01/24 00:00:02 INFO mapreduce.ImportJobBase: Transferred 129 bytes in 38.2028 seconds (3.3767 bytes/sec)
17/01/24 00:00:02 INFO mapreduce.ImportJobBase: Retrieved 3 records.

Checking the job history server for the details of the MR job turns up nothing, because the job history server has not been started.

Start it:
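On Hadoop 2.7.3 the history server is started with the stock daemon script (run as the hadoop user):

$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver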

Check again and the job history is now visible:

http://hadoop-allinone-200-123.wdcloud.locl:19888/jobhistory/job/job_1485230213604_0001

As you can see, a Sqoop import into HDFS runs only map tasks and no reduce tasks; the number of map tasks is 1, with 1 completed and 1 successful. Click the Map link for the details.

Now, let's see whether the table data really was imported.

Step 3: Verify the import result

You can see that the widgets table's data has been imported into HDFS.
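One way to eyeball the result, assuming the default target directory widgets under the running user's HDFS home (the same directory the export step below reads from) and the single map output file produced by -m 1:

hadoop fs -ls widgets
hadoop fs -cat widgets/part-m-00000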

Besides loading the data into HDFS, the import also generates the import source code: .java, .jar, and .class files.

If you only want to generate the code without importing any data, run:

sqoop codegen --connect <jdbc-uri> --table <table> --class-name <generated-class-name>
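For example, against the same database as above (only a sketch; the class name is arbitrary):

sqoop codegen --connect jdbc:mysql://192.168.200.250/sqoop --table widgets --class-name Widget --username root -P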

Step 4: Append data

--direct: reads from the table more quickly, but requires database support; for MySQL it uses the external mysqldump tool
--append: imports the data in append mode

Now we insert one new row into MySQL.
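The post does not show the row that was inserted, so here is a hypothetical one, just to have something new to append:

INSERT INTO widgets VALUES(NULL,'doohickey',19.99,'2017-01-24',1,'added for the append test');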

Then run the append command:

sqoop import --connect jdbc:mysql://192.168.200.250/sqoop --table widgets -m 1 -username root -password weidong --direct --append

It runs successfully.

Check the data in HDFS.

The new row has been appended successfully.

Step 5: Export the data in HDFS back to MySQL

Copy the widgets table to widgets_copy and empty the widgets_copy table.
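A sketch of that preparation in MySQL (one of several equivalent ways):

CREATE TABLE widgets_copy LIKE widgets;   -- empty copy with the same schema
TRUNCATE TABLE widgets_copy;              -- only needed if data was copied as well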

Run the export command.

Writing the password on the command line is a security risk. Instead of --password you can use the -P option,

so that the password is entered interactively when the job runs:

Setting your password on the command-line is insecure. Consider using -P instead.

So the command is:

 sqoop export --connect jdbc:mysql://192.168.200.250/sqoop -m 1 --table widgets_copy --export-dir widgets/part-m-00002  --username root -P

Enter password: (the typed characters are not echoed)

Log from the successful run:

[hadoop@hadoop-allinone-200-123 /]$ sqoop export --connect jdbc:mysql://192.168.200.250/sqoop -m 1 --table widgets_copy --export-dir widgets/part-m-00002  --username root -P
17/01/24 01:04:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
17/01/24 01:04:22 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/01/24 01:04:22 INFO tool.CodeGenTool: Beginning code generation
17/01/24 01:04:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `widgets_copy` AS t LIMIT 1
17/01/24 01:04:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `widgets_copy` AS t LIMIT 1
17/01/24 01:04:23 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /wdcloud/app/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/c66df558e872801e493fbc78458e6914/widgets_copy.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/01/24 01:04:26 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/c66df558e872801e493fbc78458e6914/widgets_copy.jar
17/01/24 01:04:26 INFO mapreduce.ExportJobBase: Beginning export of widgets_copy
17/01/24 01:04:26 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
17/01/24 01:04:26 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/01/24 01:04:28 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
17/01/24 01:04:28 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/01/24 01:04:28 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/01/24 01:04:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop-allinone-200-123.wdcloud.locl/192.168.200.123:8032
17/01/24 01:04:30 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1281)
    at java.lang.Thread.join(Thread.java:1355)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
17/01/24 01:04:32 INFO input.FileInputFormat: Total input paths to process : 1 (only one input path to export)
17/01/24 01:04:32 INFO input.FileInputFormat: Total input paths to process : 1
17/01/24 01:04:32 INFO mapreduce.JobSubmitter: number of splits:1
17/01/24 01:04:32 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/01/24 01:04:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1485230213604_0005
17/01/24 01:04:34 INFO impl.YarnClientImpl: Submitted application application_1485230213604_0005
17/01/24 01:04:34 INFO mapreduce.Job: The url to track the job: http://hadoop-allinone-200-123.wdcloud.locl:8088/proxy/application_1485230213604_0005/
17/01/24 01:04:34 INFO mapreduce.Job: Running job: job_1485230213604_0005
17/01/24 01:04:46 INFO mapreduce.Job: Job job_1485230213604_0005 running in uber mode : false
17/01/24 01:04:46 INFO mapreduce.Job:  map 0% reduce 0%
17/01/24 01:04:57 INFO mapreduce.Job:  map 100% reduce 0%
17/01/24 01:04:58 INFO mapreduce.Job: Job job_1485230213604_0005 completed successfully
17/01/24 01:04:59 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=137897
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=334
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=7444
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=7444
        Total vcore-milliseconds taken by all map tasks=7444
        Total megabyte-milliseconds taken by all map tasks=7622656
    Map-Reduce Framework
        Map input records=4
        Map output records=4
        Input split bytes=162
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=149
        CPU time spent (ms)=2890
        Physical memory (bytes) snapshot=184639488
        Virtual memory (bytes) snapshot=2923687936
        Total committed heap usage (bytes)=155713536
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
17/01/24 01:04:59 INFO mapreduce.ExportJobBase: Transferred 334 bytes in 30.6866 seconds (10.8842 bytes/sec)
17/01/24 01:04:59 INFO mapreduce.ExportJobBase: Exported 4 records. (4 records exported)

You can see that the MySQL table now contains the exported data.
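A quick check from the MySQL client (it should return the 4 exported rows):

SELECT * FROM widgets_copy;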

With that, the round trip of importing and exporting data between MySQL and HDFS is complete.
