Sqoop import to HDFS

1. Note: if you paste a command from Windows straight into Linux, re-type characters such as the leading --, since they tend to get mangled in the copy.

sqoop-list-databases --connect jdbc:mysql://122.206.79.212:3306/ --username root -P

  

First, list the databases to see what is available. Only the databases this account can actually see can be imported, which seems odd at first.
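
This is most likely a MySQL privilege issue rather than a Sqoop quirk: list-databases can only show databases the connecting account has some privilege on. A quick sanity check, assuming the mysql client is installed and reusing the host and user from the command above:

mysql -h 122.206.79.212 -u root -p -e "SHOW GRANTS FOR CURRENT_USER;"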

2. Import into HDFS

sqoop import  --connect jdbc:mysql://122.206.79.212:3306/dating --username root --password 123456 --table t_rec_top --driver com.mysql.jdbc.Driver

  The command gives the database, port, username, password, and table; the --driver option does not actually have to be added. Since no HDFS destination is specified, Sqoop writes to a default location (under the current user's HDFS home directory, i.e. /user/<login user>/<table name>).
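
To confirm where the data landed, you can list that default directory. The path below is an assumption based on the default /user/<login user>/<table> layout and the hxsyl user visible in the logs:

hadoop fs -ls /user/hxsyl/t_rec_top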

From the job output below you can see there are only map tasks, no reduce tasks.

Warning: /home/hxsyl/Spark_Relvant/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hxsyl/Spark_Relvant/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/03/15 11:05:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/03/15 11:05:12 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/03/15 11:05:12 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
17/03/15 11:05:12 INFO manager.SqlManager: Using default fetchSize of 1000
17/03/15 11:05:12 INFO tool.CodeGenTool: Beginning code generation
17/03/15 11:05:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t_rec_top AS t WHERE 1=0
17/03/15 11:05:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t_rec_top AS t WHERE 1=0
17/03/15 11:05:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hxsyl/Spark_Relvant/hadoop-2.6.4/share/hadoop/mapreduce
Note: /tmp/sqoop-hxsyl/compile/ddeeb02cdbd25cddc2662317b89c80f1/t_rec_top.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/03/15 11:05:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hxsyl/compile/ddeeb02cdbd25cddc2662317b89c80f1/t_rec_top.jar
17/03/15 11:05:18 INFO mapreduce.ImportJobBase: Beginning import of t_rec_top
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hxsyl/Spark_Relvant/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hxsyl/Spark_Relvant/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/03/15 11:05:19 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/03/15 11:05:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t_rec_top AS t WHERE 1=0
17/03/15 11:05:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/03/15 11:05:21 INFO client.RMProxy: Connecting to ResourceManager at CentOSMaster/192.168.58.180:8032
17/03/15 11:05:28 INFO db.DBInputFormat: Using read commited transaction isolation
17/03/15 11:05:28 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM t_rec_top
17/03/15 11:05:28 INFO mapreduce.JobSubmitter: number of splits:1
17/03/15 11:05:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1489547007191_0001
17/03/15 11:05:30 INFO impl.YarnClientImpl: Submitted application application_1489547007191_0001
17/03/15 11:05:31 INFO mapreduce.Job: The url to track the job: http://CentOSMaster:8088/proxy/application_1489547007191_0001/
17/03/15 11:05:31 INFO mapreduce.Job: Running job: job_1489547007191_0001
17/03/15 11:05:48 INFO mapreduce.Job: Job job_1489547007191_0001 running in uber mode : false
17/03/15 11:05:48 INFO mapreduce.Job:  map 0% reduce 0%
17/03/15 11:06:06 INFO mapreduce.Job:  map 100% reduce 0%
17/03/15 11:06:07 INFO mapreduce.Job: Job job_1489547007191_0001 completed successfully
17/03/15 11:06:07 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=127058
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=99
		HDFS: Number of bytes written=21
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=13150
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=13150
		Total vcore-milliseconds taken by all map tasks=13150
		Total megabyte-milliseconds taken by all map tasks=13465600
	Map-Reduce Framework
		Map input records=1
		Map output records=1
		Input split bytes=99
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=183
		CPU time spent (ms)=1200
		Physical memory (bytes) snapshot=107761664
		Virtual memory (bytes) snapshot=2069635072
		Total committed heap usage (bytes)=30474240
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=21
17/03/15 11:06:07 INFO mapreduce.ImportJobBase: Transferred 21 bytes in 46.7701 seconds (0.449 bytes/sec)
17/03/15 11:06:07 INFO mapreduce.ImportJobBase: Retrieved 1 records.

  

A /user/<username> directory is created on HDFS, and the t_rec_top directory inside it holds our data, just without a header row. The part file name ends in -m, showing the job finished after the map phase only.
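
To inspect the imported rows directly (the part file name matches the single map task above; the path is the assumed default layout):

hadoop fs -cat /user/hxsyl/t_rec_top/part-m-00000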

wc00 is the word count of a configuration file (apparently yarn-site.xml, left over from an earlier WordCount job):

"AS	1
"License");	1
${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.	1
(the	1
-->	3
2.0	1
<!--	3
</configuration>	1
</description>	1
</property>	15
<?xml	1
<configuration>	1
<description>Amount	1
<description>List	1
<description>Number	1
<description>The	7
<description>Where	1
<description>Whether	1
<description>fair-scheduler	1
<description>the	1
<name>yarn.log-aggregation-enable</name>	1
<name>yarn.nodemanager.aux-services</name>	1
<name>yarn.nodemanager.local-dirs</name>	1
<name>yarn.nodemanager.remote-app-log-dir</name>	1
<name>yarn.nodemanager.resource.cpu-vcores</name>	1
<name>yarn.nodemanager.resource.memory-mb</name>	1
<name>yarn.resourcemanager.address</name>	1
<name>yarn.resourcemanager.admin.address</name>	1
<name>yarn.resourcemanager.hostname</name>	1
<name>yarn.resourcemanager.resource-tracker.address</name>	1
<name>yarn.resourcemanager.scheduler.address</name>	1
<name>yarn.resourcemanager.scheduler.class</name>	1
<name>yarn.resourcemanager.webapp.address</name>	1
<name>yarn.resourcemanager.webapp.https.address</name>	1
<name>yarn.scheduler.fair.allocation.file</name>	1
<property>	15
<value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>	1
<value>${yarn.resourcemanager.hostname}:8030</value>	1
<value>${yarn.resourcemanager.hostname}:8031</value>	1
<value>${yarn.resourcemanager.hostname}:8032</value>	1
<value>${yarn.resourcemanager.hostname}:8033</value>	1
<value>${yarn.resourcemanager.hostname}:8088</value>	1
<value>${yarn.resourcemanager.hostname}:8090</value>	1
<value>/home/hxsyl/Spark_Relvant/yarn/local</value>	1
<value>/tmp/logs</value>	1
<value>12</value>	1
<value>30720</value>	1
<value>CentOSMaster</value>	1
<value>mapreduce_shuffle</value>	1
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>	1
<value>true</value>	1
ANY	1
An	1
Apache	1
BASIS,	1
CONDITIONS	1
CPU	1
Configs	1
IS"	1
Individual	1
KIND,	1
LICENSE	1
License	3
License,	1
License.	2
Licensed	1
MB,	1
Manager	1
OF	1
OR	1
RM	3
RM.</description>	2
Resource	1
See	2
Site	1
Unless	1
Version	1
WARRANTIES	1
WITHOUT	1
YARN	1
You	1
a	1
a-zA-Z0-9_	1
accompanying	1
adddress	1
address	4
admin	1
aggregate	1
aggregation</description>	1
agreed	1
allocated	2
an	1
and	2
applicable	1
application's	1
application.</description>	2
applications	1
as	1
at	1
be	4
by	1
called	1
can	3
class	1
compliance	1
conf	1
configuration	1
contain	1
container_${contid},	1
containers'	1
containers.</description>	2
copy	1
cores	1
directories	1
directories,	1
directory	1
distributed	2
either	1
enable	1
except	1
express	1
file	2
file.	1
files	1
for	3
found	1
governing	1
hostname	1
http	1
http://www.apache.org/licenses/LICENSE-2.0	1
https	1
implied.	1
in	4
in.	1
in:	1
interface	1
interface.</description>	2
is	1
language	1
law	1
limitations	1
localized	2
location</description>	1
log	1
logs	1
manager	1
may	2
memory,	1
name	1
not	2
numbers</description>	1
obtain	1
of	11
on	1
only	1
or	2
permissions	1
physical	1
properties	1
required	1
resource	1
scheduler	1
scheduler.</description>	1
service	1
should	1
software	1
specific	2
start	1
store	1
subdirectories	1
that	2
the	15
this	1
this.	1
to	5
to.</description>	1
under	3
use	2
valid	1
version="1.0"?>	1
web	2
will	2
with	2
work	1
writing,	1
you	1

  

--target-dir /path puts the output under the given HDFS path; -m sets the number of mappers.
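
A sketch putting both options together; the /sqoop/t_rec_top path and the two mappers are just example choices, and the table's id primary key (seen in the BoundingValsQuery above) lets Sqoop split the work between mappers automatically:

sqoop import --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P --table t_rec_top --target-dir /sqoop/t_rec_top -m 2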

Opening the file on HDFS shows that the default field delimiter is a comma. --fields-terminated-by '\t' changes the delimiter Sqoop uses when writing to HDFS; the --input-fields-terminated-by variants are the ones that describe how existing input data should be parsed.
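
The same import writing tab-separated records instead of the default commas (the target path is again just an example):

sqoop import --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P --table t_rec_top --target-dir /sqoop/t_rec_top_tab --fields-terminated-by '\t' -m 1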

--columns 'id,account,income' imports only the listed columns.

Only rows that match a condition are imported: --where "id>2 and id<9"
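
A sketch combining the two filters; the t_detail table and the three column names are taken from the surrounding text and assumed to exist in the same database:

sqoop import --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P --table t_detail --columns 'id,account,income' --where 'id > 2 and id < 9' --target-dir /sqoop/t_detail_filtered -m 1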

To pull from several tables or to run a custom query: --query "select * from t_detail where id > 5 and $CONDITIONS" — the $CONDITIONS token must be present.

If -m is greater than 1, you also have to tell Sqoop how to divide the records among the mappers with --split-by t_detail.id; Sqoop substitutes each mapper's split range into the query in place of $CONDITIONS, which is how the data gets partitioned.

Single quotes around the query are recommended; with double quotes the $ has to be escaped. Options starting with -- are the long forms, a single - marks the short forms.
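
Putting the pieces together, a sketch of a free-form query import with two mappers; t_detail is assumed to exist, the single quotes keep the shell from expanding $CONDITIONS, and --target-dir is required whenever --query is used:

sqoop import --connect jdbc:mysql://122.206.79.212:3306/dating --username root -P --query 'select * from t_detail where id > 5 and $CONDITIONS' --split-by id --target-dir /sqoop/t_detail_query -m 2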

The biggest difference between single and double quotes is that double quotes still expand variables, whereas inside single quotes everything is treated as plain characters with no special meaning. The example below illustrates this: suppose you define a variable name=VBird and now want to define myname so that it displays "VBird its me". How would you do it?

[root@linux ~]# name=VBird
[root@linux ~]# echo $name
VBird
[root@linux ~]# myname="$name its me"
[root@linux ~]# echo $myname
VBird its me
[root@linux ~]# myname='$name its me'
[root@linux ~]# echo $myname
$name its me

See what happened? With single quotes, $name loses its meaning as a variable and is displayed as plain literal text. Be especially careful about this!
