SQOOP Load Data from Oracle to Hive Table

sqoop import -D oraoop.disabled=true \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=HOSTNAME)(port=PORT))(connect_data=(service_name=SERVICE_NAME)))" \
--username USERNAME --table TABLE_NAME --null-string '\\N' --null-non-string '\\N' \
--hive-import --hive-table HIVEDB.HIVETABLENAME \
--num-mappers 1 --verbose --password PWD --hive-drop-import-delims --hive-overwrite \
--fetch-size 500

-D is not a Sqoop-specific parameter; it sets a generic Hadoop property.
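
The generic arguments have to come right after the tool name, before the Sqoop-specific options. A minimal sketch of the ordering, using the shorter //host:port/service form of the thin URL and an extra memory property purely as illustration:

# generic Hadoop options (-D key=value) go right after "import",
# before Sqoop-specific options such as --connect
sqoop import \
-D oraoop.disabled=true \
-D mapreduce.map.memory.mb=2048 \
--connect "jdbc:oracle:thin:@//HOSTNAME:PORT/SERVICE_NAME" \
--username USERNAME --password PWD --table TABLE_NAME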

oraoop.disabled=true

If this parameter is not set, the command reports an error: table or view does not exist.

OraOop is a special plugin for Sqoop that provides faster access to Oracle's RDBMS by using custom protocols that are not publicly available. Quest Software partnered with Oracle to obtain those protocols, implemented them, and created OraOop.

In our test environment it works fine without this setting. In another environment we hit this issue; just before the failure there was a log message saying the connect string could not be recognized as a valid thin URL. It may be a driver issue.

Another thing to take care of: you had better write the TABLE_NAME (or view name) and the username in UPPER CASE, or else you may hit the same issue: table or view does not exist.
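
A small illustration of the case sensitivity (SCOTT and EMP_V are hypothetical names; Oracle stores unquoted identifiers in upper case and the connector looks the name up literally):

# resolves against the Oracle data dictionary
--username SCOTT --table EMP_V
# may fail with "table or view does not exist"
--username scott --table emp_v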

--hive-drop-import-delims

This parameter addresses a known issue that occurs when fields in the RDBMS table contain a newline (\r or \n) or a special character such as \001 in their content.

Such characters break the Hive layout rules: by default Hive uses \001 as the field separator and \n as the row terminator.

Even if you specify the field separator or row terminator yourself, Hive reports an error, because Hive currently supports only \n as the row terminator. So the fix is to replace or drop the special characters and \r\n inside the fields.
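
If you prefer to keep a placeholder instead of dropping the characters, Sqoop also has --hive-delims-replacement; a short sketch (the space replacement is an arbitrary choice):

# drop \n, \r and \001 from string fields during import
--hive-drop-import-delims
# or replace them with a single space instead of dropping them
--hive-delims-replacement ' '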

--hive-overwrite

This will overwrite the existing data in the Hive table.

--fetch-size

This parameter's default value is 1000.

One time, when we loaded a wide view with about 80 columns, the sqoop command reported an error: out of memory.

The generated Java file had not been produced yet at that point. I don't know why, but the error occurred before we had added a fetch-size setting, so I changed this value.

Finding the root cause would require digging into the source code.
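
A hedged sketch of the knobs we would try for the out-of-memory case (the memory values are illustrative assumptions, not tested settings):

# buffer fewer rows per JDBC round trip (already shown in the command above)
--fetch-size 500

# and/or give the map tasks more heap via generic Hadoop properties,
# keeping the remaining import options as in the full command above
sqoop import \
-D mapreduce.map.memory.mb=4096 \
-D mapreduce.map.java.opts=-Xmx3072m \
--connect "jdbc:oracle:thin:@//HOSTNAME:PORT/SERVICE_NAME" \
--username USERNAME --password PWD --table TABLE_NAME \
--fetch-size 500 --hive-import --hive-table HIVEDB.HIVETABLENAME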

--null-string '\\N' --null-non-string '\\N'

Without these parameters, a NULL value in the RDBMS is imported into Hive as the literal string 'null'; with them, the value remains NULL in the Hive table (Hive treats \N in the data file as NULL).
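
A quick way to check the behaviour after the import; a sketch assuming a hypothetical nullable column COL1 in HIVEDB.HIVETABLENAME:

# count real NULLs vs. rows holding the literal string 'null'
hive -e "SELECT
  SUM(CASE WHEN COL1 IS NULL THEN 1 ELSE 0 END)  AS real_nulls,
  SUM(CASE WHEN COL1 = 'null' THEN 1 ELSE 0 END) AS literal_nulls
FROM HIVEDB.HIVETABLENAME;"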

The sqoop command generates a Hadoop jar file in a temp path and then executes the MapReduce job.

First it loads the data to HDFS, then creates the Hive table, then uses a LOAD DATA statement to move the data from the HDFS staging directory into the warehouse folder.

If the command executes successfully, it cleans up the staging files.

If it fails while loading data into Hive or creating the Hive table, the staging folder and files remain in HDFS.

If you rerun the same command, it will fail and report that the output directory already exists. So just delete the directory, or load the leftover data into Hive yourself.
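
A sketch of cleaning up the leftover staging directory, assuming Sqoop used its default location (a directory named after the table under your HDFS home):

# check whether the staging directory is still there
hdfs dfs -ls /user/$USER/TABLE_NAME
# remove it so the import can be rerun
hdfs dfs -rm -r /user/$USER/TABLE_NAME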

If you use --query (-e), you can load data with a free-form query.

Demo: --query "select * from table where \$CONDITIONS". Inside double quotes you must escape the dollar sign as \$CONDITIONS; inside single quotes the backslash is not needed.

And you must add the parameter --target-dir /hdfspath when you use --query.
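
Putting the --query pieces together; a sketch with placeholder names (SOME_TABLE, the ID split column and the target path are assumptions):

# free-form query import: $CONDITIONS must appear in the WHERE clause,
# and --target-dir is mandatory with --query
sqoop import \
--connect "jdbc:oracle:thin:@//HOSTNAME:PORT/SERVICE_NAME" \
--username USERNAME --password PWD \
--query "SELECT * FROM SOME_TABLE WHERE \$CONDITIONS" \
--target-dir /user/$USER/SOME_TABLE_STAGING \
--hive-import --hive-table HIVEDB.HIVETABLENAME \
--num-mappers 1
# with --num-mappers greater than 1 you must also add --split-by <column>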

When loading data from the RDBMS to Hive, if you let Sqoop create the table for you, you will find that integer columns (Oracle NUMBER) are converted to DOUBLE.

So you need to handle this yourself, for example by overriding the column mapping; please take care.
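
One way to handle it is Sqoop's --map-column-hive option; a hedged sketch with hypothetical column names:

# force the generated Hive table to use INT/BIGINT instead of DOUBLE
--map-column-hive ID=INT,EMP_COUNT=BIGINT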
