Oracle Big Data Connectors in Practice, Part 2: Loading a Hive Table into Oracle Database with Oracle Loader for Hadoop

  • Deploy the Hadoop/Hive/OraLoader software

  [hadoop@server1 ~]$ tree -L 1
  ├── hadoop-2.6.2
  ├── hbase-1.1.2
  ├── hive-1.1.1
  ├── jdk1.8.0_65
  ├── oraloader-3.4.0
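With the software unpacked under /home/hadoop (the install root implied by the listing above and by the job log later in this article), the usual environment variables should point into this layout. The variable names match the ones the load command below relies on (OLH_HOME, HIVE_HOME, HIVE_CONF_DIR); a small sketch that derives the export lines:

```python
import os.path

HOME = "/home/hadoop"  # assumed install root, per the tree listing above

env = {
    "JAVA_HOME":   os.path.join(HOME, "jdk1.8.0_65"),
    "HADOOP_HOME": os.path.join(HOME, "hadoop-2.6.2"),
    "HIVE_HOME":   os.path.join(HOME, "hive-1.1.1"),
    "OLH_HOME":    os.path.join(HOME, "oraloader-3.4.0"),
}
# Hive keeps hive-site.xml under $HIVE_HOME/conf by default.
env["HIVE_CONF_DIR"] = os.path.join(env["HIVE_HOME"], "conf")

# Emit export lines suitable for ~/.bash_profile.
for name, path in env.items():
    print("export %s=%s" % (name, path))
```

The exact paths are an assumption from the listing; adjust them to your own layout.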
  • Configure the Hive metastore

We use MySQL as the Hive metastore. First, create the metastore database in MySQL:

  mysql> create database metastore DEFAULT CHARACTER SET latin1;
  Query OK, 1 row affected (0.00 sec)

  mysql> grant all on metastore.* TO 'hive'@'server1' IDENTIFIED BY '123456';
  Query OK, 0 rows affected (0.00 sec)

  mysql> flush privileges;
  Query OK, 0 rows affected (0.00 sec)
  • Configure hive-site.xml
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://server1:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
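Every Hadoop-family configuration file in this article (hive-site.xml and the OraLoader job file below) shares the same `<configuration>/<property>/<name>/<value>` shape, so it can help to generate such fragments rather than hand-edit them. A minimal Python sketch, using the metastore settings above as input:

```python
import xml.etree.ElementTree as ET

def to_hadoop_conf(props):
    """Render a dict as a Hadoop-style <configuration> XML block."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# The metastore settings from hive-site.xml above.
hive_site = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://server1:3306/metastore?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",
    "javax.jdo.option.ConnectionPassword": "123456",
    "mapreduce.framework.name": "yarn",
}

print(to_hadoop_conf(hive_site))
```

This produces unindented but well-formed XML; it is a convenience sketch, not part of the Hive or OLH tooling.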
  • Create the Oracle Loader configuration file
  <?xml version="1.0" encoding="UTF-8" ?>
  <configuration>

    <!-- Input settings -->
    <property>
      <name>mapreduce.inputformat.class</name>
      <value>oracle.hadoop.loader.lib.input.HiveToAvroInputFormat</value>
    </property>
    <property>
      <name>oracle.hadoop.loader.input.hive.databaseName</name>
      <value>default</value>
    </property>
    <property>
      <name>oracle.hadoop.loader.input.hive.tableName</name>
      <value>catalog</value>
    </property>
    <property>
      <name>mapred.input.dir</name>
      <value>/user/hive/warehouse/catalog</value>
    </property>
    <property>
      <name>oracle.hadoop.loader.input.fieldTerminator</name>
      <value>\u002C</value>
    </property>

    <!-- Output settings -->
    <property>
      <name>mapreduce.job.outputformat.class</name>
      <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
    </property>
    <property>
      <name>mapreduce.output.fileoutputformat.outputdir</name>
      <value>oraloadout</value>
    </property>

    <!-- Table information -->
    <property>
      <name>oracle.hadoop.loader.loaderMap.targetTable</name>
      <value>catalog</value>
    </property>
    <property>
      <name>oracle.hadoop.loader.input.fieldNames</name>
      <value>CATALOGID,JOURNAL,PUBLISHER,EDITION,TITLE,AUTHOR</value>
    </property>

    <!-- Connection information -->
    <property>
      <name>oracle.hadoop.loader.connection.url</name>
      <value>jdbc:oracle:thin:@${HOST}:${TCPPORT}:${SID}</value>
    </property>
    <property>
      <name>TCPPORT</name>
      <value>1521</value>
    </property>
    <property>
      <name>HOST</name>
      <value>server1</value>
    </property>
    <property>
      <name>SID</name>
      <value>orcl</value>
    </property>
    <property>
      <name>oracle.hadoop.loader.connection.user</name>
      <value>baron</value>
    </property>
    <property>
      <name>oracle.hadoop.loader.connection.password</name>
      <value>baron</value>
    </property>
  </configuration>
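Note how the connection URL leans on Hadoop configuration variable expansion: `${HOST}`, `${TCPPORT}` and `${SID}` are resolved from the sibling properties when the configuration is read, so the same URL template works across environments. A small Python sketch of that substitution (an illustration of the idea, not Hadoop's actual implementation):

```python
import re

def expand(value, props):
    """Resolve ${name} references against a property map, the way
    Hadoop's Configuration resolves them in the simple case."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: props[m.group(1)], value)

props = {"HOST": "server1", "TCPPORT": "1521", "SID": "orcl"}
url = expand("jdbc:oracle:thin:@${HOST}:${TCPPORT}:${SID}", props)
print(url)  # jdbc:oracle:thin:@server1:1521:orcl
```

The `\u002C` value for the field terminator is simply the Unicode escape for a comma, matching the delimiter of the Hive table defined next.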
  • Create the Hive table
  CREATE EXTERNAL TABLE catalog (CATALOGID INT, JOURNAL STRING, PUBLISHER STRING,
    EDITION STRING, TITLE STRING, AUTHOR STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  STORED AS TEXTFILE LOCATION '/catalog';
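The external table reads comma-delimited text files from /catalog on HDFS. A quick way to produce matching sample rows (the three records below are invented for illustration; any six comma-separated fields per line will do):

```python
import csv
import io

# Columns in the order declared in the Hive DDL and the fieldNames property.
columns = ["CATALOGID", "JOURNAL", "PUBLISHER", "EDITION", "TITLE", "AUTHOR"]

rows = [
    (1, "Oracle Magazine", "Oracle Publishing", "Nov-Dec 2004", "Resource Manager", "K. Floss"),
    (2, "Oracle Magazine", "Oracle Publishing", "Nov-Dec 2004", "From ADF to JSF", "J. Jacobi"),
    (3, "Oracle Magazine", "Oracle Publishing", "Mar-Apr 2005", "Starting with ADF", "S. Muench"),
]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerows(rows)
print(buf.getvalue(), end="")
# Put the result into HDFS with, e.g.: hdfs dfs -put catalog.csv /catalog/
```

Three input rows like these match the "Map input records=3" counter seen in the job output later.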
  • Load the Hive table into Oracle Database with Oracle Loader for Hadoop

Two points need attention here:

1. The Hive configuration directory must be added to the HADOOP_CLASSPATH environment variable.

2. hive-exec-*.jar, hive-metastore-*.jar, and libfb303*.jar must be passed on the command line via -libjars.

  export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OLH_HOME/jlib/*:$HIVE_HOME/lib/*:$HIVE_CONF_DIR
  hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -conf OraLoadJobConf-hive.xml -libjars $OLH_HOME/jlib/oraloader.jar,$HIVE_HOME/lib/hive-exec-1.1.1.jar,$HIVE_HOME/lib/hive-metastore-1.1.1.jar,$HIVE_HOME/lib/libfb303-0.9.2.jar
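An easy mistake with this command is mixing up the two separator conventions: HADOOP_CLASSPATH is colon-delimited like any Unix classpath, while -libjars expects a comma-delimited list. A sketch that builds both from a single jar list (paths follow the layout assumed in this article):

```python
# Jars OraLoader needs when reading from Hive, versions as deployed above.
jars = [
    "$OLH_HOME/jlib/oraloader.jar",
    "$HIVE_HOME/lib/hive-exec-1.1.1.jar",
    "$HIVE_HOME/lib/hive-metastore-1.1.1.jar",
    "$HIVE_HOME/lib/libfb303-0.9.2.jar",
]

libjars = ",".join(jars)    # -libjars takes a comma-separated list
classpath = ":".join(jars)  # HADOOP_CLASSPATH takes a colon-separated list

print("-libjars " + libjars)
print("HADOOP_CLASSPATH suffix: " + classpath)
```

Using a colon in -libjars (or a comma in the classpath) typically surfaces only later as a ClassNotFoundException, so it is worth double-checking.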

The output is as follows:

Oracle Loader for Hadoop Release 3.4.0 - Production

Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.6.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/home/hadoop/hive-1.1.1/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

15/12/08 04:53:51 INFO loader.OraLoader: Oracle Loader for Hadoop Release 3.4.0 - Production

Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.

15/12/08 04:53:51 INFO loader.OraLoader: Built-Against: hadoop-2.2.0 hive-0.13.0 avro-1.7.3 jackson-1.8.8

15/12/08 04:53:51 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class

15/12/08 04:53:51 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir

15/12/08 04:54:23 INFO Configuration.deprecation: mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication

15/12/08 04:54:24 INFO loader.OraLoader: oracle.hadoop.loader.loadByPartition is disabled because table: CATALOG is not partitioned

15/12/08 04:54:24 INFO output.DBOutputFormat: Setting reduce tasks speculative execution to false for : oracle.hadoop.loader.lib.output.JDBCOutputFormat

15/12/08 04:54:24 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

15/12/08 04:54:26 WARN loader.OraLoader: Sampler is disabled because the number of reduce tasks is less than two. Job will continue without sampled information.

15/12/08 04:54:26 INFO loader.OraLoader: Submitting OraLoader job OraLoader

15/12/08 04:54:26 INFO client.RMProxy: Connecting to ResourceManager at server1/192.168.56.101:8032

15/12/08 04:54:28 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore

15/12/08 04:54:28 INFO metastore.ObjectStore: ObjectStore, initialize called

15/12/08 04:54:29 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored

15/12/08 04:54:29 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored

15/12/08 04:54:31 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"

15/12/08 04:54:33 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.

15/12/08 04:54:33 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.

15/12/08 04:54:34 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.

15/12/08 04:54:34 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.

15/12/08 04:54:34 INFO DataNucleus.Query: Reading in results for query "[email protected]" since the connection used is closing

15/12/08 04:54:34 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL

15/12/08 04:54:34 INFO metastore.ObjectStore: Initialized ObjectStore

15/12/08 04:54:34 INFO metastore.HiveMetaStore: Added admin role in metastore

15/12/08 04:54:34 INFO metastore.HiveMetaStore: Added public role in metastore

15/12/08 04:54:35 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty

15/12/08 04:54:35 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=catalog

15/12/08 04:54:35 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_table : db=default tbl=catalog

15/12/08 04:54:36 INFO mapred.FileInputFormat: Total input paths to process : 1

15/12/08 04:54:36 INFO metastore.HiveMetaStore: 0: Shutting down the object store...

15/12/08 04:54:36 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=Shutting down the object store...

15/12/08 04:54:36 INFO metastore.HiveMetaStore: 0: Metastore shutdown complete.

15/12/08 04:54:36 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=Metastore shutdown complete.

15/12/08 04:54:37 INFO mapreduce.JobSubmitter: number of splits:2

15/12/08 04:54:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1449544601730_0015

15/12/08 04:54:38 INFO impl.YarnClientImpl: Submitted application application_1449544601730_0015

15/12/08 04:54:38 INFO mapreduce.Job: The url to track the job: http://server1:8088/proxy/application_1449544601730_0015/

15/12/08 04:54:49 INFO loader.OraLoader: map 0% reduce 0%

15/12/08 04:55:07 INFO loader.OraLoader: map 100% reduce 0%

15/12/08 04:55:22 INFO loader.OraLoader: map 100% reduce 67%

15/12/08 04:55:47 INFO loader.OraLoader: map 100% reduce 100%

15/12/08 04:55:47 INFO loader.OraLoader: Job complete: OraLoader (job_1449544601730_0015)

15/12/08 04:55:47 INFO loader.OraLoader: Counters: 49

File System Counters

FILE: Number of bytes read=395

FILE: Number of bytes written=370110

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=6005

HDFS: Number of bytes written=1861

HDFS: Number of read operations=9

HDFS: Number of large read operations=0

HDFS: Number of write operations=5

Job Counters

Launched map tasks=2

Launched reduce tasks=1

Data-local map tasks=2

Total time spent by all maps in occupied slots (ms)=29809

Total time spent by all reduces in occupied slots (ms)=36328

Total time spent by all map tasks (ms)=29809

Total time spent by all reduce tasks (ms)=36328

Total vcore-seconds taken by all map tasks=29809

Total vcore-seconds taken by all reduce tasks=36328

Total megabyte-seconds taken by all map tasks=30524416

Total megabyte-seconds taken by all reduce tasks=37199872

Map-Reduce Framework

Map input records=3

Map output records=3

Map output bytes=383

Map output materialized bytes=401

Input split bytes=5610

Combine input records=0

Combine output records=0

Reduce input groups=1

Reduce shuffle bytes=401

Reduce input records=3

Reduce output records=3

Spilled Records=6

Shuffled Maps =2

Failed Shuffles=0

Merged Map outputs=2

GC time elapsed (ms)=1245

CPU time spent (ms)=14220

Physical memory (bytes) snapshot=757501952

Virtual memory (bytes) snapshot=6360301568

Total committed heap usage (bytes)=535298048

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=1620

  • Caveat

One weakness of Oracle Loader for Hadoop: rows that fail to load into the target database are dropped silently, with no error reported in the job output.
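Because failures are silent, it is worth cross-checking the job counters after every load. A sketch that pulls the relevant counters out of log text like the output above (the counter names are as printed by this job; a shortfall between input and reducer-output records suggests rows were rejected on the way into Oracle):

```python
import re

def parse_counters(log_text):
    """Extract 'Name=value' counter lines from OraLoader job output."""
    counters = {}
    for name, value in re.findall(r"^\s*([A-Za-z -]+[A-Za-z])\s*=\s*(\d+)\s*$",
                                  log_text, re.M):
        counters[name.strip()] = int(value)
    return counters

# A fragment of the counter section from the job output above.
log = """\
Map input records=3
Map output records=3
Reduce input records=3
Reduce output records=3
"""

c = parse_counters(log)
shortfall = c["Map input records"] - c["Reduce output records"]
if shortfall:
    print("WARNING: %d rows may not have been loaded" % shortfall)
else:
    print("row counts match: %d" % c["Reduce output records"])
```

In practice you would also compare against `SELECT COUNT(*)` on the target Oracle table, since the counters only cover the MapReduce side of the job.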

