Connecting Oracle and Hadoop (4): Accessing Hive Tables from Oracle with OSCH

OSCH is short for Oracle SQL Connector for HDFS, one of the components of the Oracle Big Data Connectors suite.

This article shows how to use OSCH to access a Hive table directly from an Oracle database.

  • Prerequisite 1: On the Oracle database server, deploy the HDFS client and the OSCH software, and set the environment variables (a quick connectivity check follows the listing):

    #JAVA
    export JAVA_HOME=/home/oracle/jdk1.8.0_65

    #Hadoop
    export HADOOP_USER_NAME=hadoop
    export HADOOP_HOME=/home/oracle/hadoop-2.6.2
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
    export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_YARN_HOME=${HADOOP_HOME}
    export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

    #OSCH_HOME
    export OSCH_HOME=/home/oracle/orahdfs-3.3.0

    #Classpath and PATH: the OSCH jars must be on HADOOP_CLASSPATH
    export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar:$OSCH_HOME/jlib/*
    export PATH=$ORACLE_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
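
With these variables in place, it is worth confirming that the HDFS client on the Oracle server can actually reach the cluster before moving on. A minimal check; the warehouse path is the Hive default and the NameNode address is the one that appears later in the OSCH output, so adjust both for your environment:

    # Run as the oracle OS user on the database server
    hdfs dfs -ls hdfs://server1:8020/user/hive/warehouse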
  • Prerequisite 2: On the Hadoop cluster, deploy the OSCH software and set the environment variables (a classpath check follows the listing):

    export JAVA_HOME=/home/hadoop/jdk1.8.0_65

    export HADOOP_USER_NAME=hadoop
    export HADOOP_YARN_USER=hadoop
    export HADOOP_HOME=/home/hadoop/hadoop-2.6.2
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
    export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_YARN_HOME=${HADOOP_HOME}
    export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

    export HIVE_HOME=/home/hadoop/hive-1.1.1
    export HIVE_CONF_DIR=${HIVE_HOME}/conf

    export OSCH_HOME=/home/hadoop/orahdfs-3.3.0

    # The classpath needs the metastore JDBC driver, the Hive jars, and the OSCH jars
    export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar:/usr/share/java/mysql-connector-java.jar
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OSCH_HOME/jlib/*

    export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin:$PATH
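
Since OSCH reads the Hive metastore through this classpath, a quick check that the jars resolve and that Hive responds saves debugging later (a sketch; adapt to your layout):

    # Confirm the OSCH jar is visible on the Hadoop classpath
    hadoop classpath | tr ':' '\n' | grep -i orahdfs
    # Confirm Hive and its metastore are reachable
    hive -e 'show tables;'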
  • On the Oracle database server, create the required OS directory and Oracle directory objects (a dictionary query to verify them follows):

    mkdir -p /home/oracle/osch_output

    CONNECT / AS sysdba;
    DROP DIRECTORY osch_bin_path;
    CREATE OR REPLACE DIRECTORY osch_bin_path AS '/home/oracle/orahdfs-3.3.0/bin';
    GRANT READ, EXECUTE ON DIRECTORY osch_bin_path TO baron;

    DROP DIRECTORY osch_hive_dir;
    CREATE OR REPLACE DIRECTORY osch_hive_dir AS '/home/oracle/osch_output';
    GRANT READ, WRITE ON DIRECTORY osch_hive_dir TO baron;
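
To verify that both directory objects exist and point at the intended paths, query the standard data dictionary view:

    SELECT directory_name, directory_path
      FROM all_directories
     WHERE directory_name IN ('OSCH_BIN_PATH', 'OSCH_HIVE_DIR');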
  • In Hive, create the table needed for the test and load sample data (a row-count check follows the listing):

    CREATE TABLE catalog
      (
        catalogid INT,
        journal   STRING,
        publisher STRING,
        edition   STRING,
        title     STRING,
        author    STRING
      )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    # Generate the sample data file (run from /home/hadoop so the load below finds it)
    echo '1,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,Database Resource Manager,Kimberly Floss
2,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,From ADF UIX to JSF,Jonas Jacobi
3,Oracle Magazine,Oracle Publishing,March-April 2005,Starting with Oracle ADF,Steve Muench' > catalog.txt

    hive> load data local inpath '/home/hadoop/catalog.txt' into table catalog;
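
Before involving OSCH, a quick count in Hive confirms the load; it should report the three rows from catalog.txt:

    hive> select count(*) from catalog;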
  • On the Hadoop cluster, run the OSCH ExternalTable tool to create the external table in Oracle directly:

    hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
    oracle.hadoop.exttab.ExternalTable \
    -D oracle.hadoop.exttab.tableName=catalog_ext \
    -D oracle.hadoop.exttab.sourceType=hive \
    -D oracle.hadoop.exttab.locationFileCount=2 \
    -D oracle.hadoop.exttab.hive.tableName=catalog \
    -D oracle.hadoop.exttab.hive.databaseName=default \
    -D oracle.hadoop.exttab.defaultDirectory=osch_hive_dir \
    -D oracle.hadoop.connection.url=jdbc:oracle:thin:@//server1:1521/orcl \
    -D oracle.hadoop.connection.user=baron \
    -D oracle.hadoop.exttab.printStackTrace=true \
    -createTable

The output is as follows:

    Oracle SQL Connector for HDFS Release 3.3.0 - Production

    Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.

    [Enter Database Password:]
    15/12/15 04:45:30 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    15/12/15 04:45:31 INFO metastore.ObjectStore: ObjectStore, initialize called
    15/12/15 04:45:31 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    15/12/15 04:45:31 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.6.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/hbase-1.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    15/12/15 04:45:33 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    15/12/15 04:45:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    15/12/15 04:45:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    15/12/15 04:45:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    15/12/15 04:45:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    15/12/15 04:45:37 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
    15/12/15 04:45:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
    15/12/15 04:45:37 INFO metastore.ObjectStore: Initialized ObjectStore
    15/12/15 04:45:38 INFO metastore.HiveMetaStore: Added admin role in metastore
    15/12/15 04:45:38 INFO metastore.HiveMetaStore: Added public role in metastore
    15/12/15 04:45:38 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
    15/12/15 04:45:39 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=catalog
    15/12/15 04:45:39 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_table : db=default tbl=catalog
    The create table command succeeded.

    User: "BARON" performed the following actions in schema: BARON

    CREATE TABLE "BARON"."CATALOG_EXT"
    (
     "CATALOGID" INTEGER,
     "JOURNAL" VARCHAR2(4000),
     "PUBLISHER" VARCHAR2(4000),
     "EDITION" VARCHAR2(4000),
     "TITLE" VARCHAR2(4000),
     "AUTHOR" VARCHAR2(4000)
    )
    ORGANIZATION EXTERNAL
    (
       TYPE ORACLE_LOADER
       DEFAULT DIRECTORY "OSCH_HIVE_DIR"
       ACCESS PARAMETERS
       (
         RECORDS DELIMITED BY 0X'0A'
         CHARACTERSET AL32UTF8
         PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
         FIELDS TERMINATED BY 0X'2C'
         MISSING FIELD VALUES ARE NULL
         (
           "CATALOGID" CHAR NULLIF "CATALOGID"=0X'5C4E',
           "JOURNAL" CHAR(4000) NULLIF "JOURNAL"=0X'5C4E',
           "PUBLISHER" CHAR(4000) NULLIF "PUBLISHER"=0X'5C4E',
           "EDITION" CHAR(4000) NULLIF "EDITION"=0X'5C4E',
           "TITLE" CHAR(4000) NULLIF "TITLE"=0X'5C4E',
           "AUTHOR" CHAR(4000) NULLIF "AUTHOR"=0X'5C4E'
         )
       )
       LOCATION
       (
         'osch-20151215044541-2290-1'
       )
    ) PARALLEL REJECT LIMIT UNLIMITED;

    The following location files were created.

    osch-20151215044541-2290-1 contains 1 URI, 263 bytes

             263 hdfs://server1:8020/user/hive/warehouse/catalog/catalog.txt

Once the command finishes, the external table already exists in Oracle; no DDL needs to be run by hand on the database side.
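
The same ExternalTable tool also offers commands for managing the table afterwards. As a sketch (the -getDDL command name is taken from the OSCH documentation; confirm it against your release), this prints the DDL that was generated for an existing external table:

    hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
    oracle.hadoop.exttab.ExternalTable \
    -D oracle.hadoop.exttab.tableName=catalog_ext \
    -D oracle.hadoop.connection.url=jdbc:oracle:thin:@//server1:1521/orcl \
    -D oracle.hadoop.connection.user=baron \
    -getDDL

When new data files land in the Hive table, the tool's -publish command can regenerate the location files so the Oracle side sees the new data.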

  • Query the Hive-backed external table from Oracle (ordinary SQL works against it, as shown after the listing):

    SQL> select count(*) from catalog_ext;

      COUNT(*)
    ----------
             3
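
Because catalog_ext behaves like any other Oracle table, the Hive data can be filtered, joined with local tables, or copied into the database with plain SQL; for example, a CTAS materializes it as a regular heap table (catalog_local is an illustrative name):

    CREATE TABLE catalog_local AS
      SELECT catalogid, journal, publisher, edition, title, author
        FROM catalog_ext;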