OSCH is short for Oracle SQL Connector for HDFS (Hadoop Distributed File System), one component of the Oracle Big Data Connectors suite.
This article shows how to use OSCH to access Hive tables directly from an Oracle database.
- Prerequisite 1: On the Oracle database server, deploy the HDFS client and the OSCH software, and set the environment variables:
- #JAVA
- export JAVA_HOME=/home/oracle/jdk1.8.0_65
- #Hadoop
- export HADOOP_USER_NAME=hadoop
- export HADOOP_HOME=/home/oracle/hadoop-2.6.2
- export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
- export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
- export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
- export HADOOP_COMMON_HOME=${HADOOP_HOME}
- export HADOOP_HDFS_HOME=${HADOOP_HOME}
- export HADOOP_MAPRED_HOME=${HADOOP_HOME}
- export HADOOP_YARN_HOME=${HADOOP_HOME}
- export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
- export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
- export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
- #OSCH_HOME
- export OSCH_HOME=/home/oracle/orahdfs-3.3.0
- #PATH
- export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar:$OSCH_HOME/jlib/*
- export PATH=$ORACLE_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
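Before moving on, it is worth confirming that the HDFS client on the Oracle host can actually reach the cluster, since OSCH streams table data through this client at query time. A minimal check (assuming the NameNode address server1:8020 that appears in the output later in this article):
- # List the Hive warehouse directory over HDFS from the Oracle host.
- # If this fails, OSCH queries will fail too.
- hdfs dfs -ls hdfs://server1:8020/user/hive/warehouse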
- Prerequisite 2: On the Hadoop cluster, deploy the OSCH software and set the environment variables:
- export JAVA_HOME=/home/hadoop/jdk1.8.0_65
- export HADOOP_USER_NAME=hadoop
- export HADOOP_YARN_USER=hadoop
- export HADOOP_HOME=/home/hadoop/hadoop-2.6.2
- export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
- export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
- export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
- export HADOOP_COMMON_HOME=${HADOOP_HOME}
- export HADOOP_HDFS_HOME=${HADOOP_HOME}
- export HADOOP_MAPRED_HOME=${HADOOP_HOME}
- export HADOOP_YARN_HOME=${HADOOP_HOME}
- export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
- export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
- export HIVE_HOME=/home/hadoop/hive-1.1.1
- export HIVE_CONF_DIR=${HIVE_HOME}/conf
- export OSCH_HOME=/home/hadoop/orahdfs-3.3.0
- export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar:/usr/share/java/mysql-connector-java.jar
- export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
- export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OSCH_HOME/jlib/*
- export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin:$PATH
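Since the OSCH table-creation tool runs on this side as a Hadoop job, the Hive and OSCH jars must be visible on the Hadoop classpath. A quick sanity check (the grep pattern is just illustrative):
- # Each classpath entry on its own line; both orahdfs and hive entries should appear.
- hadoop classpath | tr ':' '\n' | grep -E 'orahdfs|hive'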
- Create the necessary directories on the Oracle database server:
- mkdir -p /home/oracle/osch_output
- CONNECT / AS sysdba;
- drop DIRECTORY osch_bin_path;
- CREATE OR REPLACE DIRECTORY osch_bin_path AS '/home/oracle/orahdfs-3.3.0/bin';
- GRANT READ, EXECUTE ON DIRECTORY OSCH_BIN_PATH TO baron;
- drop DIRECTORY catalog_hive_dir;
- CREATE OR REPLACE DIRECTORY catalog_hive_dir AS '/home/oracle/osch_output';
- GRANT READ, WRITE ON DIRECTORY catalog_hive_dir TO baron;
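To double-check that the directory objects exist and point where you expect, query the standard all_directories dictionary view as the baron user:
- SELECT directory_name, directory_path
- FROM all_directories
- WHERE directory_name IN ('OSCH_BIN_PATH', 'CATALOG_HIVE_DIR');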
- In Hive, create the table needed for the test:
- CREATE TABLE catalog
- (
- catalogid INT,
- journal STRING,
- publisher STRING,
- edition STRING,
- title STRING,
- author STRING
- )
- ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
- # Generate a comma-separated data file and load it into the Hive table
- echo '1,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,Database Resource Manager,Kimberly Floss
- 2,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,From ADF UIX to JSF,Jonas Jacobi
- 3,Oracle Magazine,Oracle Publishing,March-April 2005,Starting with Oracle ADF,Steve Muench' > /home/hadoop/catalog.txt
- hive> load data local inpath '/home/hadoop/catalog.txt' into table catalog;
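Optionally verify that the rows landed in the warehouse file that OSCH will later point at:
- hive -e 'SELECT COUNT(*) FROM catalog;'
- hdfs dfs -cat /user/hive/warehouse/catalog/catalog.txt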
- On the Hadoop cluster, run the OSCH ExternalTable tool to create the external table directly in Oracle:
- hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
- oracle.hadoop.exttab.ExternalTable \
- -D oracle.hadoop.exttab.tableName=catalog_ext \
- -D oracle.hadoop.exttab.sourceType=hive \
- -D oracle.hadoop.exttab.locationFileCount=2 \
- -D oracle.hadoop.exttab.hive.tableName=catalog \
- -D oracle.hadoop.exttab.hive.databaseName=default \
- -D oracle.hadoop.exttab.defaultDirectory=catalog_hive_dir \
- -D oracle.hadoop.connection.url=jdbc:oracle:thin:@//server1:1521/orcl \
- -D oracle.hadoop.connection.user=baron \
- -D oracle.hadoop.exttab.printStackTrace=true \
- -createTable
The output is as follows:
- Oracle SQL Connector for HDFS Release 3.3.0 - Production
- Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.
- [Enter Database Password:]
- 15/12/15 04:45:30 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
- 15/12/15 04:45:31 INFO metastore.ObjectStore: ObjectStore, initialize called
- 15/12/15 04:45:31 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
- 15/12/15 04:45:31 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
- SLF4J: Class path contains multiple SLF4J bindings.
- SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.6.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
- SLF4J: Found binding in [jar:file:/home/hadoop/hbase-1.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
- SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
- SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
- 15/12/15 04:45:33 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
- 15/12/15 04:45:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
- 15/12/15 04:45:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
- 15/12/15 04:45:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
- 15/12/15 04:45:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
- 15/12/15 04:45:37 INFO DataNucleus.Query: Reading in results for query "[email protected]" since the connection used is closing
- 15/12/15 04:45:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
- 15/12/15 04:45:37 INFO metastore.ObjectStore: Initialized ObjectStore
- 15/12/15 04:45:38 INFO metastore.HiveMetaStore: Added admin role in metastore
- 15/12/15 04:45:38 INFO metastore.HiveMetaStore: Added public role in metastore
- 15/12/15 04:45:38 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
- 15/12/15 04:45:39 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=catalog
- 15/12/15 04:45:39 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_table : db=default tbl=catalog
- The create table command succeeded.
- User: "BARON" performed the following actions in schema: BARON
- CREATE TABLE "BARON"."CATALOG_EXT"
- (
- "CATALOGID" INTEGER,
- "JOURNAL" VARCHAR2(4000),
- "PUBLISHER" VARCHAR2(4000),
- "EDITION" VARCHAR2(4000),
- "TITLE" VARCHAR2(4000),
- "AUTHOR" VARCHAR2(4000)
- )
- ORGANIZATION EXTERNAL
- (
- TYPE ORACLE_LOADER
- DEFAULT DIRECTORY "CATALOG_HIVE_DIR"
- ACCESS PARAMETERS
- (
- RECORDS DELIMITED BY 0X'0A'
- CHARACTERSET AL32UTF8
- PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
- FIELDS TERMINATED BY 0X'2C'
- MISSING FIELD VALUES ARE NULL
- (
- "CATALOGID" CHAR NULLIF "CATALOGID"=0X'5C4E',
- "JOURNAL" CHAR(4000) NULLIF "JOURNAL"=0X'5C4E',
- "PUBLISHER" CHAR(4000) NULLIF "PUBLISHER"=0X'5C4E',
- "EDITION" CHAR(4000) NULLIF "EDITION"=0X'5C4E',
- "TITLE" CHAR(4000) NULLIF "TITLE"=0X'5C4E',
- "AUTHOR" CHAR(4000) NULLIF "AUTHOR"=0X'5C4E'
- )
- )
- LOCATION
- (
- 'osch-20151215044541-2290-1'
- )
- ) PARALLEL REJECT LIMIT UNLIMITED;
- The following location files were created.
- osch-20151215044541-2290-1 contains 1 URI, 263 bytes
- 263 hdfs://server1:8020/user/hive/warehouse/catalog/catalog.txt
Once the command completes, the external table already exists in Oracle. Note that no data was copied: the location file listed above simply records the HDFS URI of the Hive table's data file, and the hdfs_stream preprocessor streams it out of HDFS at query time.
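As a side note, the long list of -D flags can instead be collected in a standard Hadoop configuration file and passed with the generic -conf option, which is convenient when the same connection settings are reused. A sketch, using a hypothetical file name catalog_ext.xml with the same properties as the command above:
- <?xml version="1.0"?>
- <!-- catalog_ext.xml: same properties as the -D flags used above -->
- <configuration>
- <property><name>oracle.hadoop.exttab.tableName</name><value>catalog_ext</value></property>
- <property><name>oracle.hadoop.exttab.sourceType</name><value>hive</value></property>
- <property><name>oracle.hadoop.exttab.hive.tableName</name><value>catalog</value></property>
- <property><name>oracle.hadoop.exttab.hive.databaseName</name><value>default</value></property>
- <property><name>oracle.hadoop.exttab.defaultDirectory</name><value>catalog_hive_dir</value></property>
- <property><name>oracle.hadoop.connection.url</name><value>jdbc:oracle:thin:@//server1:1521/orcl</value></property>
- <property><name>oracle.hadoop.connection.user</name><value>baron</value></property>
- </configuration>
The tool is then invoked with:
- hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable -conf catalog_ext.xml -createTable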
- Finally, query the Hive data from Oracle through the external table.
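A minimal sketch, run as the baron user (column names taken from the generated DDL above; the rows come straight out of HDFS at query time):
- SELECT catalogid, title, author
- FROM catalog_ext
- WHERE journal = 'Oracle Magazine';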