【甘道夫】Hive 0.13.1 on Hadoop 2.2.0 + Oracle 10g Deployment Guide

Environment:

Hadoop 2.2.0

Hive 0.13.1

Ubuntu 14.04 LTS

java version "1.7.0_60"

Oracle 10g



***Reposting is welcome; please credit the source***
 

http://blog.csdn.net/u010967382/article/details/38709751


Download the installation package from the following address:

http://mirrors.cnnic.cn/apache/hive/stable/apache-hive-0.13.1-bin.tar.gz


Extract the package on the server to:

/home/fulong/Hive/apache-hive-0.13.1-bin
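
For reference, the download-and-unpack step might look like this from the shell (a sketch; it assumes wget is installed and that /home/fulong/Hive already exists):

# Download the Hive release and unpack it into the target directory.
wget http://mirrors.cnnic.cn/apache/hive/stable/apache-hive-0.13.1-bin.tar.gz
tar -xzf apache-hive-0.13.1-bin.tar.gz -C /home/fulong/Hive/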


Edit your environment variables and add the following:

export HIVE_HOME=/home/fulong/Hive/apache-hive-0.13.1-bin

export PATH=$HIVE_HOME/bin:$PATH
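
A minimal sketch of applying these variables, assuming they are kept in ~/.bashrc:

# Append the two exports to the shell profile and reload it.
echo 'export HIVE_HOME=/home/fulong/Hive/apache-hive-0.13.1-bin' >> ~/.bashrc
echo 'export PATH=$HIVE_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

# The hive launcher should now resolve from PATH.
which hive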


Go into the conf directory and copy the template configuration files under new names:

[email protected]:~/Hive/apache-hive-0.13.1-bin/conf$ ls

hive-default.xml.template  hive-exec-log4j.properties.template

hive-env.sh.template       hive-log4j.properties.template

[email protected]:~/Hive/apache-hive-0.13.1-bin/conf$ cp hive-env.sh.template hive-env.sh

[email protected]:~/Hive/apache-hive-0.13.1-bin/conf$ cp hive-default.xml.template hive-site.xml

[email protected]:~/Hive/apache-hive-0.13.1-bin/conf$ ls

hive-default.xml.template  hive-env.sh.template                 hive-log4j.properties.template

hive-env.sh                hive-exec-log4j.properties.template  hive-site.xml


Edit the following entries in hive-env.sh to specify the Hadoop root directory and Hive's conf and lib directories:

# Set HADOOP_HOME to point to a specific hadoop install directory

HADOOP_HOME=/home/fulong/Hadoop/hadoop-2.2.0

# Hive Configuration Directory can be controlled by:

export HIVE_CONF_DIR=/home/fulong/Hive/apache-hive-0.13.1-bin/conf

# Folder containing extra libraries required for hive compilation/execution can be controlled by:

export HIVE_AUX_JARS_PATH=/home/fulong/Hive/apache-hive-0.13.1-bin/lib


Edit the following Oracle connection properties in hive-site.xml:

<property>

<name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:oracle:thin:@192.168.0.138:1521:orcl</value>

<description>JDBC connect string for a JDBC metastore</description>

</property>

<property>

<name>javax.jdo.option.ConnectionDriverName</name>

<value>oracle.jdbc.driver.OracleDriver</value>

<description>Driver class name for a JDBC metastore</description>

</property>

<property>

<name>javax.jdo.option.ConnectionUserName</name>

<value>hive</value>

<description>username to use against metastore database</description>

</property>

<property>

<name>javax.jdo.option.ConnectionPassword</name>

<value>hivefbi</value>

<description>password to use against metastore database</description>

</property>
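
These properties assume that a schema named hive with password hivefbi already exists in the Oracle 10g instance. If it does not, a minimal sketch of creating it with sqlplus (run as a DBA; the tablespace and privilege set here are assumptions, adjust them to your environment):

# Create the Hive metastore schema referenced by hive-site.xml.
sqlplus / as sysdba <<'EOF'
CREATE USER hive IDENTIFIED BY hivefbi DEFAULT TABLESPACE users;
GRANT CONNECT, RESOURCE TO hive;
EOF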


Configure log4j

Create a log4j directory under $HIVE_HOME to store the log files.

Copy the template and rename it:

[email protected]:~/Hive/apache-hive-0.13.1-bin/conf$ cp hive-log4j.properties.template hive-log4j.properties

Change the directory where logs are stored:

hive.log.dir=/home/fulong/Hive/apache-hive-0.13.1-bin/log4j
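
The directory referenced by hive.log.dir must exist; creating it is a one-liner:

# Create the log directory that hive.log.dir points to.
mkdir -p /home/fulong/Hive/apache-hive-0.13.1-bin/log4j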


Copy the Oracle JDBC jar

Copy the JDBC driver jar matching your Oracle version into $HIVE_HOME/lib.
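
For Oracle 10g the driver is typically ojdbc14.jar, though the exact file name depends on the driver release you downloaded; copying it might look like this (the source path is illustrative):

# Make the Oracle JDBC driver visible on Hive's classpath so the
# metastore can load oracle.jdbc.driver.OracleDriver at startup.
cp ~/Downloads/ojdbc14.jar $HIVE_HOME/lib/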


Start Hive

[email protected]:~/Hive/apache-hive-0.13.1-bin$ hive

14/08/20 17:14:05 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces

14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize

14/08/20 17:14:05 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node

14/08/20 17:14:05 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive

14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack

14/08/20 17:14:05 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize

14/08/20 17:14:05 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

14/08/20 17:14:05 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead

Logging initialized using configuration in file:/home/fulong/Hive/apache-hive-0.13.1-bin/conf/hive-log4j.properties

Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/fulong/Hadoop/hadoop-2.2.0/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.

It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

hive>


Verification

We will create a table to store the user search-behavior logs downloaded from Sogou Labs.

Data download address:

http://www.sogou.com/labs/dl/q.html

First, create the table:

hive> create table searchlog (time string,id string,sword string,rank int,clickrank int,url string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

At this point an error is thrown:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : ORA-01754: a table may contain only one column of type LONG

Solution:

Using an archive tool, open hive-metastore-0.13.1.jar in ${HIVE_HOME}/lib, find the file named package.jdo inside it, open that file, and locate the following content:

<field name="viewOriginalText" default-fetch-group="false">

<column name="VIEW_ORIGINAL_TEXT" jdbc-type="LONGVARCHAR"/>

</field>

<field name="viewExpandedText" default-fetch-group="false">

<column name="VIEW_EXPANDED_TEXT" jdbc-type="LONGVARCHAR"/>

</field>

Notice that the columns VIEW_ORIGINAL_TEXT and VIEW_EXPANDED_TEXT are both of type LONGVARCHAR, which maps to LONG in Oracle. This conflicts with Oracle's restriction that a table may contain only one column of type LONG, hence the error.

Following the recommendation on the Hive website, change the jdbc-type of these two columns to CLOB. The modified content looks like this:

<field name="viewOriginalText"default-fetch-group="false">

<column name="VIEW_ORIGINAL_TEXT" jdbc-type="CLOB"/>

</field>

<field name="viewExpandedText"default-fetch-group="false">

<column name="VIEW_EXPANDED_TEXT" jdbc-type="CLOB"/>

</field>
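
If you prefer the command line to a GUI archive tool, here is a sketch of applying the same edit with the JDK's jar utility and sed (it assumes the jar sits in $HIVE_HOME/lib and that package.jdo lives at the root of the archive):

cd $HIVE_HOME/lib
# Pull package.jdo out of the jar, switch the two view-text columns
# from LONGVARCHAR to CLOB, then write the edited file back into the jar.
jar xf hive-metastore-0.13.1.jar package.jdo
sed -i '/VIEW_ORIGINAL_TEXT\|VIEW_EXPANDED_TEXT/ s/LONGVARCHAR/CLOB/' package.jdo
jar uf hive-metastore-0.13.1.jar package.jdo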

After the modification, restart Hive.

Re-run the create table command; this time it succeeds:

hive> create table searchlog (time string,id string,sword string,rank int,clickrank int,url string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

OK

Time taken: 0.986 seconds

Load the local data into the table:

hive> load data local inpath '/home/fulong/Downloads/SogouQ.reduced' overwrite into table searchlog;

Copying data from file:/home/fulong/Downloads/SogouQ.reduced

Copying file: file:/home/fulong/Downloads/SogouQ.reduced

Loading data to table default.searchlog

rmr: DEPRECATED: Please use 'rm -r' instead.

Deleted hdfs://fulonghadoop/user/hive/warehouse/searchlog

Table default.searchlog stats: [numFiles=1, numRows=0, totalSize=152006060, rawDataSize=0]

OK

Time taken: 25.705 seconds

List all tables:

hive> show tables;

OK

searchlog

Time taken: 0.139 seconds, Fetched: 1 row(s)

Count the rows:

hive> select count(*) from searchlog;

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1407233914535_0001, Tracking URL = http://FBI003:8088/proxy/application_1407233914535_0001/

Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job  -kill job_1407233914535_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2014-08-20 18:03:17,667 Stage-1 map = 0%,  reduce = 0%

2014-08-20 18:04:05,426 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.46 sec

2014-08-20 18:04:27,317 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.74 sec

MapReduce Total cumulative CPU time: 4 seconds 740 msec

Ended Job = job_1407233914535_0001

MapReduce Jobs Launched:

Job 0: Map: 1  Reduce: 1   Cumulative CPU: 4.74 sec   HDFS Read: 152010455 HDFS Write: 8 SUCCESS

Total MapReduce CPU Time Spent: 4 seconds 740 msec

OK

1724264

Time taken: 103.154 seconds, Fetched: 1 row(s)
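
As one more sanity check, a small aggregation can be run over the loaded log, for example the ten most frequent search keywords (an illustrative query, invoked non-interactively with hive -e):

# Print the 10 most frequent search keywords in the loaded Sogou log.
hive -e "SELECT sword, COUNT(*) AS cnt FROM searchlog GROUP BY sword ORDER BY cnt DESC LIMIT 10;"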
