Hive on Tez Configuration

1. Introduction to Tez

Tez is a DAG-based execution framework open-sourced by Hortonworks. By merging multiple dependent jobs into a single job, it can dramatically improve the performance of MapReduce workloads. Tez is not aimed directly at end users; rather, it lets developers build faster, more scalable applications for them.

2. Building Tez

This post documents building Tez 0.8.5. Earlier Tez releases were source-only. The latest versions do provide pre-built tarballs, but these are usually compiled against a specific Hadoop version; if that version differs from yours, obscure problems can surface at some point. For stability it is best to match the Hadoop version you actually run, which means building from source (a download/unpack sketch follows).
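
For reference, fetching and unpacking the source release might look like this (the mirror URL and the /mnt target directory are assumptions; adjust them to your environment):

# Assumed download URL and extraction path -- later steps in this post use /mnt
wget https://archive.apache.org/dist/tez/0.8.5/apache-tez-0.8.5-src.tar.gz
tar -zxf apache-tez-0.8.5-src.tar.gz -C /mnt
cd /mnt/apache-tez-0.8.5-src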

(1) After extracting the source, edit pom.xml in the root directory and change the Hadoop version to match your cluster (a sketch follows).
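
A minimal sketch of that change, assuming the property is named hadoop.version as in the stock Tez pom and using the CDH Hadoop release referenced later in this post (building against a CDH artifact may also require adding Cloudera's Maven repository, which is not covered here):

<!-- root pom.xml: point hadoop.version at your cluster's Hadoop release -->
<!-- (the value below is an assumption matching the CDH 5.10.0 cluster used later) -->
<properties>
  <hadoop.version>2.6.0-cdh5.10.0</hadoop.version>
</properties>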

(2) Comment out the tez-ui2 submodule in the root pom, because the tez-ui2 build is full of pitfalls and may not go through:

<modules>
     <module>hadoop-shim</module>
     <module>tez-api</module>
     <module>tez-common</module>
     <module>tez-runtime-library</module>
     <module>tez-runtime-internals</module>
     <module>tez-mapreduce</module>
     <module>tez-examples</module>
     <module>tez-tests</module>
     <module>tez-dag</module>
     <module>tez-ext-service-tests</module>
     <!--
     <module>tez-ui</module>
     <module>tez-ui2</module>
     -->
     <module>tez-plugins</module>
     <module>tez-tools</module>
     <module>hadoop-shim-impls</module>
     <module>tez-dist</module>
     <module>docs</module>
   </modules>

(3) If you build Tez as the root user, remember to edit tez-ui/pom.xml and pass --allow-root to the Node.js-driven Bower install:

<execution>
  <id>Bower install</id>
  <phase>generate-sources</phase>
  <goals>
    <goal>exec</goal>
  </goals>
  <configuration>
    <workingDirectory>${webappDir}</workingDirectory>
    <executable>${node.executable}</executable>
    <arguments>
      <argument>node_modules/bower/bin/bower</argument>
      <argument>install</argument>
      <argument>--allow-root</argument>
      <argument>--remove-unnecessary-resolutions=false</argument>
    </arguments>
  </configuration>
</execution>

(4) The build machine should ideally have unrestricted internet access (i.e., be able to reach hosts blocked by the Great Firewall). If it does not, also comment out tez-ui in the root pom.xml: both tez-ui and tez-ui2 download Node.js-related artifacts from servers that are blocked by default, and without such access the build fails roughly 80% of the time. So if a Node.js-related step fails, simply exclude all tez-ui submodules from the build. The UI is not essential anyway; it only visualizes job plans, and Tez's DAG optimization works fine without it.

(5) Can you install Node.js on the Linux box yourself and point the Tez build at that local copy to avoid the blocked downloads? In my testing, no: the Tez build appears to bundle its own Node.js dependency and downloads it on every build, so the cleanest option is to comment out everything related to tez-ui.

Once all of the above is done, run the build command:

mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true

A successful build ends with output like the following:

[INFO] Building jar: /mnt/apache-tez-0.8.5-src/docs/target/tez-docs-0.8.5-tests.jar

[INFO] ------------------------------------------------------------------------

[INFO] Reactor Summary:

[INFO]

[INFO] tez ................................................ SUCCESS [01:57 min]

[INFO] hadoop-shim ........................................ SUCCESS [01:03 min]

[INFO] tez-api ............................................ SUCCESS [01:33 min]

[INFO] tez-common ......................................... SUCCESS [  4.987 s]

[INFO] tez-runtime-internals .............................. SUCCESS [  7.396 s]

[INFO] tez-runtime-library ................................ SUCCESS [ 27.988 s]

[INFO] tez-mapreduce ...................................... SUCCESS [  7.937 s]

[INFO] tez-examples ....................................... SUCCESS [  1.829 s]

[INFO] tez-dag ............................................ SUCCESS [ 34.257 s]

[INFO] tez-tests .......................................... SUCCESS [ 20.367 s]

[INFO] tez-ext-service-tests .............................. SUCCESS [  4.663 s]

[INFO] tez-plugins ........................................ SUCCESS [  0.126 s]

[INFO] tez-yarn-timeline-history .......................... SUCCESS [  2.838 s]

[INFO] tez-yarn-timeline-history-with-acls ................ SUCCESS [  1.692 s]

[INFO] tez-history-parser ................................. SUCCESS [01:31 min]

[INFO] tez-tools .......................................... SUCCESS [  0.169 s]

[INFO] tez-perf-analyzer .................................. SUCCESS [  0.090 s]

[INFO] tez-job-analyzer ................................... SUCCESS [01:19 min]

[INFO] tez-javadoc-tools .................................. SUCCESS [  0.632 s]

[INFO] hadoop-shim-impls .................................. SUCCESS [  0.203 s]

[INFO] hadoop-shim-2.6 .................................... SUCCESS [  0.688 s]

[INFO] tez-dist ........................................... SUCCESS [01:58 min]

[INFO] Tez ................................................ SUCCESS [  0.141 s]

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

[INFO] ------------------------------------------------------------------------

[INFO] Total time: 11:27 min

[INFO] Finished at: 2017-10-29T21:01:55+08:00

[INFO] Final Memory: 73M/262M

[INFO] ------------------------------------------------------------------------

The build artifacts are located under tez-dist/target:

cd /mnt/apache-tez-0.8.5-src/tez-dist/target

$ ls

archive-tmp

maven-archiver

tez-0.8.5

tez-0.8.5-minimal

tez-0.8.5-minimal.tar.gz

tez-0.8.5.tar.gz

tez-dist-0.8.5-tests.jar

3. Configuring Hive on Tez

Copy all of the jars under tez-0.8.5 into Hive's lib/ directory (a sketch of this step follows), then upload tez-0.8.5.tar.gz to a directory on HDFS:
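
The jar copy might look like the following; the paths are illustrative, based on the build location above and the CDH Hive install referenced later in this post:

# Copy the Tez jars (top level and lib/) into Hive's lib directory
# (the Hive path is an assumption based on the CDH 5.10.0 install used in this post)
cd /mnt/apache-tez-0.8.5-src/tez-dist/target/tez-0.8.5
cp *.jar lib/*.jar /opt/cdh5/hive-1.1.0-cdh5.10.0/lib/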

$ /opt/cdh5/hadoop-2.6.0-cdh5.10.0/bin/hdfs dfs -mkdir -p /user/hadoop

$ /opt/cdh5/hadoop-2.6.0-cdh5.10.0/bin/hdfs dfs -put /home/hadoop/tez-0.8.5.tar.gz /user/hadoop

Edit the Tez configuration file etc/hadoop/tez-site.xml:

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
         <property>
                 <name>tez.lib.uris</name>
                 <value>/user/hadoop/tez-0.8.5.tar.gz</value>
         </property>

</configuration>

Restart the Hadoop cluster. Then, in the Hive CLI, switch the execution engine to Tez and run a quick test:

set hive.execution.engine=tez;

select count(*) from t1;

Notes:

For this test I installed the Hive shipped with CDH 5.10.0. With the Tez package deployed as above, queries failed; the specific errors are listed in section 4 below.

Deploying the same Tez package against Hive 2.1.0 ran successfully. The result is as follows:

-- Test result:

hive (default)> set hive.execution.engine=tez;

hive (default)> select count(*) from t1;

17/11/05 21:14:54 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive

17/11/05 21:14:54 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name

Query ID = hadoop_20171105211451_0c1df9ef-c3d2-4ec9-b52b-cd5770d7b5b7

Total jobs = 1

Launching Job 1 out of 1

17/11/05 21:14:54 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

17/11/05 21:14:56 INFO client.RMProxy: Connecting to ResourceManager at db01/192.168.0.181:8032

17/11/05 21:14:56 INFO impl.YarnClientImpl: Submitted application application_1509317142960_0011

17/11/05 21:15:02 INFO client.RMProxy: Connecting to ResourceManager at db01/192.168.0.181:8032

Status: Running (Executing on YARN cluster with App id application_1509317142960_0011)

----------------------------------------------------------------------------------------------
         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

----------------------------------------------------------------------------------------------

Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0

Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0

----------------------------------------------------------------------------------------------

VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 7.02 s

----------------------------------------------------------------------------------------------

OK

c0

17/11/05 21:15:10 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir

17/11/05 21:15:10 INFO mapred.FileInputFormat: Total input paths to process : 1

3

Time taken: 18.919 seconds, Fetched: 1 row(s)

hive (default)>

4. Common Problems

1) Problem:

hive (default)> set hive.execution.engine=tez;

hive (default)> select * from t1 order by aa desc;

Query ID = hadoop_20171030053838_a83cb5bd-102f-4362-90b0-3fe3bcda9aa1

Total jobs = 1

Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1509312456681_0004)

--------------------------------------------------------------------------------
         VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

--------------------------------------------------------------------------------

Map 1                 FAILED     -1          0        0       -1       0       0

Reducer 2             KILLED      1          0        0        1       0       0

--------------------------------------------------------------------------------

VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 0.24 s

--------------------------------------------------------------------------------

Status: Failed

Vertex failed, vertexName=Map 1, vertexId=vertex_1509312456681_0004_1_00, diagnostics=[Vertex vertex_1509312456681_0004_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: t1 initializer failed, vertex=vertex_1509312456681_0004_1_00 [Map 1], java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/MRVersion
     at org.apache.hadoop.hive.shims.Hadoop23Shims.isMR2(Hadoop23Shims.java:892)
     at org.apache.hadoop.hive.shims.Hadoop23Shims.getHadoopConfNames(Hadoop23Shims.java:963)
     at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:362)
     at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:377)
     at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:302)
     at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:107)
     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:415)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
     at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.MRVersion
     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
     ... 17 more

]

Vertex killed, vertexName=Reducer 2, vertexId=vertex_1509312456681_0004_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1509312456681_0004_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]

DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Solution: the missing class org.apache.hadoop.mapred.MRVersion is provided by the CDH MR1 hadoop-core jar, so copy that jar into Hive's lib directory:

cp /opt/cdh5/hadoop-2.6.0-cdh5.10.0/share/hadoop/mapreduce1/hadoop-core-2.6.0-mr1-cdh5.10.0.jar /opt/cdh5/hive-1.1.0-cdh5.10.0/lib/

2) Problem:

hive (default)> set hive.execution.engine=tez;

hive (default)> select * from t1 order by aa desc;

Query ID = hadoop_20171030054343_2707c5bd-650e-4b71-89ae-cc094beafb39

Total jobs = 1

Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1509312456681_0005)

--------------------------------------------------------------------------------
         VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

--------------------------------------------------------------------------------

Map 1                 FAILED     -1          0        0       -1       0       0

Reducer 2             KILLED      1          0        0        1       0       0

--------------------------------------------------------------------------------

VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 0.24 s

--------------------------------------------------------------------------------

Status: Failed

Vertex failed, vertexName=Map 1, vertexId=vertex_1509312456681_0005_1_00, diagnostics=[Vertex vertex_1509312456681_0005_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: t1 initializer failed, vertex=vertex_1509312456681_0005_1_00 [Map 1], java.lang.NoClassDefFoundError: com/esotericsoftware/kryo/Serializer
     at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:107)
     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:415)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
     at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.kryo.Serializer
     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
     ... 12 more

]

Vertex killed, vertexName=Reducer 2, vertexId=vertex_1509312456681_0005_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1509312456681_0005_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]

DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Solution: no root cause has been found yet.
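
The missing class com.esotericsoftware.kryo.Serializer comes from the kryo library, so one workaround that may be worth trying (an untested assumption, not something verified in this post) is to make a kryo jar visible to the Tez tasks, for example by adding the one shipped with Hive to the Tez tarball and re-uploading it to HDFS:

# Untested sketch: add Hive's kryo jar to the Tez tarball and re-upload it to HDFS
# (the jar name and paths are assumptions -- check what your Hive distribution ships under lib/)
cp /opt/cdh5/hive-1.1.0-cdh5.10.0/lib/kryo-*.jar /mnt/apache-tez-0.8.5-src/tez-dist/target/tez-0.8.5/lib/
cd /mnt/apache-tez-0.8.5-src/tez-dist/target/tez-0.8.5
tar -zcf /home/hadoop/tez-0.8.5.tar.gz *
/opt/cdh5/hadoop-2.6.0-cdh5.10.0/bin/hdfs dfs -put -f /home/hadoop/tez-0.8.5.tar.gz /user/hadoop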
