FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask

We run a cluster built on Amazon EMR, and a Hive query failed with "FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask". I went through the parameters below and found them quite helpful.

Tez memory tuning

1. AM and container size settings

tez.am.resource.memory.mb

Description: Set tez.am.resource.memory.mb to be the same as yarn.scheduler.minimum-allocation-mb, the YARN minimum container size.

hive.tez.container.size

Description: Set hive.tez.container.size to be the same as, or a small multiple (1 or 2 times) of, the YARN container size yarn.scheduler.minimum-allocation-mb, but NEVER more than yarn.scheduler.maximum-allocation-mb.
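The two sizing rules above can be sketched in a few lines of Python. This is only an illustration: the function name size_tez_containers and the example YARN values (4096 MB minimum, 16384 MB maximum) are assumptions, not values from the cluster in question.

```python
def size_tez_containers(yarn_min_mb: int, yarn_max_mb: int, multiple: int = 1) -> dict:
    """Derive AM and container sizes from the YARN allocation limits.

    Hypothetical helper: it only encodes the rule of thumb quoted above.
    """
    if multiple not in (1, 2):
        raise ValueError("use a small multiple (1 or 2) of the YARN minimum")
    container_mb = yarn_min_mb * multiple
    if container_mb > yarn_max_mb:
        raise ValueError("container size must never exceed the YARN maximum")
    return {
        "tez.am.resource.memory.mb": yarn_min_mb,
        "hive.tez.container.size": container_mb,
    }


# Example values are assumptions; read the real ones from yarn-site.xml.
for key, value in size_tez_containers(4096, 16384, multiple=2).items():
    print(f"SET {key}={value};")
```

The printed SET lines can be pasted into a Hive session or mirrored in the cluster configuration.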

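2. AM and container JVM settings

tez.am.launch.cmd-opts

Default: 80% * tez.am.resource.memory.mb

Description: Usually does not need tuning.

hive.tez.java.opts

Default: 80% * hive.tez.container.size

Description: Hortonworks recommends "-server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC".

tez.container.max.java.heap.fraction

Default: 0.8

Description: The fraction of container memory given to the task/AM JVM as Xmx. Tuning this parameter is recommended; choose the value according to the actual workload.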
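As a rough illustration of the 80% rule, the sketch below computes a heap size from the container size and appends the JVM flags recommended above. The helper names are hypothetical, and setting an explicit -Xmx in hive.tez.java.opts is an assumption about how one might pin the heap, not something the notes above prescribe.

```python
def heap_mb(container_mb: int, heap_fraction_percent: int = 80) -> int:
    """Heap size implied by a heap fraction (80 corresponds to 0.8)."""
    return container_mb * heap_fraction_percent // 100


def build_java_opts(container_mb: int, heap_fraction_percent: int = 80) -> str:
    """Assemble a JVM option string with an explicit -Xmx (an assumption)."""
    flags = [
        f"-Xmx{heap_mb(container_mb, heap_fraction_percent)}m",
        "-server",
        "-Djava.net.preferIPv4Stack=true",
        "-XX:NewRatio=8",
        "-XX:+UseNUMA",
        "-XX:+UseG1GC",
    ]
    return " ".join(flags)


# 4096 MB is an example container size, not a recommendation.
print(f"SET hive.tez.java.opts={build_java_opts(4096)};")
```

Lowering the fraction leaves more of the container for off-heap memory, which is one reason to tune it per workload.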

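3. Hive in-memory map join settings

tez.runtime.io.sort.mb

Default: 100

Description: Memory needed for sorting output. Recommended: 40% * hive.tez.container.size, generally no more than 2 GB.

hive.auto.convert.join.noconditionaltask

Default: true

Description: Whether to merge multiple map joins into one. Use the default.

hive.auto.convert.join.noconditionaltask.size

Default:

Description: When multiple map joins are converted into one, the maximum total file size of all the small tables. This value only limits the size of the input table files; it does not represent the actual size of the hash table during the map join. Recommended: 1/3 * hive.tez.container.size

tez.runtime.unordered.output.buffer.size-mb

Default: 100

Description: Size of the buffer to use if not writing directly to disk. Recommended: 10% * hive.tez.container.size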
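The three recommended values all derive from hive.tez.container.size, so they can be computed together. A minimal sketch, assuming the container size is given in MB, that the 2 GB cap means 2048 MB, and that hive.auto.convert.join.noconditionaltask.size is expressed in bytes; the function name is hypothetical.

```python
def derive_memory_settings(container_mb: int) -> dict:
    """Apply the 40% / one-third / 10% rules of thumb to a container size."""
    return {
        # 40% of the container, capped at 2 GB (taken here as 2048 MB)
        "tez.runtime.io.sort.mb": min(container_mb * 40 // 100, 2048),
        # one third of the container, converted from MB to bytes
        "hive.auto.convert.join.noconditionaltask.size": container_mb * 1024 * 1024 // 3,
        # 10% of the container
        "tez.runtime.unordered.output.buffer.size-mb": container_mb * 10 // 100,
    }


# 4096 MB is an example container size.
for key, value in derive_memory_settings(4096).items():
    print(f"SET {key}={value};")
```

With an 8192 MB container the sort buffer would come out at 3276 MB and is therefore cut back to the 2048 MB cap.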

4. Container reuse settings

tez.am.container.reuse.enabled

Default: true

Description: Switch that enables container reuse.

Mapper/Reducer tuning

1. Setting the number of mappers

tez.grouping.min-size

Default: 50*1024*1024

Description: Lower bound on the size (in bytes) of a grouped split, to avoid generating too many small splits.

tez.grouping.max-size

Default: 1024*1024*1024

Description: Upper bound on the size (in bytes) of a grouped split, to avoid generating excessively large splits.
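Since every grouped split must fall between the two bounds, they imply a rough range for the number of mappers over a given input. The sketch below only does that arithmetic; it ignores how Tez actually picks a target within the range (available capacity, locality and so on), and the function name is hypothetical.

```python
def grouped_split_bounds(
    input_bytes: int,
    min_size: int = 50 * 1024 * 1024,
    max_size: int = 1024 * 1024 * 1024,
) -> tuple:
    """Return (fewest, most) grouped splits allowed by the size bounds."""
    # Fewest splits: every split is as large as allowed (ceiling division).
    fewest = max(1, -(-input_bytes // max_size))
    # Most splits: every split is as small as allowed.
    most = max(fewest, input_bytes // min_size)
    return fewest, most


# Example: 10 GiB of input with the default bounds.
print(grouped_split_bounds(10 * 1024 ** 3))
```

Raising tez.grouping.min-size shrinks the upper end of the range, which is the usual lever when a query launches too many small mappers.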


2. Setting the number of reducers

hive.tez.auto.reducer.parallelism

Default: false

Description: Turn on Tez' auto reducer parallelism feature. When enabled, Hive will still estimate data sizes and set parallelism estimates. Tez will sample source vertices' output sizes and adjust the estimates at runtime as necessary.

Setting this to true is recommended.

hive.tez.min.partition.factor

Default: 0.25

Description: When auto reducer parallelism is enabled, this factor will be used to put a lower limit on the number of reducers that Tez specifies.

hive.tez.max.partition.factor

Default: 2.0

Description: When auto reducer parallelism is enabled, this factor will be used to over-partition data in shuffle edges.

hive.exec.reducers.bytes.per.reducer

Default: 256,000,000

Description: Size per reducer. Before Hive 0.14.0 the default is 1 GB, that is, if the input size is 10 GB then 10 reducers will be used. In Hive 0.14.0 and later the default is 256 MB, that is, if the input size is 1 GB then 4 reducers will be used.

The following formula determines the number of reducers:

Max(1, Min(hive.exec.reducers.max [1009], ReducerStage estimate / hive.exec.reducers.bytes.per.reducer)) x hive.tez.max.partition.factor [2]
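The formula can be checked with a few lines of Python. This is a sketch under assumptions: the formula does not say how fractions are rounded, so rounding up at both steps is a choice made here, and the function name is hypothetical.

```python
import math


def estimate_reducers(
    stage_bytes: int,
    bytes_per_reducer: int = 256_000_000,
    max_reducers: int = 1009,
    max_partition_factor: float = 2.0,
) -> int:
    """Max(1, Min(max_reducers, estimate / bytes_per_reducer)) x factor."""
    # Rounding up is an assumption; the formula leaves it unspecified.
    base = max(1, min(max_reducers, math.ceil(stage_bytes / bytes_per_reducer)))
    return math.ceil(base * max_partition_factor)


# Example: a reducer stage estimated at 10 GB with the defaults.
print(estimate_reducers(10_000_000_000))
```

Note that the partition factor is applied after the hive.exec.reducers.max cap, so as the formula is written the result can exceed that cap.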

3. Shuffle settings

tez.shuffle-vertex-manager.min-src-fraction

Default: 0.25

Description: The fraction of source tasks which should complete before tasks for the current vertex are scheduled.

tez.shuffle-vertex-manager.max-src-fraction

Default: 0.75

Description: Once this fraction of source tasks have completed, all tasks on the current vertex can be scheduled. The number of tasks ready for scheduling on the current vertex scales linearly between min-fraction and max-fraction.

 

Example:

hive.exec.reducers.bytes.per.reducer=1073741824; // 1 GB

tez.shuffle-vertex-manager.min-src-fraction=0.25;

tez.shuffle-vertex-manager.max-src-fraction=0.75;

This indicates that the decision will be made between 25% of mappers finishing and 75% of mappers finishing, provided there's at least 1 GB of data being output (i.e. if 25% of mappers don't send 1 GB of data, we will wait until at least 1 GB is sent out).
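The linear scaling between the two fractions can be pictured with a small interpolation. This is a simplified model of the description above, not the Tez implementation: it ignores the data-volume condition from the example, and the function name is hypothetical.

```python
def schedulable_tasks(
    total_tasks: int,
    completed_src_fraction: float,
    min_fraction: float = 0.25,
    max_fraction: float = 0.75,
) -> int:
    """Tasks of the current vertex that may be scheduled (linear model)."""
    if completed_src_fraction >= max_fraction:
        return total_tasks
    if completed_src_fraction < min_fraction:
        return 0
    # Interpolate linearly between the two thresholds.
    ratio = (completed_src_fraction - min_fraction) / (max_fraction - min_fraction)
    return int(total_tasks * ratio)


# Example: 100 reducers, sampled at a few points of mapper completion.
for done in (0.2, 0.5, 0.8):
    print(done, schedulable_tasks(100, done))
```

Narrowing the gap between the two fractions makes reducers start in a tighter burst; widening it spreads their start over more of the map phase.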

Original post: https://www.cnblogs.com/mobiwangyue/p/8405780.html

Time: 2024-12-10 00:36:06
