Hive reports an LZO "Premature EOF from inputStream" error

Today a colleague on the DW team emailed asking for help with a problem they had not been able to solve on their own. The troubleshooting process follows:

1. The failing HQL

insert overwrite table mds_prod_silent_atten_user partition (dt=20141110)
select uid, host, atten_time
from (
    select uid, host, atten_time
    from (
        select
            case when t2.uid is null then t1.uid else t2.uid end uid,
            case when t2.uid is null and t2.host is null then t1.host else t2.host end host,
            case when t2.atten_time is null or t1.atten_time > t2.atten_time
                 then t1.atten_time else t2.atten_time end atten_time
        from (
            select uid, findid(extend,'uids') host, dt atten_time,
                   sum(case when (mode = '1' or mode = '3') then 1 else -1 end) num
            from ods_bhv_tblog
            where behavior = '14000076' and dt = '20141115'
              and (mode = '1' or mode = '3' or mode = '2') and status = '1'
            group by uid, findid(extend,'uids'), dt
        ) t1
        full outer join (
            select uid, attened_uid host, atten_time
            from mds_prod_silent_atten_user
            where dt = '20141114'
        ) t2
        on t1.uid = t2.uid and t1.host = t2.host
        where t1.uid is null or t1.num > 0
    ) t3
    union all
    select t5.uid, t5.host, t5.atten_time
    from (
        select uid, host, atten_time
        from (
            select uid, findid(extend,'uids') host, dt atten_time,
                   sum(case when (mode = '1' or mode = '3') then 1 else -1 end) num
            from ods_bhv_tblog
            where behavior = '14000076' and dt = '20141115'
              and (mode = '1' or mode = '3' or mode = '2') and status = '1'
            group by uid, findid(extend,'uids'), dt
        ) t4
        where num = 0
    ) t5
    join (
        select uid, attened_uid host, atten_time
        from mds_prod_silent_atten_user
        where dt = '20141114'
    ) t6
    on t6.uid = t5.uid and t6.host = t5.host
) t7

This is the HQL that fails. It looks complicated, but the logic is fairly simple: it only joins two tables, mds_prod_silent_atten_user and ods_bhv_tblog.
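
In outline (comments only; the alias names are the ones in the HQL above):

-- t1: actions in ods_bhv_tblog for dt=20141115, aggregated per (uid, host, dt);
--     num scores mode 1/3 actions as +1 and mode 2 actions as -1
-- t2: the previous day's snapshot of mds_prod_silent_atten_user (dt=20141114)
-- t3: t1 full outer join t2, keeping snapshot-only rows and rows with num > 0
-- t4/t5: today's rows whose actions cancel out exactly (num = 0)
-- t6: the previous snapshot again; t5 join t6 carries those rows over unchanged
-- t7: t3 union all (t5 join t6), written to partition dt=20141110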

2. The error log:

Error: java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:302)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:249)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:363)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:591)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1550)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:288)
	... 11 more
Caused by: java.io.EOFException: Premature EOF from inputStream
	at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
	at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
	at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
	at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
	at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.<init>(RCFile.java:667)
	at org.apache.hadoop.hive.ql.io.RCFile$Reader.<init>(RCFile.java:1431)
	at org.apache.hadoop.hive.ql.io.RCFile$Reader.<init>(RCFile.java:1342)
	at org.apache.hadoop.hive.ql.io.rcfile.merge.RCFileBlockMergeRecordReader.<init>(RCFileBlockMergeRecordReader.java:46)
	at org.apache.hadoop.hive.ql.io.rcfile.merge.RCFileBlockMergeInputFormat.getRecordReader(RCFileBlockMergeInputFormat.java:38)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
	... 16 more

The log shows a "Premature EOF from inputStream" error while reading LZO-compressed data; the error occurs in stage-3.

3. The execution plan for stage-3:

Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            Union
              Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.RCFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
                      name: default.mds_prod_silent_atten_user
          TableScan
            Union
              Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.RCFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
                      name: default.mds_prod_silent_atten_user

Stage-3 is map-only (it has no reduce phase), and the map side does nothing more than a union; nothing about it looks unusual.
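
For reference, a plan like the one above can be reproduced by prefixing the failing statement with Hive's EXPLAIN keyword (the rest of the statement is elided here; it is exactly the HQL from section 1):

explain
insert overwrite table mds_prod_silent_atten_user partition (dt=20141110)
-- ... rest of the statement exactly as in section 1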

4. Tracking down the problem

Googling the LZO "Premature EOF from inputStream" message turned up people who had run into similar problems, for example:

http://www.cnblogs.com/aprilrain/archive/2013/03/06/2946326.html

The cause, as described there:

If the output format is TextOutputFormat, use LzopCodec; the matching input format for reading that output is LzoTextInputFormat.

If the output format is SequenceFileOutputFormat, use LzoCodec; the matching input format for reading that output is SequenceFileInputFormat.

If you write a SequenceFile with LzopCodec instead, then expect a "java.io.EOFException: Premature EOF from inputStream" as soon as SequenceFileInputFormat tries to read that output.
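
Expressed as Hive session settings, the two valid pairings look roughly like this (a sketch; the property names are the Hadoop 2.x ones and should be verified against your cluster):

set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress=true;

-- plain-text output (TextOutputFormat): lzop container format,
-- read back with LzoTextInputFormat
set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;

-- SequenceFile output (and likewise block formats such as RCFile): raw LZO codec,
-- because the file format already provides its own block framing
set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec;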

That description matches our situation. Our table's storage format is RCFileOutputFormat, not plain text, so the compression codec must be LzoCodec, not LzopCodec. The stack trace bears this out: the failure happens while reading the RCFile produced by the previous job with LzopCodec compression (RCFile$ValueBuffer builds its decompression stream via LzopCodec.createInputStream, and LzopInputStream.readHeader then hits the premature EOF).

With the cause identified, the next step was to find the corresponding parameter, the one that controls the compression codec of the job output, and switch its LZO codec to LzoCodec. From the failing job's configuration:
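
(In a Hive session, SET with a property name and no value prints the current setting, e.g.:)

set mapreduce.output.fileoutputformat.compress.codec;
-- prints: mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec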

Sure enough, the mapreduce.output.fileoutputformat.compress.codec option was set to LzopCodec. The fix is simply to change the value of mapreduce.output.fileoutputformat.compress.codec; we changed it to org.apache.hadoop.io.compress.DefaultCodec (on this cluster LzoCodec is used by default).
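
As a session-level sketch of the fix (the same property can equally be set in the job configuration or in mapred-site.xml):

-- stop wrapping the RCFile value buffers in the lzop container format
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec;

With this in place, the files written by the upstream stage no longer carry lzop headers, and the stage-3 read that previously failed in LzopInputStream.readHeader goes through.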
