[Original] Uncle's Experience Sharing (35): LZO Format Support

Table creation statement

CREATE EXTERNAL TABLE `my_lzo_table`(`something` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
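With the table defined, data can be loaded by compressing a tab-separated file with lzop and placing it under the table's HDFS location. A minimal sketch; the file name and warehouse path are assumed examples:

```shell
# Compress a tab-separated file with lzop (produces data.tsv.lzo alongside the input)
lzop data.tsv

# Upload it into the table's directory (path is an assumed example)
hdfs dfs -put data.tsv.lzo /user/hive/warehouse/my_lzo_table/
```
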

1 lzo

# yum install lzo lzop

Manual installation: http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz
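If the yum packages are unavailable, lzo can be built from the source tarball above with a standard autotools build; a sketch, assuming a C toolchain is installed and the default `/usr/local` prefix is acceptable:

```shell
# Download, build, and install lzo from source
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz
tar -zxf lzo-2.10.tar.gz
cd lzo-2.10
./configure --enable-shared   # build the shared library needed by hadoop-lzo
make
make install                  # requires root for the default prefix
```
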

2 hadoop-lzo

# wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/hadoop-gpl-packing/hadoop-gpl-packaging-0.6.1-1.x86_64.rpm
# rpm -ivh hadoop-gpl-packaging-0.6.1-1.x86_64.rpm

# ls /opt/hadoopgpl/lib
cdh4.0.1 guava-12.0.jar hadoop-lzo-0.4.17.jar hadoop-lzo.jar pig-0.10.0 pig-0.6.0 pig-0.7.0 pig-0.8.0 protobuf-java-2.4.1.jar slf4j-api-1.5.8.jar slf4j-log4j12-1.5.10.jar yamlbeans-0.9.3.jar
# ls /opt/hadoopgpl/native/Linux-amd64-64/
libgplcompression.a libgplcompression.la libgplcompression.so libgplcompression.so.0 libgplcompression.so.0.0.0 LzoCompressor.lo LzoCompressor.o LzoDecompressor.lo LzoDecompressor.o

Manual installation: https://github.com/twitter/hadoop-lzo/
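The twitter/hadoop-lzo project builds with Maven; a sketch of a manual build, assuming lzo is already installed and Maven/JDK are available (the output paths below follow the project's conventions but may vary by version):

```shell
git clone https://github.com/twitter/hadoop-lzo.git
cd hadoop-lzo
# Point the native build at the lzo headers/libs if installed in a non-standard prefix
C_INCLUDE_PATH=/usr/local/include LIBRARY_PATH=/usr/local/lib mvn clean package -DskipTests
# The jar lands under target/, the native libgplcompression libs under target/native/
```
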

3 Errors

1) Error: IOException: No LZO codec found, cannot run.

Fix: register the LZO codecs in core-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
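After updating core-site.xml, native library availability can be verified with Hadoop's built-in check:

```shell
# List which native compression libraries Hadoop can load;
# lzo should report "true" once the codec and native libs are in place
hadoop checknative -a
```
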

2) Error: java.io.IOException: cannot find class com.hadoop.mapred.DeprecatedLzoTextInputFormat

Fix: put the hadoop-lzo jar on the classpath.

For Hive:

export HADOOP_CLASSPATH=/opt/hadoopgpl/lib/hadoop-lzo.jar

For Spark:

export SPARK_CLASSPATH=/opt/hadoopgpl/lib/hadoop-lzo.jar
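Note that SPARK_CLASSPATH is deprecated in newer Spark releases; passing the jar at submit time is an alternative (standard spark-submit flags; `your_app.jar` is a hypothetical application):

```shell
spark-submit \
  --jars /opt/hadoopgpl/lib/hadoop-lzo.jar \
  --driver-class-path /opt/hadoopgpl/lib/hadoop-lzo.jar \
  your_app.jar
```
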

3) Error: IOException: java.lang.RuntimeException: native-lzo library not available

Fix: point the native library path at the hadoop-lzo native libs.

For Hive:

export JAVA_LIBRARY_PATH=/opt/hadoopgpl/native/Linux-amd64-64/

For Spark:

export LD_LIBRARY_PATH=/opt/hadoopgpl/native/Linux-amd64-64/
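For Spark on YARN, setting the library path through Spark configuration is an alternative to exporting LD_LIBRARY_PATH (standard Spark properties; `your_app.jar` is a hypothetical application):

```shell
spark-submit \
  --conf spark.driver.extraLibraryPath=/opt/hadoopgpl/native/Linux-amd64-64 \
  --conf spark.executor.extraLibraryPath=/opt/hadoopgpl/native/Linux-amd64-64 \
  your_app.jar
```
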

4) MapReduce error: java.io.IOException: cannot find class com.hadoop.mapred.DeprecatedLzoTextInputFormat
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:689)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)

Fix: copy the hadoop-lzo jar into Hadoop's shared lib directory (on each node):

$ cp /opt/hadoopgpl/lib/hadoop-lzo.jar $HADOOP_HOME/share/hadoop/common/lib/

5) MapReduce error: Caused by: java.lang.RuntimeException: native-lzo library not available

Fix: set the native library path for MapReduce child JVMs in mapred-site.xml:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Djava.library.path=/opt/hadoopgpl/native/Linux-amd64-64</value>
</property>
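Separately, plain .lzo files are not splittable by MapReduce until an index is built for them; hadoop-lzo ships an indexer that can be run over the table directory (the HDFS path is an assumed example):

```shell
# Build .lzo.index files so MapReduce can split the LZO data
hadoop jar /opt/hadoopgpl/lib/hadoop-lzo.jar \
  com.hadoop.compression.lzo.DistributedLzoIndexer \
  /user/hive/warehouse/my_lzo_table
```
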

Original article: https://www.cnblogs.com/barneywill/p/10439181.html
