(Repost) Summary of problems encountered while deploying a YARN cluster

Source: http://blog.csdn.net/uniquechao/article/details/26449761

Versions: Hadoop 2.3.0, Hive 0.11.0

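1. Application Master cannot be reached

Clicking the Application Master link returns an HTTP 500 error with a Java connection exception (presumably java.net.ConnectException).

The cause: when the web UI was configured, port 50030 was bound to the address 0.0.0.0, so the Application Master link could not be resolved to a reachable host.

Fix: set the address explicitly in yarn-site.xml:

<property>
  <description>The address of the RM web application.</description>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>xxxxxxxxxx:50030</value>
</property>

This is a bug in 2.3.0 (bug 1811, presumably YARN-1811) and is fixed in 2.4.0.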

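2. History UI cannot be reached and container logs cannot be opened

Clicking "Tracking URL: History" fails.

The cause is that the history service was not started.

Fix:

Configuration (xxxxxxxxxx is the host chosen as the history server). yarn.log-aggregation-enable goes in yarn-site.xml; the two mapreduce.jobhistory.* properties go in mapred-site.xml:

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>xxxxxxxxxx:10020</value>
</property>

<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>xxxxxxxxxx:19888</value>
</property>

Then start the history server:

sbin/mr-jobhistory-daemon.sh start historyserver

Related link: http://www.iteblog.com/archives/936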

3. YARN platform tuning

Set the number of virtual CPU cores:

<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>23</value>
</property>

Set the memory available to containers:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>61440</value>
  <description>The amount of memory on the NodeManager, in MB.</description>
</property>

Set the maximum memory a single task (container) may use:

<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>49152</value>
</property>

4. Running a job fails with: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected

This error means the job was compiled against Hadoop 1.x, where Counter is a class, but runs on Hadoop 2.x, where it is an interface. Update the pom to the Hadoop 2 artifacts and run install again:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.3.0</version>
</dependency>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>2.3.0</version>
</dependency>

<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>1.0.0</version>
  <classifier>hadoop2</classifier>
  <scope>test</scope>
</dependency>

Also switch the JDK to 1.7.

5. Job fails during shuffle with an out-of-memory error: Java heap space

2014-05-14 16:44:22,010 FATAL [IPC Server handler 4 on 44508] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1400048775904_0006_r_000004_0 - exited : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
    at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:297)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:287)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:411)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:341)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)

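Source: <http://xxxxxxxxxx:19888/jobhistory/logs/ST-L09-05-back-tj-yarn15:8034/container_1400048775904_0006_01_000001/job_1400048775904_0006/hadoop/syslog/?start=0>

Fix:

Lower the value of mapreduce.reduce.shuffle.memory.limit.percent, which caps the share of the in-memory shuffle buffer that a single map output may occupy. The default is 0.25; it is now set to 0.10.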
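A minimal sketch of that change as a configuration entry. The property name and the 0.10 value come from the fix above; putting it in mapred-site.xml, rather than passing it per job, is an assumption about how the cluster is managed.

```xml
<!-- mapred-site.xml (assumed location; the setting can also be supplied per job) -->
<property>
  <name>mapreduce.reduce.shuffle.memory.limit.percent</name>
  <!-- default is 0.25; lowered to 0.10 as described above -->
  <value>0.10</value>
</property>
```

For a single job that parses generic options, the same value can usually be passed on the command line as -Dmapreduce.reduce.shuffle.memory.limit.percent=0.10 instead of changing the cluster-wide file.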

Reference:

http://www.sqlparty.com/yarn%E5%9C%A8shuffle%E9%98%B6%E6%AE%B5%E5%86%85%E5%AD%98%E4%B8%8D%E8%B6%B3%E9%97%AE%E9%A2%98error-in-shuffle-in-fetcher/

6. Found in the middle of a reduce task's log:

2014-05-14 17:51:21,835 WARN [Readahead Thread #2] org.apache.hadoop.io.ReadaheadPool: Failed readahead on ifile
EINVAL: Invalid argument
    at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:263)
    at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:142)
    at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Source: <http://xxxxxxxxxx:8042/node/containerlogs/container_1400060792764_0001_01_000726/hadoop/syslog/?start=-4096>

PS: the error has not reappeared, so there is no fix yet.

7. Hive job

java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();

Reference: https://issues.apache.org/jira/browse/HIVE-4222

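8. A Hive job runs out of memory when a join is automatically converted to a map join. Fix: turn off automatic conversion. Before version 0.11 the default was false; from 0.11 on it is true.

Add this to the job script: set hive.auto.convert.join=false;

Or set it to false in hive-site.xml.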
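A sketch of the hive-site.xml variant, using only the property name and value given above:

```xml
<!-- hive-site.xml: disable automatic conversion of joins to map joins -->
<property>
  <name>hive.auto.convert.join</name>
  <value>false</value>
</property>
```

The per-script form only affects that session, so it is the less intrusive choice when only a few queries hit the problem.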

Error log:

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2014-05-15 02:40:58     Starting to launch local task to process map join;      maximum memory = 1011351552
2014-05-15 02:41:00     Processing rows:        200000  Hashtable size: 199999  Memory usage:   110092544       rate:   0.109
2014-05-15 02:41:01     Processing rows:        300000  Hashtable size: 299999  Memory usage:   229345424       rate:   0.227
2014-05-15 02:41:01     Processing rows:        400000  Hashtable size: 399999  Memory usage:   170296368       rate:   0.168
2014-05-15 02:41:01     Processing rows:        500000  Hashtable size: 499999  Memory usage:   285961568       rate:   0.283
2014-05-15 02:41:02     Processing rows:        600000  Hashtable size: 599999  Memory usage:   408727616       rate:   0.404
2014-05-15 02:41:02     Processing rows:        700000  Hashtable size: 699999  Memory usage:   333867920       rate:   0.33
2014-05-15 02:41:02     Processing rows:        800000  Hashtable size: 799999  Memory usage:   459541208       rate:   0.454
2014-05-15 02:41:03     Processing rows:        900000  Hashtable size: 899999  Memory usage:   391524456       rate:   0.387
2014-05-15 02:41:03     Processing rows:        1000000 Hashtable size: 999999  Memory usage:   514140152       rate:   0.508
2014-05-15 02:41:03     Processing rows:        1029052 Hashtable size: 1029052 Memory usage:   546126888       rate:   0.54
2014-05-15 02:41:03     Dump the hashtable into file: file:/tmp/hadoop/hive_2014-05-15_14-40-53_413_3806680380261480764/-local-10002/HashTable-Stage-4/MapJoin-mapfile01--.hashtable
2014-05-15 02:41:06     Upload 1 File to: file:/tmp/hadoop/hive_2014-05-15_14-40-53_413_3806680380261480764/-local-10002/HashTable-Stage-4/MapJoin-mapfile01--.hashtable File size: 68300588
2014-05-15 02:41:06     End of local task; Time Taken: 8.301 sec.
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Mapred Local Task Succeeded . Convert the Join into MapJoin
Launching Job 2 out of 2

Task error log:

2014-05-15 13:52:54,007 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
    at java.io.ObjectInputStream$HandleTable.grow(ObjectInputStream.java:3465)
    at java.io.ObjectInputStream$HandleTable.assign(ObjectInputStream.java:3271)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1789)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at java.util.HashMap.readObject(HashMap.java:1183)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.initilizePersistentHash(HashMapWrapper.java:128)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:194)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:212)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1377)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1381)

Source: <http://xxxxxxxxxx:19888/jobhistory/logs/ST-L09-10-back-tj-yarn21:8034/container_1400064445468_0013_01_000002/attempt_1400064445468_0013_m_000000_0/hadoop/syslog/?start=0>

9. Hive prints "failed to evaluate: <unbound>=Class.new();" at run time; the fix is to upgrade to 0.13.0.

References: https://issues.apache.org/jira/browse/HIVE-4222

https://issues.apache.org/jira/browse/HIVE-3739

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
OK
Time taken: 2.28 seconds
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
Continuing ...
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
Continuing ...
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
Continuing ...
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
Continuing ...
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...


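Upgrading should resolve this. However, for reasons I have not worked out, after upgrading to 0.12.0 and 0.13.0 every job fails as soon as it runs with FileNotFoundException: HIVE_PLANxxxxxxxxx (see item 11). It is probably a problem with my configuration; no fix yet.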




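10. Creating a table or database in Hive fails with: Couldnt obtain a new sequence (unique id) : You have an error in your SQL syntax

Fix: the Hive metastore database was named yarn-hive. In SQL a hyphen is the minus operator and is not valid in an unquoted identifier, so the generated SQL fails to parse. Renaming the database without the hyphen solved the problem.

Error log: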

FAILED: Error in metadata: MetaException(message:javax.jdo.JDOException: Couldnt obtain a new sequence (unique id) : You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-hive.`SEQUENCE_TABLE` WHERE `SEQUENCE_NAME`='org.apache.hadoop.hive.metastore.m' at line 1
    at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)
    at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
    at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
    at org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:643)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
    at com.sun.proxy.$Proxy14.createTable(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1070)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1103)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
    at com.sun.proxy.$Proxy15.create_table_with_environment_context(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:466)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:455)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
    at com.sun.proxy.$Proxy16.createTable(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:597)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3777)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:256)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:144)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1362)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1146)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952)
    at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:338)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
    at shark.SharkCliDriver$.main(SharkCliDriver.scala:235)
    at shark.SharkCliDriver.main(SharkCliDriver.scala)

NestedThrowablesStackTrace:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-hive.`SEQUENCE_TABLE` WHERE `SEQUENCE_NAME`='org.apache.hadoop.hive.metastore.m' at line 1
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:406)
    at com.mysql.jdbc.Util.getInstance(Util.java:381)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1030)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3558)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3490)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2109)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2648)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2077)
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2228)
    at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
    at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
    at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeQuery(ParamLoggingPreparedStatement.java:381)
    at org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:504)
    at org.datanucleus.store.rdbms.valuegenerator.SequenceTable.getNextVal(SequenceTable.java:197)
    at org.datanucleus.store.rdbms.valuegenerator.TableGenerator.reserveBlock(TableGenerator.java:190)
    at org.datanucleus.store.valuegenerator.AbstractGenerator.reserveBlock(AbstractGenerator.java:305)
    at org.datanucleus.store.rdbms.valuegenerator.AbstractRDBMSGenerator.obtainGenerationBlock(AbstractRDBMSGenerator.java:170)
    at org.datanucleus.store.valuegenerator.AbstractGenerator.obtainGenerationBlock(AbstractGenerator.java:197)
    at org.datanucleus.store.valuegenerator.AbstractGenerator.next(AbstractGenerator.java:105)
    at org.datanucleus.store.rdbms.RDBMSStoreManager.getStrategyValueForGenerator(RDBMSStoreManager.java:2019)
    at org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1385)
    at org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3727)
    at org.datanucleus.state.JDOStateManager.setIdentity(JDOStateManager.java:2574)
    at org.datanucleus.state.JDOStateManager.initialiseForPersistentNew(JDOStateManager.java:526)
    at org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:202)
    at org.datanucleus.ExecutionContextImpl.newObjectProviderForPersistentNew(ExecutionContextImpl.java:1326)
    at org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2123)
    at org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1972)
    at org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1820)
    at org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217)
    at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:727)
    at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
    at org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:643)

11. After installing Hive 0.12 and 0.13, jobs fail with: FileNotFoundException: HIVE_PLAN

Fix: this may be a Hive bug, or something may be misconfigured; still unresolved.

Error log:

2014-05-16 10:27:07,896 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: Paths:/user/hive/warehouse/game_predata.db/game_login_log/dt=0000-00-00/000000_0:201326592+60792998,/user/hive/warehouse/game_predata.db/game_login_log/dt=0000-00-00/000001_0_copy_1:201326592+58503492,/user/hive/warehouse/game_predata.db/game_login_log/dt=0000-00-00/000001_0_copy_2:67108864+67108864,/user/hive/warehouse/game_predata.db/game_login_log/dt=0000-00-00/000001_0_copy_2:134217728+67108864,/user/hive/warehouse/game_predata.db/game_login_log/dt=0000-00-00/000002_0_copy_1:67108864+67108864 InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
2014-05-16 10:27:07,954 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: java.io.FileNotFoundException: HIVE_PLAN14c8af69-0156-4633-9273-6a812eb91a4c (No such file or directory)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:230)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:381)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:374)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:540)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.FileNotFoundException: HIVE_PLAN14c8af69-0156-4633-9273-6a812eb91a4c (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at java.io.FileInputStream.<init>(FileInputStream.java:101)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:221)
    ... 12 more
2014-05-16 10:27:07,957 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task

Source: <http://sxxxxxxxxxx:19888/jobhistory/logs/ST-L10-10-back-tj-yarn10:8034/container_1400136017046_0026_01_000030/attempt_1400136017046_0026_m_000000_0/hadoop>

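12. java.lang.OutOfMemoryError: GC overhead limit exceeded

Analysis: this error type was introduced in JDK 6. It is thrown when the JVM spends a large share of its time in garbage collection while reclaiming very little space, and it serves as a protection mechanism. The workaround is to switch the check off with the JVM startup option -XX:-UseGCOverheadLimit. The option only disables the check; it does not cap or enlarge the heap. Add it in mapred-site.xml as a new property named mapred.child.java.opts with the value -XX:-UseGCOverheadLimit.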
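A sketch of the mapred-site.xml entry. Setting mapred.child.java.opts replaces the whole option string for task JVMs, so the example keeps an explicit heap size next to the GC flag. The -Xmx1024m value is an assumption, copied from the task command line that appears in the item 14 log; adjust it to your own tasks.

```xml
<!-- mapred-site.xml -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- -Xmx1024m is an assumed heap size (taken from the item 14 log);
       -XX:-UseGCOverheadLimit disables the GC overhead check -->
  <value>-Xmx1024m -XX:-UseGCOverheadLimit</value>
</property>
```

With the check disabled, a task that is genuinely short of heap tends to fail later with a plain "Java heap space" error instead, so this is a workaround rather than a cure.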

Source: <http://www.cnblogs.com/niocai/archive/2012/07/31/2616252.html>

See also item 14.

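13. Hive: for efficiency, Hive 0.10.0 runs simple queries (a plain SELECT with no count, sum, group by and the like) without MapReduce, reading the HDFS files directly and applying the filter. The advantage is that no new MR job is started, so execution is considerably faster. The drawback is poor feedback: with a large amount of data you may still wait a long time while nothing is returned.

Changing this is easy. hive-site.xml has a configuration parameter called

hive.fetch.task.conversion

Set it to more and simple queries skip MapReduce. Set it to minimal and only SELECT * with filters on partition columns and LIMIT is served directly; other simple SELECTs go through MapReduce.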
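A sketch of the hive-site.xml entry for the parameter described above; switch the value to minimal to push most simple SELECTs back through MapReduce.

```xml
<!-- hive-site.xml -->
<property>
  <name>hive.fetch.task.conversion</name>
  <!-- "more": simple SELECT / filter / LIMIT queries are served directly, without MapReduce -->
  <value>more</value>
</property>
```

The same parameter can also be changed for one session with set hive.fetch.task.conversion=minimal; which is handy for comparing the two behaviours on a single query.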

Source: <http://slaytanic.blog.51cto.com/2057708/1170431>

See also item 14.

14. Running an MR job reports:

Error log:

Container [pid=30486,containerID=container_1400229396615_0011_01_000012] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.7 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1400229396615_0011_01_000012 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 30501 30486 30486 30486 (java) 3924 322 1720471552 262096 /opt/jdk1.7.0_55/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -XX:-UseGCOverheadLimit -Djava.io.tmpdir=/home/nodemanager/local/usercache/hadoop/appcache/application_1400229396615_0011/container_1400229396615_0011_01_000012/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/logs/nodemanager/logs/application_1400229396615_0011/container_1400229396615_0011_01_000012 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 30.30.30.39 47925 attempt_1400229396615_0011_m_000000_0 12
|- 30486 12812 30486 30486 (bash) 0 0 108642304 302 /bin/bash -c /opt/jdk1.7.0_55/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -XX:-UseGCOverheadLimit -Djava.io.tmpdir=/home/nodemanager/local/usercache/hadoop/appcache/application_1400229396615_0011/container_1400229396615_0011_01_000012/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/logs/nodemanager/logs/application_1400229396615_0011/container_1400229396615_0011_01_000012 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 30.30.30.39 47925 attempt_1400229396615_0011_m_000000_0 12 1>/home/hadoop/logs/nodemanager/logs/application_1400229396615_0011/container_1400229396615_0011_01_000012/stdout 2>/home/hadoop/logs/nodemanager/logs/application_1400229396615_0011/container_1400229396615_0011_01_000012/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

Source: <http://xxxxxxxxxx:50030/proxy/application_1400229396615_0011/mapreduce/attempts/job_1400229396615_0011/m/FAILED>

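Fix:

The parameters below control memory for MapReduce tasks at run time. Individual jobs can override them when needed; here they are configured globally. If containers get killed, raise them as appropriate.

mapreduce.map.memory.mb: maximum memory for a map task (the container size, in MB)

mapreduce.map.java.opts: -Xmx1024M, JVM options for map tasks

mapreduce.reduce.memory.mb: maximum memory for a reduce task (the container size, in MB)

mapreduce.reduce.java.opts: -Xmx2560M, JVM options for reduce tasks

mapreduce.task.io.sort.mb: 512, a higher memory limit while sorting data, for efficiency

The -Xmx heap in each java.opts setting has to be smaller than the matching memory.mb value, because the container limit covers the heap plus the JVM's non-heap overhead. In the log above the task ran with -Xmx1024m inside a 1 GB container, which likely explains why it was killed.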
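A sketch of these settings as mapred-site.xml entries. The java.opts and io.sort.mb values are the ones listed above. The two memory.mb values (1536 and 3072) are not stated in this post; they are assumptions taken from the example table on the Hadoop ClusterSetup page that the list is excerpted from, and they leave headroom above each heap size.

```xml
<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value> <!-- assumed value: container size for map tasks, larger than the 1024 MB heap -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024M</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value> <!-- assumed value: container size for reduce tasks, larger than the 2560 MB heap -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2560M</value>
</property>
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>512</value>
</property>
```

The container sizes must also stay within yarn.scheduler.maximum-allocation-mb, otherwise the request cannot be scheduled.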

Excerpted from: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuring_the_Hadoop_Daemons_in_Non-Secure_Mode

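Disabling the memory checks:

I really could not work out why some tasks used just over 200 MB of physical memory while their virtual memory shot up to 2.7 GB. I suspect the memory monitor is at fault, and some of my jobs do need a lot of memory, so to keep things moving I simply turned the checks off, which made all the memory problems go away at once.

yarn.nodemanager.pmem-check-enabled false

yarn.nodemanager.vmem-check-enabled false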
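A sketch of the two switches as configuration entries, assuming they are placed in yarn-site.xml on every NodeManager and the NodeManagers are restarted afterwards.

```xml
<!-- yarn-site.xml (assumed location; applies per NodeManager) -->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value> <!-- stop killing containers that exceed their physical memory limit -->
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value> <!-- stop killing containers that exceed their virtual memory limit -->
</property>
```

With both checks off the NodeManager no longer enforces container memory limits, so a runaway task can exhaust a node's memory; disabling only the virtual memory check is the more conservative option when the complaint is about virtual memory alone.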

15. Adjustments to the YARN web UI:

(1) On the cluster page, each application's StartTime and FinishTime are shown in UTC; change them to UTC+8, i.e. Beijing time.

Edit webapps/static/yarn.dt.plugins.js inside ./share/hadoop/yarn/hadoop-yarn-common-2.3.0.jar,

or in the source tree: /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/yarn.dt.plugins.js

Add this code:

    Date.prototype.Format = function (fmt) { // author: meizz
      var o = {
        "M+": this.getMonth() + 1, // month
        "d+": this.getDate(), // day
        "h+": this.getHours(), // hour
        "m+": this.getMinutes(), // minute
        "s+": this.getSeconds(), // second
        "q+": Math.floor((this.getMonth() + 3) / 3), // quarter
        "S": this.getMilliseconds() // millisecond
      };
      // Replace the year pattern, keeping as many digits as the pattern has
      if (/(y+)/.test(fmt)) fmt = fmt.replace(RegExp.$1, (this.getFullYear() + "").substr(4 - RegExp.$1.length));
      // Replace every other pattern, zero-padding to two digits when the pattern is longer than one character
      for (var k in o)
        if (new RegExp("(" + k + ")").test(fmt)) fmt = fmt.replace(RegExp.$1, (RegExp.$1.length == 1) ? (o[k]) : (("00" + o[k]).substr(("" + o[k]).length)));
      return fmt;
    };

Then change renderHadoopDate as shown below. The Date getters used above return the browser's local time, so the page shows Beijing time when the browser's time zone is UTC+8.

    function renderHadoopDate(data, type, full) {
      if (type === 'display' || type === 'filter') {
        if (data === '0') { return "N/A"; }
        // changed line: format the timestamp with the Format helper above
        return new Date(parseInt(data)).Format("yyyy-MM-dd hh:mm:ss");
      }
      // ... the remainder of the original function is unchanged
    }

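16. An MR1 job that uses DistributedCache fails after being migrated to MR2. My code told the cache files apart by name (including the parent directory), but after MR2 distributes the files only the bare file name is kept, for example:

    application_xxxxxxx/container_14xxxx/part-m-00000
    application_xxxxxxx/container_14xxxx/part-m-00001
    application_xxxxxxx/container_14xxxx/00000_0

Fix: add a symlink for each cache file, named parent directory name + file name:

    // The URI fragment after "#" becomes the symlink name in the task's working directory
    DistributedCache.addCacheFile(
        new URI(path.toString() + "#" + path.getParent().getName() + "_" + path.getName()),
        configuration);

This produces cache files whose names include the parent directory name, so they can be told apart again.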

Time: 2024-11-09 06:29:14
