Failed to start NodeManager caused by "/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/LOCK: Permission denied"



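Hadoop installation steps:
0. Pre-installation preparation (cluster nodes, environment settings, yum repository setup)
1. Configure and install Cloudera Manager
2. Start the CM service
3. Install CDH and configure the cluster
4. Start the services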

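While starting YARN, the NodeManager failed to start. To inspect the corresponding logs: on the Cloudera Manager home page, click the YARN service, select "Instances", then click the role type to open the NodeManager page. The log-file drop-down there offers stdout, stderr, and the role log file; the role log file is the one I recommend checking.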

The role log file contained the following output:

Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/LOCK: Permission denied
	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:181)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:245)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:562)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/LOCK: Permission denied
	at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
	at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
	at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:944)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:931)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	... 5 more

  

A web search shows that other people have run into the same problem.

Log in to the affected node and check the permissions and ownership of the /var/lib/hadoop-* directories. I got the following:

[[email protected] lib]# ls -l | grep -i hadoop
d---------. 2 root         root            6 Nov 25 05:27 hadoop-hdfs
d---------. 2 root         root            6 Nov 25 05:27 hadoop-httpfs
d---------. 2 root         root            6 Nov 25 05:27 hadoop-kms
d---------. 2 root         root            6 Nov 25 05:27 hadoop-mapreduce
d---------. 3 root         root           29 Nov 25 06:44 hadoop-yarn
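The same check can be scripted so it is easy to repeat on every node. This is only a minimal sketch: the function name `show_perms` is made up for illustration, and it assumes GNU coreutils `stat` (the `-c` format option), which is what a CentOS/RHEL node normally ships.

```shell
#!/usr/bin/env bash
# Print "owner:group octal-mode path" for each path given as an argument.
# show_perms is a hypothetical helper name, not part of Hadoop or CDH.
show_perms() {
    local dir
    for dir in "$@"; do
        if [ -e "$dir" ]; then
            # %U = owner name, %G = group name, %a = octal mode, %n = path
            stat -c '%U:%G %a %n' "$dir"
        else
            echo "missing $dir"
        fi
    done
}

# Inspect the directories from the listing above.
show_perms /var/lib/hadoop-*
```

On the node shown above, each directory would be reported as owned by root:root, which is what prevents the service accounts from using them.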

  

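These are wrong: every directory is owned by root:root and has no permission bits set at all, so the Hadoop service accounts cannot access them. Run the following commands as root to fix it: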

chown -R hdfs:hdfs /var/lib/hadoop-hdfs
chown -R httpfs:httpfs /var/lib/hadoop-httpfs
chown -R kms:kms /var/lib/hadoop-kms
chown -R mapred:mapred /var/lib/hadoop-mapreduce
chown -R yarn:yarn /var/lib/hadoop-yarn

chmod -R 755 /var/lib/hadoop-*

  

Then start the NodeManager again; this time it succeeds.
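To confirm the fix from the command line, one option is to check whether the account that runs the NodeManager can actually use the recovery directory named in the stack trace. The sketch below is an assumption-laden helper, not part of CDH: `can_use_state_dir` is a made-up name, and it only tests the user who runs the script, so it should be run as the service account (assumed here to be `yarn`, matching the chown above).

```shell
#!/usr/bin/env bash
# Return 0 if the current user can enter and write to the given directory
# and, when a LOCK file already exists inside it, write to that file too.
# can_use_state_dir is a hypothetical helper name.
can_use_state_dir() {
    local dir=$1
    [ -d "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ] || return 1
    if [ -e "$dir/LOCK" ]; then
        [ -w "$dir/LOCK" ] || return 1
    fi
    return 0
}

# Default to the path from the error message; override with the first argument.
state_dir=${1:-/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state}
if can_use_state_dir "$state_dir"; then
    echo "OK: $state_dir"
else
    echo "NOT usable: $state_dir"
fi
```

Saved under a name of your choice (for example `check_state_dir.sh`, a hypothetical filename), it can be run as the service account with `sudo -u yarn bash check_state_dir.sh`. Running it as root is not a meaningful test, because root bypasses the permission bits.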

Cheers!!!

Date: 2024-08-29 10:52:11
