HDFS directory item limit (dfs.namenode.fs-limits.max-directory-items)

18/11/05 22:21:44 WARN scheduler.TaskSetManager: Lost task 186792.0 in stage 0.0 (TID 1046119, emr-worker-343.cluster-40699, executor 324): org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): The directory item limit of /public/sample/big_scale_fm/test_data_50_billion_sorted/_temporary/0/ is exceeded: limit=1048576 items=1048576
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:932)
    at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.verifyFsLimitsForRename(FSDirRenameOp.java:117)
    at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.unprotectedRenameTo(FSDirRenameOp.java:189)
    at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:474)
    at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:73)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3650)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:867)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:575)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy15.rename(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:487)
    at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy16.rename(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1951)
    at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:626)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:531)
    at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:172)
    at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:343)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
    at org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:106)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1219)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1197)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

18/11/05 22:21:44 INFO scheduler.TaskSetManager: Starting task 186792.1 in stage 0.0 (TID 1050576, emr-worker-801.cluster-40699, executor 158, partition 186792, RACK_LOCAL, 6062 bytes)
18/11/05 22:21:44 INFO scheduler.TaskSetManager: Lost task 197623.0 in stage 0.0 (TID 1048044) on emr-worker-801.cluster-40699, executor 158: org.apache.hadoop.ipc.RemoteException (The directory item limit of /public/sample/big_scale_fm/test_data_50_billion_sorted/_temporary/0/ is exceeded: limit=1048576 items=1048576
    ... (stack trace identical to the one above) ...
) [duplicate 1]
18/11/05 22:21:44 INFO scheduler.TaskSetManager: Starting task 197623.1 in stage 0.0 (TID 1050577, emr-worker-176.cluster-40699, executor 201, partition 197623, RACK_LOCAL, 6063 bytes)
18/11/05 22:21:44 INFO scheduler.TaskSetManager: Lost task 198559.0 in stage 0.0 (TID 1048304) on emr-worker-176.cluster-40699, executor 201: org.apache.hadoop.ipc.RemoteException (The directory item limit of /public/sample/big_scale_fm/test_data_50_billion_sorted/_temporary/0/ is exceeded: limit=1048576 items=1048576
    ... (stack trace identical to the one above) ...
) [duplicate 2]
18/11/05 22:22:15 INFO scheduler.TaskSetManager: Lost task 203616.0 in stage 0.0 (TID 1049710) on emr-worker-412.cluster-40699, executor 228: org.apache.hadoop.ipc.RemoteException (The directory item limit of /public/sample/big_scale_fm/test_data_50_billion_sorted/_temporary/0/ is exceeded: limit=1048576 items=1048576
    ... (stack trace identical to the one above) ...
) [duplicate 23]

Cause: the number of items (files plus subdirectories) directly under a single directory reached the limit; the default cap is 1048576. In the log above it is Spark's FileOutputCommitter staging directory (_temporary/0) that overflowed: each committed task renames its output into that directory, so a job with more than 1048576 write tasks targeting one output directory hits the cap. Hive runs into the same limit: every Hive statement creates a temporary directory under the path configured by hive.exec.scratchdir, which is deleted automatically when the statement finishes, but an abnormal interruption can leave those directories behind until the scratch directory fills up.
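
Before rerunning such a job, it helps to check how close a directory already is to the cap. Below is a minimal sketch using the Hadoop FileSystem API, assuming the cluster configuration is on the classpath; the path from the log above is used only for illustration:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Minimal sketch: count the direct children of a directory and compare the
// result against dfs.namenode.fs-limits.max-directory-items (1048576 by default).
object DirItemCount {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val dir = new Path("/public/sample/big_scale_fm/test_data_50_billion_sorted/_temporary/0")
    // listStatus returns one FileStatus per direct child (files and subdirectories).
    val items = fs.listStatus(dir).length
    println(s"$dir holds $items direct children (default limit: 1048576)")
  }
}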

Two fixes are available:

1. Raise the per-directory cap via the dfs.namenode.fs-limits.max-directory-items property in hdfs-site.xml; it cannot be set above 6400000. For example (the raised value below is illustrative; the default is 1048576):

<property>
  <name>dfs.namenode.fs-limits.max-directory-items</name>
  <!-- 1048576 is the default; any value up to the hard cap of 6400000 is accepted -->
  <value>3200000</value>
  <description>Defines the maximum number of items that a directory may contain. Cannot set the property to a value less than 1 or more than 6400000.</description>
</property>
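
The NameNode reads this property at startup, so the change typically takes effect only after a NameNode restart.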

2. The more robust fix is to control the output layout in the job itself, so that no single directory accumulates anywhere near that many items (see the sketch below).
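
For a Spark job like the one in the log, the number of entries under <output>/_temporary/0 tracks the number of committed write tasks, so capping the partition count before the save keeps the committer below the limit. A minimal sketch, not the original job; the paths, the app name, and the partition count of 500000 are all illustrative:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: cap the number of write tasks so the FileOutputCommitter
// stays below dfs.namenode.fs-limits.max-directory-items.
// Input/output paths and the partition count are illustrative.
object CappedWrite {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("capped-write"))
    val records = sc.textFile("/public/sample/big_scale_fm/input")

    // Each committed task adds one entry under <output>/_temporary/0, so keep
    // the partition count well under the 1048576 default limit.
    records
      .coalesce(500000)
      .saveAsTextFile("/public/sample/big_scale_fm/test_data_50_billion_sorted")

    sc.stop()
  }
}

Where one directory cannot reasonably hold the whole result, splitting the output across several subdirectories (for example, one save per key range or date) keeps each individual directory under the cap.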

Original article: https://www.cnblogs.com/suanec/p/9914495.html
