I ran into a problem at work today; the job failed with the following error:
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP for DFSClient_attempt_201601231122_96889_m_000004_0_1149914572_1 on client 132.121.94.29, because this file is already being created by NN_Recovery on 132.121.94.29
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1826)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1649)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1595)
at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:712)
at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:691)
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1444)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1440)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1438)
at org.apache.hadoop.ipc.Client.call(Client.java:1118)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy7.create(Unknown Source)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at com.sun.proxy.$Proxy7.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3660)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:930)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:207)
Checking the NameNode logs:
2016-02-15 00:02:28,392 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: NN_Recovery, pendingcreates: 182], src=/apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP from client NN_Recovery
2016-02-15 00:02:28,392 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: NN_Recovery, pendingcreates: 182], src=/apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP
2016-02-15 00:02:28,392 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* blk_-3441330245353537374_1007295680 recovery started, primary=132.121.94.29:1010
2016-02-15 00:02:28,407 WARN org.apache.hadoop.hdfs.StateChange: DIR* startFile: failed to create file /apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP for DFSClient_attempt_201601231122_95299_m_000003_0_-559831283_1 on client 132.121.94.7, because this file is already being created by NN_Recovery on 132.121.94.29
2016-02-15 00:02:28,407 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:icps/[email protected] cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP for DFSClient_attempt_201601231122_95299_m_000003_0_-559831283_1 on client 132.121.94.7, because this file is already being created by NN_Recovery on 132.121.94.29
2016-02-15 00:02:28,407 INFO org.apache.hadoop.ipc.Server: IPC Server handler 34 on 8020, call create(/apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP, rwxr-xr-x, DFSClient_attempt_201601231122_95299_m_000003_0_-559831283_1, true, 3, 67108864) from 132.121.94.7:48867: error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP for DFSClient_attempt_201601231122_95299_m_000003_0_-559831283_1 on client 132.121.94.7, because this file is already being created by NN_Recovery on 132.121.94.29
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /apps/icps/data/collect/tmp/10100345.10011301..IVPN_F.20160214.5834.ICPS.COLLECT.201602000000.0000.NORMAL.TMP for DFSClient_attempt_201601231122_95299_m_000003_0_-559831283_1 on client 132.121.94.7, because this file is already being created by NN_Recovery on 132.121.94.29
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1826)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1649)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1595)
at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:712)
at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:691)
The NameNode just keeps repeating these log lines: the lease on the file is held by NN_Recovery, i.e. the NameNode's own lease-recovery process, and every create() attempt is rejected while that lease is held.
Yet my code does nothing more than create a file on HDFS:
FSDataOutputStream fsdout = fs.create(tmpPath);
By default, create() overwrites the file if it already exists, so at first this was puzzling; I suspected it was a bug in Hadoop itself.
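For reference, a minimal sketch spelling out that default explicitly; the single-argument fs.create(tmpPath) above is equivalent to passing overwrite=true to the two-argument overload (fs and tmpPath as in the snippet above):

```java
// Equivalent to fs.create(tmpPath): overwrite defaults to true,
// so an existing file at tmpPath should simply be replaced.
FSDataOutputStream fsdout = fs.create(tmpPath, true);
```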
So I went to the Hadoop issue tracker and searched for this exception, and found this:
Creating an already-open-for-write file with overwrite=true fails
Details
- Type: Bug
- Status: CLOSED
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.0.0-alpha
- Fix Version/s: 2.0.2-alpha
- Component/s: namenode
- Labels: None
- Hadoop Flags: Reviewed
- Release Note: This is an incompatible change: Before this change, if a file is already open for write by one client, and another client calls fs.create() with overwrite=true, an AlreadyBeingCreatedException is thrown. After this change, the file will be deleted and the new file will be created successfully.
Description
If a file is already open for write by one client, and another client calls fs.create() with overwrite=true, the file should be deleted and the new file successfully created. Instead, it is currently throwing AlreadyBeingCreatedException.
This is a regression since branch-1.
(For details, see: https://issues.apache.org/jira/browse/HDFS-3755)
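To make the scenario concrete, here is a hypothetical reproduction sketch of what the JIRA describes; the cluster URI, path, and class name are made up for illustration, and it assumes a Hadoop version that provides FileSystem.newInstance(). Client A creates the file and keeps it open, so it holds the lease; client B then calls create() with overwrite=true, which on the affected versions throws AlreadyBeingCreatedException instead of replacing the file:

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Hdfs3755Repro {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        URI uri = URI.create("hdfs://namenode:8020"); // assumed cluster address
        Path p = new Path("/tmp/hdfs3755-demo");      // made-up demo path

        // First client: creates the file and keeps it open for write,
        // so the NameNode records a lease for this client.
        FileSystem fsA = FileSystem.newInstance(uri, conf);
        FSDataOutputStream out = fsA.create(p, true);

        // Second, independent client attempting create() with overwrite=true.
        FileSystem fsB = FileSystem.newInstance(uri, conf);
        try {
            fsB.create(p, true); // fixed versions: delete + recreate; affected: throws
        } catch (IOException e) {
            System.out.println("create failed: " + e.getMessage());
        } finally {
            out.close();
            fsA.close();
            fsB.close();
        }
    }
}
```

FileSystem.newInstance() is used instead of FileSystem.get() so the two handles get separate DFSClient instances and therefore separate lease-holder names; get() would return the same cached object for both.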
Looking at the patch attached to this bug:
--- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java (revision 1370568)
+++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java (working copy)
@@ -1755,8 +1755,6 @@
     try {
       INodeFile myFile = dir.getFileINode(src);
-      recoverLeaseInternal(myFile, src, holder, clientMachine, false);
-
       try {
         blockManager.verifyReplication(src, replication, clientMachine);
       } catch(IOException e) {

The call to recoverLeaseInternal(myFile, src, holder, clientMachine, false); was removed at this point, leaving blockManager.verifyReplication(src, replication, clientMachine); to run in its place. I decompiled the dependency library we use and found that our version of the code still calls that function:
......
    try {
      INode myFile = dir.getFileINode(src);
      recoverLeaseInternal(myFile, src, holder, clientMachine, false);
      try {
        verifyReplication(src, replication, clientMachine);
......

Conclusion: attempting to create a file that already exists can throw AlreadyBeingCreatedException. Since the NameNode server keeps printing the lease-recovery log lines over and over, the likely cause is that lease recovery failed and the old lease was never released.

Fix: change the code:

FSDataOutputStream fsdout = fs.create(tmpPath);

to:

if (fs.exists(tmpPath)) {
    fs.delete(tmpPath, false);
}
FSDataOutputStream fsdout = fs.create(tmpPath);
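For completeness, a self-contained sketch of that workaround; the class and method names are made up, and fs / tmpPath are assumed to come from the surrounding job code:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper wrapping the workaround: explicitly delete any
// existing file before create(), so startFile() on the NameNode never
// finds an existing INode that is still pinned under a stale lease.
public final class SafeCreate {
    private SafeCreate() {}

    public static FSDataOutputStream createFresh(FileSystem fs, Path tmpPath)
            throws IOException {
        if (fs.exists(tmpPath)) {
            // recursive=false: tmpPath is expected to be a file, not a directory
            fs.delete(tmpPath, false);
        }
        return fs.create(tmpPath);
    }
}
```

Note that exists() + delete() + create() is not atomic: two tasks racing on the same path can still collide, so this sidesteps the stale-lease problem rather than racing writers in general.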