Our HDFS production environment runs Hadoop-0.21 on about 200 machines, with roughly 70 million blocks. After every few months of uptime the NameNode falls into frequent full GCs, and in the end we have no choice but to restart it. Suspecting a memory leak in the NameNode, we dumped object histograms of the NameNode process before and after a restart.
Before the restart on 07-10:
num #instances #bytes class name
----------------------------------------------
1: 59262275 3613989480 [Ljava.lang.Object;
...
10: 8549361 615553992 org.apache.hadoop.hdfs.server.namenode.BlockInfoUnderConstruction
11: 5941511 427788792 org.apache.hadoop.hdfs.server.namenode.INodeFileUnderConstruction
...
After the restart on 07-10:
num #instances #bytes class name
----------------------------------------------
1: 44188391 2934099616 [Ljava.lang.Object;
...
23: 721763 51966936 org.apache.hadoop.hdfs.server.namenode.BlockInfoUnderConstruction
24: 620028 44642016 org.apache.hadoop.hdfs.server.namenode.INodeFileUnderConstruction
...
The output shows that before the restart the objects occupying the most NameNode memory were [Ljava.lang.Object;, [C, org.apache.hadoop.hdfs.server.namenode.INodeFile, org.apache.hadoop.hdfs.server.namenode.BlockInfo, [B, org.apache.hadoop.hdfs.server.namenode.BlockInfoUnderConstruction$ReplicaUnderConstruction, and so on. Their reference relationships are as follows:
According to the NameNode's internal logic, INodeFileUnderConstruction and BlockInfoUnderConstruction are both intermediate states: once a file's write is closed, INodeFileUnderConstruction becomes INodeFile and BlockInfoUnderConstruction becomes BlockInfo. Since the cluster's file-write load cannot possibly be on the order of 1,000,000 files per second, the NameNode very likely has a memory leak.
When a file is closed, the client calls the NameNode's complete method, at which point the BlocksMap mapping changes from BlockInfoUnderConstruction -> BlockInfoUnderConstruction to BlockInfo -> BlockInfo (for short: oldBlock -> oldBlock is replaced by newBlock -> newBlock). BlocksMap handles this state transition as follows:
BlockInfo replaceBlock(BlockInfo newBlock) {
  BlockInfo currentBlock = map.get(newBlock);
  assert currentBlock != null : "the block if not in blocksMap";
  // replace block in data-node lists
  for (int idx = currentBlock.numNodes() - 1; idx >= 0; idx--) {
    DatanodeDescriptor dn = currentBlock.getDatanode(idx);
    Log.info("Replace Block[" + newBlock + "] to Block[" + currentBlock
        + "] in DataNode[" + dn + "]");
    dn.replaceBlock(currentBlock, newBlock);
  }
  // replace block in the map itself
  map.put(newBlock, newBlock);
  return newBlock;
}
Block overrides hashCode and equals so that newBlock and oldBlock share the same hashCode and newBlock.equals(oldBlock) is true.
The intent of the code above is to replace the map entry (oldBlock, oldBlock) with (newBlock, newBlock). However, when HashMap handles a put whose key already matches an existing key (matching meaning newKey.hashCode == oldKey.hashCode && (oldKey == newKey || oldKey.equals(newKey))), it replaces only the value. So oldBlock -> oldBlock becomes oldBlock -> newBlock: the map still holds a reference to oldBlock as the key, oldBlock is never freed, and that is the memory leak.
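The value-only replacement can be demonstrated with a minimal, self-contained sketch. The Block class below is our stand-in for HDFS's Block (identity-based equals/hashCode on a block id); the class and field names are ours, not Hadoop's:

```java
import java.util.HashMap;
import java.util.Map;

public class KeyLeakDemo {
    // Stand-in for Block: equality is based on id only, so two
    // distinct instances can be "the same key" to a HashMap.
    static final class Block {
        final long id;
        final String state; // extra payload, like the UnderConstruction data
        Block(long id, String state) { this.id = id; this.state = state; }
        @Override public int hashCode() { return Long.hashCode(id); }
        @Override public boolean equals(Object o) {
            return o instanceof Block && ((Block) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<Block, Block> map = new HashMap<>();
        Block oldBlock = new Block(1L, "UNDER_CONSTRUCTION");
        Block newBlock = new Block(1L, "COMPLETE");
        map.put(oldBlock, oldBlock);

        // put() finds the equal existing key and replaces only the value:
        map.put(newBlock, newBlock);

        Block storedValue = map.get(newBlock);
        Block storedKey = map.keySet().iterator().next();
        System.out.println("value is newBlock:     " + (storedValue == newBlock)); // true
        System.out.println("key is still oldBlock: " + (storedKey == oldBlock));   // true
    }
}
```

After the second put, the map's sole entry is oldBlock -> newBlock: the old key instance (and everything it references) stays reachable.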
See the HashMap source:
/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value; // only the value is replaced; e.key is left untouched
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
We suggest fixing BlocksMap as follows; the patch has been submitted, see: https://issues.apache.org/jira/browse/HDFS-7592
BlockInfo replaceBlock(BlockInfo newBlock) {
+   /**
+    * change to fix bug about memory leak of NameNode by huahua.xu
+    * 2013-08-17 15:20
+    */
    BlockInfo currentBlock = map.get(newBlock);
    assert currentBlock != null : "the block if not in blocksMap";
    // replace block in data-node lists
    for (int idx = currentBlock.numNodes() - 1; idx >= 0; idx--) {
      DatanodeDescriptor dn = currentBlock.getDatanode(idx);
      dn.replaceBlock(currentBlock, newBlock);
    }
    // replace block in the map itself: remove the stale entry first,
    // so oldBlock is no longer retained as the key
+   map.remove(newBlock);
    map.put(newBlock, newBlock);
    return newBlock;
}
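Why remove-then-put closes the leak can be sketched with the same stand-in Block class as before (class and method names here are ours, not Hadoop's):

```java
import java.util.HashMap;
import java.util.Map;

public class RemoveThenPutDemo {
    // Stand-in for Block: equality is based on id only.
    static final class Block {
        final long id;
        Block(long id) { this.id = id; }
        @Override public int hashCode() { return Long.hashCode(id); }
        @Override public boolean equals(Object o) {
            return o instanceof Block && ((Block) o).id == id;
        }
    }

    // Mirrors the patched replaceBlock logic: removing the entry first
    // evicts the old key object, so put() stores the new key instance.
    static Block replace(Map<Block, Block> map, Block newBlock) {
        map.remove(newBlock); // drops the entry whose key is oldBlock
        map.put(newBlock, newBlock);
        return newBlock;
    }

    public static void main(String[] args) {
        Map<Block, Block> map = new HashMap<>();
        Block oldBlock = new Block(1L);
        Block newBlock = new Block(1L);
        map.put(oldBlock, oldBlock);

        replace(map, newBlock);
        Block storedKey = map.keySet().iterator().next();
        System.out.println("key is newBlock: " + (storedKey == newBlock)); // true
    }
}
```

After remove-then-put, neither the key nor the value references oldBlock, so it becomes garbage-collectable.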
As of now, this patch has been deployed to our production cluster and has resolved the memory leak.
----------------------------------------------------------------------
This is a fairly serious bug we hit in our company's production environment; it has been submitted to the community: https://issues.apache.org/jira/browse/HDFS-7592 . Sharing it here with everyone. For questions, contact me by email via QQ 576072986.