HDFS Replica Selection Policy

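Before a client writes a block to a DataNode, it first contacts the NameNode, and the NameNode chooses the requested number of DataNodes to hold the replicas. The replica selection logic is defined by the abstract class BlockPlacementPolicy, and its default implementation is the subclass BlockPlacementPolicyDefault. That class has several chooseTarget() overloads, but they all end up calling the method below: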

 1 /**
 2    * This is not part of the public API but is used by the unit tests.
 3    */
 4   DatanodeDescriptor[] chooseTarget(int numOfReplicas,
 5                                     DatanodeDescriptor writer,
 6                                     List<DatanodeDescriptor> chosenNodes,
 7                                     HashMap<Node, Node> excludedNodes,
 8                                     long blocksize) {
 9       // numOfReplicas: number of additional replicas to choose
10       // clusterMap.getNumOfLeaves(): number of DataNodes in the whole cluster
11     if (numOfReplicas == 0 || clusterMap.getNumOfLeaves()==0) {
12       return new DatanodeDescriptor[0];
13     }
14
15     // excludedNodes: DataNodes that must not be chosen (for example, ones that were already selected)
16     if (excludedNodes == null) {
17       excludedNodes = new HashMap<Node, Node>();
18     }
19
20     int clusterSize = clusterMap.getNumOfLeaves();
21     // total replica count = already chosen + newly requested
22     int totalNumOfReplicas = chosenNodes.size()+numOfReplicas;
23     if (totalNumOfReplicas > clusterSize) {    // total replica count exceeds the number of DataNodes in the cluster
24       numOfReplicas -= (totalNumOfReplicas-clusterSize);
25       totalNumOfReplicas = clusterSize;
26     }
27
28     // upper bound on how many DataNodes may be chosen on a single rack
29     int maxNodesPerRack =
30       (totalNumOfReplicas-1)/clusterMap.getNumOfRacks()+2;
31
32     List<DatanodeDescriptor> results =
33       new ArrayList<DatanodeDescriptor>(chosenNodes);
34     for (DatanodeDescriptor node:chosenNodes) {
35       // add localMachine and related nodes to excludedNodes
36       addToExcludedNodes(node, excludedNodes);
37       adjustExcludedNodes(excludedNodes, node);
38     }
39
40     // the writer (client) is not a DataNode in the cluster
41     if (!clusterMap.contains(writer)) {
42       writer=null;
43     }
44
45     boolean avoidStaleNodes = (stats != null && stats
46         .shouldAvoidStaleDataNodesForWrite());
47
48     // choose numOfReplicas DataNodes and return the local DataNode
49     DatanodeDescriptor localNode = chooseTarget(numOfReplicas, writer,
50         excludedNodes, blocksize, maxNodesPerRack, results, avoidStaleNodes);
51
52     results.removeAll(chosenNodes);
53
54     // sorting nodes to form a pipeline
55     // arrange the chosen DataNodes (the elements of results) into a pipeline
56     return getPipeline((writer==null)?localNode:writer,
57                        results.toArray(new DatanodeDescriptor[results.size()]));
58   }
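
The arithmetic in lines 20 to 30 of the listing can be checked in isolation. The following is a minimal sketch, not code from Hadoop: the class name ReplicaMath and the helper names are hypothetical, and plain ints stand in for chosenNodes.size(), clusterMap.getNumOfLeaves() and clusterMap.getNumOfRacks().

```java
public class ReplicaMath {
    // Mirrors lines 22 to 26 of the listing: if already-chosen plus requested
    // replicas exceed the cluster size, shrink the request by the overshoot.
    static int clampNumOfReplicas(int numOfReplicas, int alreadyChosen, int clusterSize) {
        int totalNumOfReplicas = alreadyChosen + numOfReplicas;
        if (totalNumOfReplicas > clusterSize) {
            numOfReplicas -= (totalNumOfReplicas - clusterSize);
        }
        return numOfReplicas;
    }

    // Mirrors lines 29 to 30 of the listing (integer division).
    // numOfRacks is assumed to be positive; the listing returns early
    // when the cluster has no DataNodes at all.
    static int maxNodesPerRack(int totalNumOfReplicas, int numOfRacks) {
        return (totalNumOfReplicas - 1) / numOfRacks + 2;
    }

    public static void main(String[] args) {
        // 1 replica already placed, 3 more requested, only 3 DataNodes exist.
        System.out.println(clampNumOfReplicas(3, 1, 3)); // prints 2
        // 3 replicas in total, spread over 1, 2 or 3 racks.
        System.out.println(maxNodesPerRack(3, 1));       // prints 4
        System.out.println(maxNodesPerRack(3, 2));       // prints 3
        System.out.println(maxNodesPerRack(3, 3));       // prints 2
    }
}
```

Because of the integer division and the constant 2, the per-rack limit is a loose upper bound: with 3 replicas on 3 racks it still allows 2 DataNodes on the same rack.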

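The method does roughly what its comments say, but pay attention to what each variable means. Starting at line 48 of the listing, it calls another chooseTarget() overload to select the requested number of DataNodes (the chosen DataNodes are stored in results) and to return one DataNode as the local node. That method is analyzed next.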
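
Before that, the bookkeeping around results in lines 32 to 56 of the listing can be illustrated with a small sketch. It is only a model under stated assumptions: node names are plain strings, the class ResultsBookkeeping is hypothetical, and chooseTargetStub is a stand-in that simply appends a given list of targets, so it does not reproduce the real selection logic of the inner chooseTarget().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResultsBookkeeping {
    // Hypothetical stand-in for the inner chooseTarget(): appends the new
    // targets to results and returns the node treated as the local node.
    static String chooseTargetStub(List<String> results, List<String> newTargets) {
        results.addAll(newTargets);
        return newTargets.isEmpty() ? null : newTargets.get(0);
    }

    // Mirrors the listing: results starts as a copy of chosenNodes, the inner
    // call appends to it, and removeAll() leaves only the newly chosen nodes.
    static List<String> newlyChosen(List<String> chosenNodes, List<String> newTargets) {
        List<String> results = new ArrayList<>(chosenNodes);
        chooseTargetStub(results, newTargets);
        results.removeAll(chosenNodes);
        return results;
    }

    // Mirrors (writer==null)?localNode:writer, the first argument to getPipeline().
    static String pipelineStart(String writer, String localNode) {
        return (writer == null) ? localNode : writer;
    }

    public static void main(String[] args) {
        List<String> chosen = Arrays.asList("dn1");
        List<String> added = Arrays.asList("dn2", "dn3");
        System.out.println(newlyChosen(chosen, added)); // prints [dn2, dn3]
        System.out.println(pipelineStart(null, "dn2"));  // prints dn2
        System.out.println(pipelineStart("dn1", "dn2")); // prints dn1
    }
}
```

The point of the sketch is that the returned array contains only the DataNodes added by this call, and that the node returned by the inner call is passed to getPipeline() only when the writer is not itself a DataNode in the cluster.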

Time: 2024-10-06 07:28:52
