在client向DataNode写入block之前,会与NameNode有一次通信,由NameNode来选择指定数目的DataNode来存放副本。具体的副本选择策略在BlockPlacementPolicy接口中,其子类实现是BlockPlacementPolicyDefault。该类中会有多个chooseTarget()方法重载,但最终调用了下面的方法:
1 /** 2 * This is not part of the public API but is used by the unit tests. 3 */ 4 DatanodeDescriptor[] chooseTarget(int numOfReplicas, 5 DatanodeDescriptor writer, 6 List<DatanodeDescriptor> chosenNodes, 7 HashMap<Node, Node> excludedNodes, 8 long blocksize) { 9 //numOfReplicas:要选择的副本个数 10 //clusterMap.getNumOfLeaves():整个集群的DN个数 11 if (numOfReplicas == 0 || clusterMap.getNumOfLeaves()==0) { 12 return new DatanodeDescriptor[0]; 13 } 14 15 //excludedNodes:排除的DN(因为有些DN已经被选中,所以不再选择他们) 16 if (excludedNodes == null) { 17 excludedNodes = new HashMap<Node, Node>(); 18 } 19 20 int clusterSize = clusterMap.getNumOfLeaves(); 21 //总的副本个数=已选择的个数 + 指定的副本个数 22 int totalNumOfReplicas = chosenNodes.size()+numOfReplicas; 23 if (totalNumOfReplicas > clusterSize) { //若总副本个数 > 整个集群的DN个数 24 numOfReplicas -= (totalNumOfReplicas-clusterSize); 25 totalNumOfReplicas = clusterSize; 26 } 27 28 //计算每个一个rack能有多少个DN被选中 29 int maxNodesPerRack = 30 (totalNumOfReplicas-1)/clusterMap.getNumOfRacks()+2; 31 32 List<DatanodeDescriptor> results = 33 new ArrayList<DatanodeDescriptor>(chosenNodes); 34 for (DatanodeDescriptor node:chosenNodes) { 35 // add localMachine and related nodes to excludedNodes 36 addToExcludedNodes(node, excludedNodes); 37 adjustExcludedNodes(excludedNodes, node); 38 } 39 40 //客户端不是DN 41 if (!clusterMap.contains(writer)) { 42 writer=null; 43 } 44 45 boolean avoidStaleNodes = (stats != null && stats 46 .shouldAvoidStaleDataNodesForWrite()); 47 48 //选择numOfReplicas个DN,并返回本地DN 49 DatanodeDescriptor localNode = chooseTarget(numOfReplicas, writer, 50 excludedNodes, blocksize, maxNodesPerRack, results, avoidStaleNodes); 51 52 results.removeAll(chosenNodes); 53 54 // sorting nodes to form a pipeline 55 //将选中的DN(result中的元素)组织成pipe 56 return getPipeline((writer==null)?localNode:writer, 57 results.toArray(new DatanodeDescriptor[results.size()])); 58 }
方法含义大概就如注释中写的,不过要注意其中的变量含义。在第48行,又调用chooseTarget()方法来选择指定数目的DN(选中的DN存放在result中),并返回一个DN作为本地DN。下面分析这个方法。
时间: 2024-10-06 07:28:52