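Configuring CDH and Managed Services
Tuning HDFS Prior to Decommissioning DataNodes
Required Role: Configurator, Cluster Administrator, Full Administrator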
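When a DataNode is decommissioned, the NameNode ensures that every block from that DataNode remains available across the cluster, as dictated by the replication factor. This process copies blocks off the DataNode in small batches. When a DataNode holds thousands of blocks, decommissioning can take several hours. Before decommissioning hosts that run DataNodes, first tune HDFS: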
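To see why a DataNode with many blocks can take hours to drain, here is a rough back-of-the-envelope sketch. It assumes the upstream Hadoop 2 (CDH 5) behaviour in which each NameNode replication-monitor pass picks up at most (work multiplier × live DataNodes) under-replicated blocks, with one pass every 3 seconds by default (dfs.namenode.replication.interval). The function name and the cluster figures (200,000 blocks, 20 live DataNodes) are hypothetical, and the result is only a lower bound imposed by scheduling; actual transfers are further limited by replication thread limits and bandwidth.

```python
import math


def min_scheduling_seconds(blocks_to_replicate: int,
                           live_datanodes: int,
                           work_multiplier: int = 2,
                           interval_s: float = 3.0) -> float:
    """Hypothetical helper: lower bound on drain time from NameNode scheduling alone.

    Assumes each replication-monitor pass picks up at most
    work_multiplier * live_datanodes blocks and that passes run every
    interval_s seconds. Transfer time, thread limits and bandwidth are ignored.
    """
    if work_multiplier <= 0 or live_datanodes <= 0:
        raise ValueError("work_multiplier and live_datanodes must be positive")
    per_pass = work_multiplier * live_datanodes
    passes = math.ceil(blocks_to_replicate / per_pass)
    return passes * interval_s


# Hypothetical cluster: 200,000 blocks to re-replicate, 20 live DataNodes.
for multiplier in (2, 10):
    hours = min_scheduling_seconds(200_000, 20, multiplier) / 3600
    print(f"multiplier={multiplier}: at least {hours:.1f} h")
# multiplier=2: at least 4.2 h
# multiplier=10: at least 0.8 h
```

Under these assumptions, moving the multiplier from the default 2 to 10 cuts the scheduling floor by a factor of five, which is the motivation for raising it.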
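1、Raise the heap size of the DataNodes. DataNodes should be configured with at least 4 GB of heap to allow for the increase in iterations and maximum streams.
a、Go to the HDFS service page.
b、Click the Configuration tab.
c、Under each DataNode role group (DataNode Default Group and any additional DataNode role groups), go to the Resource Management category and set the Java Heap Size of DataNode in Bytes property.
d、Click Save Changes to commit the changes.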
2、Set the DataNode balancing bandwidth:
a、Expand the DataNode Default Group > Performance category.
b、Configure the DataNode Balancing Bandwidth property according to the bandwidth of your disks and network.
c、Click Save Changes to commit the changes.
3、Increase the replication work multiplier per iteration (the default is 2; 10 is recommended):
a、Expand the NameNode Default Group > Advanced category.
b、Set the Replication Work Multiplier Per Iteration property to 10.
c、Click Save Changes to commit the changes.
4、Increase the maximum number of replication threads and the hard limit on replication threads:
a、Expand the NameNode Default Group > Advanced category.
b、Set the Maximum number of replication threads on a Datanode and Hard limit on the number of replication threads on a Datanode properties to 50 and 100, respectively.
c、Click Save Changes to commit the changes.
5、Restart the HDFS service.
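For clusters whose configuration is not managed through Cloudera Manager, or to cross-check what it writes out, the settings from steps 2 to 4 can be expressed as hdfs-site.xml properties. This sketch assumes that the upstream Hadoop 2 property names dfs.datanode.balance.bandwidthPerSec (bytes per second), dfs.namenode.replication.work.multiplier.per.iteration, dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit correspond to the Cloudera Manager fields above; verify that mapping against your CDH version. The helper functions and the 100 MiB/s bandwidth are hypothetical. The heap size from step 1 is not an hdfs-site.xml property and is left out.

```python
import xml.etree.ElementTree as ET


def decommission_tuning(balance_bandwidth_bytes_per_s: int,
                        work_multiplier: int = 10,
                        max_streams: int = 50,
                        max_streams_hard_limit: int = 100) -> dict:
    """Hypothetical helper: map steps 2-4 to assumed upstream property names."""
    # The soft limit on replication streams should not exceed the hard limit.
    if max_streams > max_streams_hard_limit:
        raise ValueError("max-streams must not exceed max-streams-hard-limit")
    return {
        "dfs.datanode.balance.bandwidthPerSec": balance_bandwidth_bytes_per_s,
        "dfs.namenode.replication.work.multiplier.per.iteration": work_multiplier,
        "dfs.namenode.replication.max-streams": max_streams,
        "dfs.namenode.replication.max-streams-hard-limit": max_streams_hard_limit,
    }


def to_hdfs_site_xml(props: dict) -> str:
    """Render properties as a Hadoop <configuration> document (unindented)."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical balancing bandwidth of 100 MiB/s.
print(to_hdfs_site_xml(decommission_tuning(100 * 1024 * 1024)))
```

The validation in decommission_tuning mirrors the intent of step 4: the per-DataNode stream limit (50) stays below its hard limit (100).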
Since my translation may be imperfect, the original English text, typed out by hand, follows:
Configuring CDH and Managed Services
Tuning HDFS Prior to Decommissioning DataNodes
Required Role: Configurator, Cluster Administrator, Full Administrator
When a DataNode is decommissioned, the NameNode ensures that every block from the DataNode will still be available across the cluster as dictated by the replication factor. This procedure involves copying blocks off the DataNode in small batches.
In cases where a DataNode has thousands of blocks, decommissioning can take several hours. Before decommissioning hosts with DataNodes, you should first tune HDFS:
1、Raise the heap size of the DataNodes. DataNodes should be configured with at least 4 GB heap size to allow for the increase in iterations and max streams.
a、Go to the HDFS service page.
b、Click the Configuration tab.
c、Under each DataNode role group (DataNode Default Group and additional DataNode role groups) go to the Resource Management category, and set the Java Heap Size of DataNode in Bytes property as recommended.
d、Click Save Changes to commit the changes.
2、Set the DataNode balancing bandwidth:
a、Expand the DataNode Default Group > Performance category.
b、Configure the DataNode Balancing Bandwidth property to the bandwidth you have on your disks and network.
c、Click Save Changes to commit the changes.
3、Increase the replication work multiplier per iteration to a larger number (the default is 2, however 10 is recommended):
a、Expand the NameNode Default Group > Advanced category.
b、Configure the Replication Work Multiplier Per Iteration property to a value such as 10.
c、Click Save Changes to commit the changes.
4、Increase the replication maximum threads and maximum replication thread hard limits:
a、Expand the NameNode Default Group > Advanced category.
b、Configure the Maximum number of replication threads on a Datanode and Hard limit on the number of replication threads on a Datanode properties to 50 and 100 respectively.
c、Click Save Changes to commit the changes.
5、Restart the HDFS service.
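The Java Heap Size of DataNode in Bytes field takes a raw byte count, and the underlying Hadoop balancing-bandwidth property (dfs.datanode.balance.bandwidthPerSec) is expressed in bytes per second, so both are easy to get wrong by a factor of 1024. The conversion sketch below assumes "4 GB" means 4 GiB, and uses a hypothetical policy of allowing a chosen share of the slower of disk and network throughput; the function names and the 200 MiB/s disk and 10 Gbit/s network figures are illustrative only.

```python
GIB = 1024 ** 3   # bytes per GiB
MIB = 1024 ** 2   # bytes per MiB


def datanode_heap_bytes(heap_gib: float, minimum_gib: float = 4) -> int:
    """Hypothetical helper: convert a DataNode heap size in GiB to bytes,
    rejecting values below the 4 GB minimum (treated here as 4 GiB)."""
    if heap_gib < minimum_gib:
        raise ValueError(f"DataNode heap should be at least {minimum_gib} GiB")
    return int(heap_gib * GIB)


def balancing_bandwidth_bytes(disk_mib_per_s: float,
                              network_gbit_per_s: float,
                              share: float = 1.0) -> int:
    """Hypothetical policy: allow `share` of the slower of disk and network
    throughput, returned in bytes per second."""
    disk_bytes = disk_mib_per_s * MIB
    network_bytes = network_gbit_per_s * 1e9 / 8  # bits/s -> bytes/s
    return int(min(disk_bytes, network_bytes) * share)


print(datanode_heap_bytes(4))                     # 4294967296
print(balancing_bandwidth_bytes(200, 10))         # 209715200
print(balancing_bandwidth_bytes(200, 10, 0.5))    # 104857600
```

A share below 1.0 leaves headroom for regular client traffic while the DataNode drains; the right value depends on your workload.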