Automatic recovery after a block is damaged on an HDFS DataNode

Relevant parameters

dfs.blockreport.intervalMsec: the interval, in milliseconds, at which a DataNode sends its block report to the NameNode. The default is 6 hours.

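The DataNode log records each report as a "Successfully sent block report" entry; examples appear in the node1 log excerpt in the test section.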

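dfs.datanode.directoryscan.interval: the interval, in seconds, at which a DataNode compares its in-memory block set against the block files on disk and reconciles any differences between the two. The default is 6 hours.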

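The DataNode log records each scan as a DirectoryScanner entry; examples appear in the node1 log excerpt in the test section.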

Test machines (addresses as they appear in the logs below):

10.0.52.144  master  (namenode, datanode)

10.0.52.145  node1    (datanode)

10.0.52.146  node2    (datanode)

Parameter configuration

The two key parameters are configured in hdfs-site.xml as follows:

<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
</property>
<property>
    <name>dfs.blockreport.intervalMsec</name>
    <value>600000</value>
    <description>Determines block reporting interval in milliseconds.</description>
</property>
<property>
    <name>dfs.datanode.directoryscan.interval</name>
    <value>600</value>
</property>

Both intervals are set to 10 minutes (600000 ms and 600 s respectively).
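Because the two parameters use different units, it is easy to misread them. The following is a minimal sketch that reads the two values from a fragment like the one above and converts both to minutes. The `<configuration>` wrapper and the helper name `read_properties` are assumptions added for illustration only.

```python
import xml.etree.ElementTree as ET

# The two properties from the fragment above, wrapped in a <configuration>
# root element (an assumption) so that the text is well-formed XML.
HDFS_SITE = """
<configuration>
  <property>
    <name>dfs.blockreport.intervalMsec</name>
    <value>600000</value>
  </property>
  <property>
    <name>dfs.datanode.directoryscan.interval</name>
    <value>600</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


props = read_properties(HDFS_SITE)
# The block report interval is in milliseconds, the directory scan interval in seconds.
block_report_min = int(props["dfs.blockreport.intervalMsec"]) / 1000 / 60
directory_scan_min = int(props["dfs.datanode.directoryscan.interval"]) / 60
print(block_report_min, directory_scan_min)
```

Both values come out as 10 minutes, which matches the 10-minute spacing of the log entries later in this post.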

Test procedure

Simulate damaging a block on node1 and check whether it is recovered automatically.

[[email protected] subdir0]$ ll
total 609700
-rw-rw-r-- 1 hbase hbase        13 Sep 10 11:32 blk_1073741825
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:32 blk_1073741825_1001.meta
-rw-rw-r-- 1 hbase hbase        12 Sep 10 11:32 blk_1073741826
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:32 blk_1073741826_1002.meta
-rw-rw-r-- 1 hbase hbase         9 Sep 10 11:32 blk_1073741827
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:32 blk_1073741827_1003.meta
-rw-rw-r-- 1 hbase hbase        30 Sep 10 11:33 blk_1073741834
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:33 blk_1073741834_1010.meta
-rw-rw-r-- 1 hbase hbase       349 Sep 10 11:33 blk_1073741835
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:33 blk_1073741835_1011.meta
-rw-rw-r-- 1 hbase hbase     46744 Sep 10 11:33 blk_1073741836
-rw-rw-r-- 1 hbase hbase       375 Sep 10 11:33 blk_1073741836_1012.meta
-rw-rw-r-- 1 hbase hbase    113741 Sep 10 11:33 blk_1073741837
-rw-rw-r-- 1 hbase hbase       899 Sep 10 11:33 blk_1073741837_1013.meta
-rw-rw-r-- 1 hbase hbase 134217728 Sep 10 17:49 blk_1073741838
-rw-rw-r-- 1 hbase hbase   1048583 Sep 10 17:49 blk_1073741838_1014.meta
-rw-rw-r-- 1 hbase hbase  19295151 Sep 10 17:49 blk_1073741839
-rw-rw-r-- 1 hbase hbase    150751 Sep 10 17:49 blk_1073741839_1015.meta
-rw-rw-r-- 1 hbase hbase 153512879 Sep 11 11:12 blk_1073741846
-rw-rw-r-- 1 hbase hbase   1199327 Sep 11 11:12 blk_1073741846_1022.meta
-rw-rw-r-- 1 hbase hbase        22 Sep 11 11:12 blk_1073741848
-rw-rw-r-- 1 hbase hbase        11 Sep 11 11:12 blk_1073741848_1024.meta
-rw-rw-r-- 1 hbase hbase       155 Sep 17 22:31 blk_1073741849
-rw-rw-r-- 1 hbase hbase        11 Sep 17 22:31 blk_1073741849_1025.meta
-rw-rw-r-- 1 hbase hbase       363 Sep 11 11:12 blk_1073741850
-rw-rw-r-- 1 hbase hbase        11 Sep 11 11:12 blk_1073741850_1026.meta
-rw-rw-r-- 1 hbase hbase     33430 Sep 11 11:12 blk_1073741851
-rw-rw-r-- 1 hbase hbase       271 Sep 11 11:12 blk_1073741851_1027.meta
-rw-rw-r-- 1 hbase hbase    115097 Sep 11 11:12 blk_1073741852
-rw-rw-r-- 1 hbase hbase       907 Sep 11 11:12 blk_1073741852_1028.meta
-rw-rw-r-- 1 hbase hbase 134217728 Sep 17 14:44 blk_1073741853
-rw-rw-r-- 1 hbase hbase   1048583 Sep 17 14:44 blk_1073741853_1029.meta
-rw-rw-r-- 1 hbase hbase 134217728 Sep 18 14:55 blk_1073741854
-rw-rw-r-- 1 hbase hbase   1048583 Sep 18 14:55 blk_1073741854_1030.meta
-rw-rw-r-- 1 hbase hbase  43608288 Sep 17 14:44 blk_1073741855
-rw-rw-r-- 1 hbase hbase    340699 Sep 17 14:44 blk_1073741855_1031.meta

Run mv blk_1073741853* /tmp to move the block file and its .meta file out of the data directory:

[[email protected] subdir0]$ mv blk_1073741853* /tmp
[[email protected] subdir0]$ ll
total 477600
-rw-rw-r-- 1 hbase hbase        13 Sep 10 11:32 blk_1073741825
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:32 blk_1073741825_1001.meta
-rw-rw-r-- 1 hbase hbase        12 Sep 10 11:32 blk_1073741826
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:32 blk_1073741826_1002.meta
-rw-rw-r-- 1 hbase hbase         9 Sep 10 11:32 blk_1073741827
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:32 blk_1073741827_1003.meta
-rw-rw-r-- 1 hbase hbase        30 Sep 10 11:33 blk_1073741834
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:33 blk_1073741834_1010.meta
-rw-rw-r-- 1 hbase hbase       349 Sep 10 11:33 blk_1073741835
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:33 blk_1073741835_1011.meta
-rw-rw-r-- 1 hbase hbase     46744 Sep 10 11:33 blk_1073741836
-rw-rw-r-- 1 hbase hbase       375 Sep 10 11:33 blk_1073741836_1012.meta
-rw-rw-r-- 1 hbase hbase    113741 Sep 10 11:33 blk_1073741837
-rw-rw-r-- 1 hbase hbase       899 Sep 10 11:33 blk_1073741837_1013.meta
-rw-rw-r-- 1 hbase hbase 134217728 Sep 10 17:49 blk_1073741838
-rw-rw-r-- 1 hbase hbase   1048583 Sep 10 17:49 blk_1073741838_1014.meta
-rw-rw-r-- 1 hbase hbase  19295151 Sep 10 17:49 blk_1073741839
-rw-rw-r-- 1 hbase hbase    150751 Sep 10 17:49 blk_1073741839_1015.meta
-rw-rw-r-- 1 hbase hbase 153512879 Sep 11 11:12 blk_1073741846
-rw-rw-r-- 1 hbase hbase   1199327 Sep 11 11:12 blk_1073741846_1022.meta
-rw-rw-r-- 1 hbase hbase        22 Sep 11 11:12 blk_1073741848
-rw-rw-r-- 1 hbase hbase        11 Sep 11 11:12 blk_1073741848_1024.meta
-rw-rw-r-- 1 hbase hbase       155 Sep 17 22:31 blk_1073741849
-rw-rw-r-- 1 hbase hbase        11 Sep 17 22:31 blk_1073741849_1025.meta
-rw-rw-r-- 1 hbase hbase       363 Sep 11 11:12 blk_1073741850
-rw-rw-r-- 1 hbase hbase        11 Sep 11 11:12 blk_1073741850_1026.meta
-rw-rw-r-- 1 hbase hbase     33430 Sep 11 11:12 blk_1073741851
-rw-rw-r-- 1 hbase hbase       271 Sep 11 11:12 blk_1073741851_1027.meta
-rw-rw-r-- 1 hbase hbase    115097 Sep 11 11:12 blk_1073741852
-rw-rw-r-- 1 hbase hbase       907 Sep 11 11:12 blk_1073741852_1028.meta
-rw-rw-r-- 1 hbase hbase 134217728 Sep 18 14:55 blk_1073741854
-rw-rw-r-- 1 hbase hbase   1048583 Sep 18 14:55 blk_1073741854_1030.meta
-rw-rw-r-- 1 hbase hbase  43608288 Sep 17 14:44 blk_1073741855
-rw-rw-r-- 1 hbase hbase    340699 Sep 17 14:44 blk_1073741855_1031.meta

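Running fsck immediately afterwards still reports a healthy state with a replica count of 3, because the DataNode has not yet compared its in-memory block information against the blocks on disk.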

[[email protected] sbin]$ hadoop fsck /tmp -files -blocks -racks|grep 1073741853
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://master:50070/fsck?ugi=hbase&files=1&blocks=1&racks=1&path=%2Ftmp
0. BP-1578427263-10.0.52.144-1441855472637:blk_1073741853_1029 len=134217728 repl=3 [/default-rack/10.0.52.146:50010, /default-rack/10.0.52.145:50010, /default-rack/10.0.52.144:50010]
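To check the replica count without reading the line by eye, the fsck output can be parsed. This is a rough sketch that assumes the line format shown in the output above (it may differ in other Hadoop versions); `parse_fsck_block_line` is a hypothetical helper, not part of Hadoop.

```python
import re

# The block line copied from the fsck output above.
FSCK_LINE = (
    "0. BP-1578427263-10.0.52.144-1441855472637:blk_1073741853_1029 "
    "len=134217728 repl=3 [/default-rack/10.0.52.146:50010, "
    "/default-rack/10.0.52.145:50010, /default-rack/10.0.52.144:50010]"
)


def parse_fsck_block_line(line):
    """Extract block name, length, replica count and replica hosts from one line."""
    m = re.search(r"(blk_\d+)_\d+ len=(\d+) repl=(\d+) \[(.*)\]", line)
    if m is None:
        return None
    # Each location looks like /<rack>/<ip>:<port>; keep only the ip part.
    hosts = [
        loc.strip().rsplit("/", 1)[-1].split(":")[0]
        for loc in m.group(4).split(",")
    ]
    return {
        "block": m.group(1),
        "len": int(m.group(2)),
        "repl": int(m.group(3)),
        "hosts": hosts,
    }


info = parse_fsck_block_line(FSCK_LINE)
print(info["block"], info["repl"], info["hosts"])
```

At this point the NameNode still lists 10.0.52.145 (node1) as a replica location, even though the files are no longer in its data directory.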

Next, watch the DataNode log on node1. The mv was run at roughly 07:36, just after the DirectoryScanner run at 07:35.

2015-09-18 07:20:54,857 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block report 0x1354f1f2f5693e,  containing 1 storage report(s), of which we sent 1. The reports had 18 total blocks and used 1 RPC(s). This took 0 msec to generate and 5 msecs for RPC and NN processing. Got back one command: FinalizeCommand/5.
2015-09-18 07:20:54,857 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize command for block pool BP-1578427263-10.0.52.144-1441855472637
2015-09-18 07:25:57,936 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1578427263-10.0.52.144-1441855472637 Total blocks: 18, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
2015-09-18 07:30:54,857 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block report 0x13557da5bf9236,  containing 1 storage report(s), of which we sent 1. The reports had 18 total blocks and used 1 RPC(s). This took 0 msec to generate and 5 msecs for RPC and NN processing. Got back one command: FinalizeCommand/5.
2015-09-18 07:30:54,857 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize command for block pool BP-1578427263-10.0.52.144-1441855472637
2015-09-18 07:35:57,896 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1578427263-10.0.52.144-1441855472637 Total blocks: 18, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
2015-09-18 07:40:54,856 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block report 0x135609588a3066,  containing 1 storage report(s), of which we sent 1. The reports had 18 total blocks and used 1 RPC(s). This took 0 msec to generate and 4 msecs for RPC and NN processing. Got back one command: FinalizeCommand/5.
2015-09-18 07:40:54,857 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize command for block pool BP-1578427263-10.0.52.144-1441855472637
2015-09-18 07:45:57,895 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1578427263-10.0.52.144-1441855472637 Total blocks: 17, missing metadata files:1, missing block files:1, missing blocks in memory:0, mismatched blocks:0
2015-09-18 07:45:57,895 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removed block 1073741853 from memory with missing block file on the disk
2015-09-18 07:50:54,858 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block report 0x1356950b4e3e17,  containing 1 storage report(s), of which we sent 1. The reports had 17 total blocks and used 1 RPC(s). This took 0 msec to generate and 6 msecs for RPC and NN processing. Got back one command: FinalizeCommand/5.
2015-09-18 07:50:54,858 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize command for block pool BP-1578427263-10.0.52.144-1441855472637
2015-09-18 07:50:58,035 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1578427263-10.0.52.144-1441855472637:blk_1073741853_1029 src: /10.0.52.146:56860 dest: /10.0.52.145:50010
2015-09-18 07:50:59,143 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-1578427263-10.0.52.144-1441855472637:blk_1073741853_1029 src: /10.0.52.146:56860 dest: /10.0.52.145:50010 of size 134217728
2015-09-18 07:55:57,892 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1578427263-10.0.52.144-1441855472637 Total blocks: 18, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
2015-09-18 08:00:54,856 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block report 0x135720be18e7ba,  containing 1 storage report(s), of which we sent 1. The reports had 18 total blocks and used 1 RPC(s). This took 0 msec to generate and 4 msecs for RPC and NN processing. Got back one command: FinalizeCommand/5.
2015-09-18 08:00:54,856 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize command for block pool BP-1578427263-10.0.52.144-1441855472637

Checking the files on node1 again shows that the block has been restored:

[[email protected] subdir0]$ ll
total 609700
-rw-rw-r-- 1 hbase hbase        13 Sep 10 11:32 blk_1073741825
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:32 blk_1073741825_1001.meta
-rw-rw-r-- 1 hbase hbase        12 Sep 10 11:32 blk_1073741826
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:32 blk_1073741826_1002.meta
-rw-rw-r-- 1 hbase hbase         9 Sep 10 11:32 blk_1073741827
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:32 blk_1073741827_1003.meta
-rw-rw-r-- 1 hbase hbase        30 Sep 10 11:33 blk_1073741834
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:33 blk_1073741834_1010.meta
-rw-rw-r-- 1 hbase hbase       349 Sep 10 11:33 blk_1073741835
-rw-rw-r-- 1 hbase hbase        11 Sep 10 11:33 blk_1073741835_1011.meta
-rw-rw-r-- 1 hbase hbase     46744 Sep 10 11:33 blk_1073741836
-rw-rw-r-- 1 hbase hbase       375 Sep 10 11:33 blk_1073741836_1012.meta
-rw-rw-r-- 1 hbase hbase    113741 Sep 10 11:33 blk_1073741837
-rw-rw-r-- 1 hbase hbase       899 Sep 10 11:33 blk_1073741837_1013.meta
-rw-rw-r-- 1 hbase hbase 134217728 Sep 10 17:49 blk_1073741838
-rw-rw-r-- 1 hbase hbase   1048583 Sep 10 17:49 blk_1073741838_1014.meta
-rw-rw-r-- 1 hbase hbase  19295151 Sep 10 17:49 blk_1073741839
-rw-rw-r-- 1 hbase hbase    150751 Sep 10 17:49 blk_1073741839_1015.meta
-rw-rw-r-- 1 hbase hbase 153512879 Sep 11 11:12 blk_1073741846
-rw-rw-r-- 1 hbase hbase   1199327 Sep 11 11:12 blk_1073741846_1022.meta
-rw-rw-r-- 1 hbase hbase        22 Sep 11 11:12 blk_1073741848
-rw-rw-r-- 1 hbase hbase        11 Sep 11 11:12 blk_1073741848_1024.meta
-rw-rw-r-- 1 hbase hbase       155 Sep 17 22:31 blk_1073741849
-rw-rw-r-- 1 hbase hbase        11 Sep 17 22:31 blk_1073741849_1025.meta
-rw-rw-r-- 1 hbase hbase       363 Sep 11 11:12 blk_1073741850
-rw-rw-r-- 1 hbase hbase        11 Sep 11 11:12 blk_1073741850_1026.meta
-rw-rw-r-- 1 hbase hbase     33430 Sep 11 11:12 blk_1073741851
-rw-rw-r-- 1 hbase hbase       271 Sep 11 11:12 blk_1073741851_1027.meta
-rw-rw-r-- 1 hbase hbase    115097 Sep 11 11:12 blk_1073741852
-rw-rw-r-- 1 hbase hbase       907 Sep 11 11:12 blk_1073741852_1028.meta
-rw-rw-r-- 1 hbase hbase 134217728 Sep 18 15:50 blk_1073741853
-rw-rw-r-- 1 hbase hbase   1048583 Sep 18 15:50 blk_1073741853_1029.meta
-rw-rw-r-- 1 hbase hbase 134217728 Sep 18 14:55 blk_1073741854
-rw-rw-r-- 1 hbase hbase   1048583 Sep 18 14:55 blk_1073741854_1030.meta
-rw-rw-r-- 1 hbase hbase  43608288 Sep 17 14:44 blk_1073741855
-rw-rw-r-- 1 hbase hbase    340699 Sep 17 14:44 blk_1073741855_1031.meta

node1 (10.0.52.145) ran its directory scan at 07:45:57, found that a block was missing on disk, and removed that block from memory. At 07:50:54 it sent its block report to the NameNode.
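The scan that noticed the problem can also be picked out of the log programmatically. The sketch below assumes the DirectoryScanner line format seen in the node1 log above; the regular expression and the helper name `scans_with_missing_files` are illustrative, not an official log schema.

```python
import re

# Three DirectoryScanner lines copied from the node1 DataNode log above.
LOG_LINES = [
    "2015-09-18 07:35:57,896 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1578427263-10.0.52.144-1441855472637 Total blocks: 18, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0",
    "2015-09-18 07:45:57,895 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1578427263-10.0.52.144-1441855472637 Total blocks: 17, missing metadata files:1, missing block files:1, missing blocks in memory:0, mismatched blocks:0",
    "2015-09-18 07:55:57,892 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1578427263-10.0.52.144-1441855472637 Total blocks: 18, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0",
]

SCAN_RE = re.compile(
    r"^(\S+ \S+) INFO \S+DirectoryScanner: BlockPool \S+ "
    r"Total blocks: (\d+), missing metadata files:(\d+), missing block files:(\d+)"
)


def scans_with_missing_files(lines):
    """Return (timestamp, total_blocks) for scans that report missing files on disk."""
    found = []
    for line in lines:
        m = SCAN_RE.match(line)
        if m and (int(m.group(3)) > 0 or int(m.group(4)) > 0):
            found.append((m.group(1), int(m.group(2))))
    return found


flagged = scans_with_missing_files(LOG_LINES)
print(flagged)
```

Only the 07:45:57 scan is flagged; the scans before the mv and after the re-replication both report 18 blocks with nothing missing.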

The NameNode log on master confirms this: the NameNode asks 10.0.52.146 to replicate the missing block to 10.0.52.145.

2015-09-18 07:50:54,881 INFO BlockStateChange: BLOCK* processReport: from storage DS-47f165f8-0a5f-4d73-bb2b-3a05daa72fef node DatanodeRegistration(10.0.52.145:50010, datanodeUuid=739dc2ca-08b6-4e74-b6dc-f1ac9b0fb337, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-48284406-dfcb-472d-882c-0b7afe4bddfb;nsid=1714235198;c=0), blocks: 17, hasStaleStorage: false, processing time: 3 msecs
2015-09-18 07:50:56,185 INFO BlockStateChange: BLOCK* ask 10.0.52.146:50010 to replicate blk_1073741853_1029 to datanode(s) 10.0.52.145:50010
2015-09-18 07:50:59,202 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.0.52.145:50010 is added to blk_1073741853_1029 size 134217728

The DataNode log on node2 (10.0.52.146) shows the transfer:

2015-09-18 07:50:57,876 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.52.146:50010, datanodeUuid=d22102fa-a651-4daa-9387-e0b6264ff934, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-48284406-dfcb-472d-882c-0b7afe4bddfb;nsid=1714235198;c=0) Starting thread to transfer BP-1578427263-10.0.52.144-1441855472637:blk_1073741853_1029 to 10.0.52.145:50010
2015-09-18 07:50:59,180 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-1578427263-10.0.52.144-1441855472637:blk_1073741853_1029 (numBytes=134217728) to /10.0.52.145:50010
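Putting the timestamps from the three logs side by side gives the overall timeline. The sketch below computes the seconds elapsed since the mv; the 07:36:00 start time is only the author's approximate estimate, and the event labels are paraphrases added for illustration.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps taken from the logs above (all on 2015-09-18).
# The first entry is approximate: the mv was run at "roughly 07:36".
EVENTS = [
    ("mv on node1 (approximate)", "07:36:00"),
    ("node1 DirectoryScanner reports missing files", "07:45:57"),
    ("node1 sends block report with 17 blocks", "07:50:54"),
    ("NameNode asks node2 to replicate the block", "07:50:56"),
    ("node1 has received the block", "07:50:59"),
]


def elapsed_seconds(events):
    """Return (label, seconds since the first event) for every event."""
    start = datetime.strptime(events[0][1], FMT)
    return [
        (label, int((datetime.strptime(t, FMT) - start).total_seconds()))
        for label, t in events
    ]


timeline = elapsed_seconds(EVENTS)
for label, secs in timeline:
    print(f"{secs:4d} s  {label}")
```

Under that estimate, nearly all of the roughly 15 minutes is spent waiting for the next directory scan and the next block report; the replication itself takes only a few seconds.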

Preliminary summary:

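This test simulated only the simple case of a single damaged block. After the damage, the node does not notice anything until its next directory scan (controlled by dfs.datanode.directoryscan.interval), and the block is not restored until the node's next block report to the NameNode (controlled by dfs.blockreport.intervalMsec). Only after the NameNode receives that block report does it start the recovery.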
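Based only on the behaviour observed in this test, a rough upper bound on the delay before the NameNode learns about the missing block is one full directory scan interval plus one full block report interval. The sketch below is a simplified model under that assumption: it ignores the replication transfer time and any detection path not exercised here, and the function name is hypothetical.

```python
def worst_case_detection_delay_s(directoryscan_interval_s, blockreport_interval_ms):
    """Rough upper bound, in seconds, under the simple two-step model:
    wait up to one directory scan interval for the DataNode to notice,
    then up to one block report interval for the NameNode to be told.
    """
    return directoryscan_interval_s + blockreport_interval_ms / 1000


# Values used in this test: 600 s and 600000 ms.
test_setup = worst_case_detection_delay_s(600, 600000)
# Default values stated at the top of this post: 6 hours each.
defaults = worst_case_detection_delay_s(6 * 3600, 6 * 3600 * 1000)

print(test_setup / 60, "minutes with the test settings")
print(defaults / 3600, "hours with the default settings")
```

With the test settings the bound is 20 minutes, consistent with the roughly 15 minutes observed; with the default 6-hour intervals the same model gives up to 12 hours.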

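Real-world situations are certainly more complex, but this simple walkthrough helps in understanding the two parameters introduced at the start.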

Time: 2024-10-28 14:29:02
