HDFS: restarting DNs after adding an NS fails with a clusterID mismatch

While setting up a test of FastCopy in our test environment, I added a second NS (the cluster previously had only one) to make testing easier. With everything in place I restarted the DNs, but they simply refused to connect to the new NN, failing with the following error:

java.io.IOException: Incompatible clusterIDs in /data0/hadoop/dfs/data: namenode clusterID = CID-79c6e55b-5897-4a30-b278-149827ac200f; datanode clusterID = CID-1561e550-a7b9-4886-8a9a-cc2328b82912
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:472)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:944)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:915)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815)
        at java.lang.Thread.run(Thread.java:745)

The message says the DN's clusterID does not match the NN's. A colleague pointed out that the fix is to format the new NN with the clusterID the DNs already carry (CID-1561e550-a7b9-4886-8a9a-cc2328b82912), by running the following on one of the new NN nodes:

hdfs namenode -format -clusterid CID-1561e550-a7b9-4886-8a9a-cc2328b82912

After formatting the NN (and the JNs) this way, start that NN. The other new NN does not need to be formatted; running the following command is enough to sync everything from the already-started NN into its own directories:

hdfs namenode -bootstrapStandby

Once the sync finished, I started this NN and then restarted all the DNs. Every DN now showed up on the NNs of both NS1 and NS2.
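To double-check from the command line, you can query each nameservice for its registered DNs. The logical names ns1 and ns2 below are assumptions; substitute the entries from your dfs.nameservices:

hdfs dfsadmin -fs hdfs://ns1 -report
hdfs dfsadmin -fs hdfs://ns2 -report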

Now, what exactly is the clusterID, and what is it for?

The clusterID is the cluster's unique ID. Its purpose is to ensure that only trusted DNs connect to the cluster. A DN obtains its clusterID from the NN the first time it starts:
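  // From BPServiceActor: first phase of the handshake with the NN
  // (per the stack trace above).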

  private void connectToNNAndHandshake() throws IOException {
    // get NN proxy
    bpNamenode = dn.connectToNN(nnAddr);

    // First phase of the handshake with NN - get the namespace
    // info.
    NamespaceInfo nsInfo = retrieveNamespaceInfo();

    // Verify that this matches the other NN in this HA pair.
    // This also initializes our block pool in the DN if we are
    // the first NN connection for this BP.
    bpos.verifyAndSetNamespaceInfo(nsInfo);

    // Second phase of the handshake with the NN.
    register();
  }
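
  // Also from BPServiceActor: fetches the NamespaceInfo (which carries the
  // clusterID) from the NN, retrying until the NN answers.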
NamespaceInfo retrieveNamespaceInfo() throws IOException {
    NamespaceInfo nsInfo = null;
    while (shouldRun()) {
      try {
        nsInfo = bpNamenode.versionRequest();
        LOG.debug(this + " received versionRequest response: " + nsInfo);
        break;
      } catch(SocketTimeoutException e) {  // namenode is busy
        LOG.warn("Problem connecting to server: " + nnAddr);
      } catch(IOException e ) {  // namenode is not available
        LOG.warn("Problem connecting to server: " + nnAddr);
      }

      // try again in a second
      sleepAndLogInterrupts(5000, "requesting version info from NN");
    }

    if (nsInfo != null) {
      checkNNVersion(nsInfo);
    } else {
      throw new IOException("DN shut down before block pool connected");
    }
    return nsInfo;
  }
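
  // From DataNode: one-time initialization of a block pool; this is where the
  // clusterID reported by the NN is recorded and storage is initialized.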
void initBlockPool(BPOfferService bpos) throws IOException {
    NamespaceInfo nsInfo = bpos.getNamespaceInfo();
    if (nsInfo == null) {
      throw new IOException("NamespaceInfo not found: Block pool " + bpos
          + " should have retrieved namespace info before initBlockPool.");
    }

    // Register the new block pool with the BP manager.
    blockPoolManager.addBlockPool(bpos);

    setClusterId(nsInfo.clusterID, nsInfo.getBlockPoolID());

    // In the case that this is the first block pool to connect, initialize
    // the dataset, block scanners, etc.
    initStorage(nsInfo);
    initPeriodicScanners(conf);

    data.addBlockPool(nsInfo.getBlockPoolID(), conf);
  }

and persists it in the VERSION file under each of its local storage directories:

cat /data0/hadoop/dfs/data/current/VERSION

#Thu Oct 23 14:06:21 CST 2014
storageID=DS-35e3967e-51e4-4a6c-a3da-d2be044c8522
clusterID=CID-1561e550-a7b9-4886-8a9a-cc2328b82912
cTime=0
datanodeUuid=1327c11f-984c-4c07-a44a-70ba5e84621c
storageType=DATA_NODE
layoutVersion=-55
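The error at the top of this post is thrown when this persisted clusterID disagrees with the one the NN reports during the handshake. The check lives in DataStorage.doTransition (see the stack trace); what follows is a condensed sketch of the Hadoop 2.x logic, not the verbatim source:

  // Inside DataStorage#doTransition: compare the clusterID stored in the
  // storage directory's VERSION file with the one sent by the NN.
  if (!getClusterID().equals(nsInfo.getClusterID())) {
    throw new IOException("Incompatible clusterIDs in "
        + sd.getRoot().getCanonicalPath()
        + ": namenode clusterID = " + nsInfo.getClusterID()
        + "; datanode clusterID = " + getClusterID());
  }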

So when adding another NS to an HDFS cluster that already has one, the new NN must be formatted with the existing clusterID; otherwise the DNs will refuse to connect to the new NN.
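As a practical note, the existing clusterID can be read straight from any VERSION file before formatting, for example:

grep clusterID /data0/hadoop/dfs/data/current/VERSION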

Glossary:

DN:DataNode

NN:NameNode

JN:JournalNode

NS:NameService
