https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfsadmin
-metasave filename Save Namenode's primary data structures to filename in the directory specified by hadoop.log.dir property. filename is overwritten if it exists. filename will contain one line for each of the following:
1. Datanodes heart beating with Namenode
2. Blocks waiting to be replicated
3. Blocks currently being replicated
4. Blocks waiting to be deleted
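As a sketch, -metasave can be invoked like this (the output file name meta.out and the log path are assumptions; the guard just keeps the snippet harmless on a machine without the hdfs client):

```shell
# Guarded sketch: dump the NameNode's in-memory structures to a file under
# hadoop.log.dir. "meta.out" is an arbitrary example name.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfsadmin -metasave meta.out
  # The dump is written on the NameNode host, e.g. /var/log/hadoop/meta.out
else
  echo "hdfs client not found; run this on a cluster node"
fi
```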
hdfs dfsadmin -fetchImage .
15/12/14 13:56:10 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded
15/12/14 13:56:10 INFO namenode.TransferFsImage: Opening connection to http://nn1:50070/imagetransfer?getimage=1&txid=latest
15/12/14 13:56:10 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
15/12/14 13:56:10 INFO namenode.TransferFsImage: Transfer took 0.16s at 57.32 KB/s
This command works quite well: it should automatically use the NameNode in the active state as the source from which to fetch the image.
Recovery steps:
hadoop-daemon.sh stop namenode
rm -fr /hdp/name/dfs/current
Then copy the fsimage file directly into the current directory and start the namenode. It shuts itself down, reporting that the namenode has not been formatted; this is because we deleted its VERSION file along with the rest of current, so copy the VERSION file over from nn2.
Start the namenode again. The process still exits, and the log shows the error that the fsimage file has no md5 file, so generate the md5 checksum file for the fsimage:
md5sum fsimage_&lt;txid&gt; > fsimage_&lt;txid&gt;.md5
(the .md5 file name must match the image file name exactly)
Start the namenode once more; this time it comes up normally.
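Since the NameNode verifies the image against its .md5 sibling at startup, the checksum file must be named after the image exactly. A minimal, runnable sketch of the convention (the txid and file contents here are dummies, not a real image):

```shell
# For an image file fsimage_<txid>, the NameNode expects a sibling file
# fsimage_<txid>.md5 containing md5sum-style output.
workdir=$(mktemp -d)
cd "$workdir"
printf 'dummy image bytes' > fsimage_0000000000000000001   # stand-in, not a real image
md5sum fsimage_0000000000000000001 > fsimage_0000000000000000001.md5
md5sum -c fsimage_0000000000000000001.md5                  # prints "... OK" on success
```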
Because the namenode periodically checkpoints and writes new fsimage files, and there is also a standby node, this kind of manual metadata backup is rarely needed.
It only becomes meaningful in the extreme case where the disks holding metadata on both machines fail and the data cannot otherwise be recovered.
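For that last-resort scenario, a periodic -fetchImage backup to a third machine is one option. A hypothetical cron script (the script path, backup directory, and schedule are all assumptions); the snippet only writes the script out, since running it needs a live cluster:

```shell
# Write a hypothetical backup script: fetch the latest fsimage from the
# active NameNode into a dated directory. All paths are assumptions.
cat > /tmp/fsimage-backup.sh <<'EOF'
#!/bin/sh
set -e
backup_dir=/backup/hdfs/$(date +%Y%m%d)
mkdir -p "$backup_dir"
hdfs dfsadmin -fetchImage "$backup_dir"
EOF
chmod +x /tmp/fsimage-backup.sh
```

Hooked into cron (e.g. daily), this keeps an off-cluster copy that would survive the both-disks-fail case above.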