Experience on Namenode backup and restore --- checkpoint

Hadoop version: Hadoop 2.2.0.2.0.6.0-0009

You can do this by setting up a Secondary NameNode, a Checkpoint node, or a Backup node.

Example (assuming you already have a Secondary NameNode):

1. Check secondary namenode checkpoint status:

dfs.namenode.secondary.http-address in %HADOOP_CONF_DIR%/hdfs-site.xml

dfs.namenode.checkpoint.dir in %HADOOP_CONF_DIR%/hdfs-site.xml

dfs.namenode.checkpoint.edits.dir in %HADOOP_CONF_DIR%/hdfs-site.xml

dfs.namenode.checkpoint.period in %HADOOP_CONF_DIR%/hdfs-site.xml
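To look these four values up without opening the file by hand, something like the following Python sketch could be used. It is only an illustration: the function name `read_checkpoint_settings` is my own, and it assumes the usual `<configuration><property><name>/<value>` layout of hdfs-site.xml.

```python
import xml.etree.ElementTree as ET

# The four properties listed in step 1.
CHECKPOINT_KEYS = [
    "dfs.namenode.secondary.http-address",
    "dfs.namenode.checkpoint.dir",
    "dfs.namenode.checkpoint.edits.dir",
    "dfs.namenode.checkpoint.period",
]


def read_checkpoint_settings(xml_text):
    """Return {name: value} for the checkpoint properties present in xml_text.

    xml_text is the content of hdfs-site.xml (str or bytes). Properties that
    are not set in the file are simply absent from the result, which means
    Hadoop falls back to its built-in defaults for them.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in CHECKPOINT_KEYS:
            found[name] = (prop.findtext("value") or "").strip()
    return found
```

Read %HADOOP_CONF_DIR%/hdfs-site.xml in binary mode and pass its contents to the function; it returns a dictionary you can print or compare between the NameNode and the Secondary NameNode.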

2. Back up the current checkpoint by hand:

On the Secondary NameNode, stop the Hadoop secondary namenode service.

Run cmd.exe as user hadoop (or as another user with full permissions):

Runas /user:hadoop cmd.exe

You will need the hadoop user's password.

Force an up-to-date checkpoint:

cmd>%hadoop_home%/bin/hadoop secondarynamenode -checkpoint force

Start the Hadoop secondary namenode service again, then check the Secondary NameNode checkpoint status (see step 1).
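One way to confirm that the forced checkpoint really produced a fresh image is to look at the newest fsimage file in the checkpoint directory. The Python sketch below is only a rough check under my own assumptions: that the images live in a `current` subfolder of dfs.namenode.checkpoint.dir and are named `fsimage_<txid>` with `.md5` companions, as I have seen on Hadoop 2.x. The function name `latest_fsimage` is made up for this example.

```python
import time
from pathlib import Path


def latest_fsimage(checkpoint_dir):
    """Return (path, age_in_seconds) of the newest fsimage_* file, or None.

    Assumes the layout <checkpoint_dir>/current/fsimage_<txid>; the .md5
    checksum files next to the images are ignored.
    """
    current = Path(checkpoint_dir) / "current"
    images = [p for p in current.glob("fsimage_*") if p.suffix != ".md5"]
    if not images:
        return None
    newest = max(images, key=lambda p: p.stat().st_mtime)
    return newest, time.time() - newest.stat().st_mtime
```

If the reported age is much larger than the time since you ran the forced checkpoint, the checkpoint probably did not complete and the secondary namenode log is worth a look.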

3. Stop the NameNode services or reboot the NameNode (if the Hadoop services are set to start manually, they will all stay stopped after the reboot).

For this test, I first backed up my dfs.namenode.name.dir (i.e. C:\hdpdata\hdfs\nn) so that I could use it in my next test (restoring from a backup of the NameNode directory).

Delete all files in C:\hdpdata\hdfs\nn.

Open dfs.namenode.checkpoint.dir (see %HADOOP_CONF_DIR%/hdfs-site.xml) on the Secondary NameNode (i.e. c:\hdpdata\hdfs\snn).

Copy all of the secondary checkpoint files (except the lock file) from this folder to the NameNode's checkpoint dir (dfs.namenode.checkpoint.dir, the same path as on the Secondary NameNode).

Make sure the NameNode's checkpoint dir is empty before you copy!
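The copy in step 3 can be sketched in Python as follows. This is a minimal illustration, not something I ran as part of the original test: the function name `copy_checkpoint` is hypothetical, and I am assuming the lock file is called `in_use.lock`, which is the name Hadoop normally uses in its storage directories.

```python
import shutil
from pathlib import Path

LOCK_FILE = "in_use.lock"  # assumed name of the storage-directory lock file


def copy_checkpoint(src_dir, dst_dir):
    """Copy every file under src_dir into dst_dir, skipping the lock file.

    Raises RuntimeError if dst_dir already contains anything, so an old
    checkpoint is never mixed with the new one.
    Returns the relative paths (POSIX style) of the files that were copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    if dst.exists() and any(dst.iterdir()):
        raise RuntimeError(f"{dst} is not empty")
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file() or path.name == LOCK_FILE:
            continue
        rel = path.relative_to(src)
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves timestamps
        copied.append(rel.as_posix())
    return copied
```

Call it with the Secondary NameNode's checkpoint folder as the source and the NameNode's dfs.namenode.checkpoint.dir as the destination; if the two machines are different, the source would have to be reachable as a network share.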

4. Restore from checkpoint dir

Run cmd.exe as user hadoop (or as another user with full permissions):

Runas /user:hadoop cmd.exe

You will need the hadoop user's password.

Use this command to start the NameNode and import the checkpoint from the checkpoint dir:

cmd>%hadoop_home%/bin/hdfs namenode -importcheckpoint

Once the import has completed, press Ctrl+C to stop the service, then delete the NameNode's checkpoint dir (dfs.namenode.checkpoint.dir, the same path as on the Secondary NameNode).

Start the services with this command:

cmd>start_local_hdp_services.cmd

Leave safe mode:

cmd>%hadoop_home%/bin/hdfs dfsadmin -safemode leave

Balance your HDFS:

cmd>%hadoop_home%/bin/hdfs balancer -threshold 5

5. Confirm that your Hadoop service was restored successfully.

Open http://namenode:50070/ and check whether any blocks are missing. If there are, find out which files they belong to and where they are.

Restoring from the Secondary NameNode is not a real-time restore, so you may lose whatever your most recent jobs in the JobTracker did after the last checkpoint. That doesn't matter; just delete the affected files.
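The missing-block check in step 5 can probably also be scripted instead of reading the web page. The Python sketch below rests on two assumptions that I have not verified on this exact build: that the NameNode web port also serves a `/jmx` endpoint returning JSON, and that the `FSNamesystem` bean in it carries a `MissingBlocks` counter. The function names are my own.

```python
import json
from urllib.request import urlopen

# Assumed JMX query for the FSNamesystem bean on the NameNode web port.
JMX_QUERY = "/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem"


def missing_blocks(jmx_json_text):
    """Extract the MissingBlocks counter from a NameNode JMX JSON reply."""
    beans = json.loads(jmx_json_text).get("beans", [])
    for bean in beans:
        if "MissingBlocks" in bean:
            return int(bean["MissingBlocks"])
    raise KeyError("MissingBlocks not found in JMX reply")


def fetch_missing_blocks(base_url="http://namenode:50070"):
    """Query the NameNode (same host and port as the web UI) over HTTP."""
    with urlopen(base_url + JMX_QUERY, timeout=10) as resp:
        return missing_blocks(resp.read().decode("utf-8"))
```

A result of 0 from `fetch_missing_blocks()` would mean no missing blocks; anything else means going back to the web UI to see which files are affected.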

Tip: If you want to restore from a real-time backup, configure multiple NameNode directories; see the next post.

Time: 2024-10-06 06:06:50
