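Yesterday I went to a customer site to set up RMAN backups. When I ran the shell script, it reported that some archived logs could not be found. Until then the customer had only taken logical backups of the database with EXPDP, and a job automatically deleted the previous day's archived logs every day. Each archived log is about 200 MB and disk space was tight, so they had little choice.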
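My script backs up with backup database format 'xxx' plus archivelog format 'xxx'. With this form RMAN backs up the archived logs first and the database second. The script's output log said that archived log 35xxx could not be found, while the oldest archived log still physically present was already 36xxx. In other words, more than 1,000 logs had been physically removed by the OS script, and a later check confirmed this. About 40 archived logs are generated per day, which at roughly 200 MB each comes to about 8 GB of archived logs per day.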
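So I went into RMAN and ran a crosscheck on the archived logs. More than 7,000 of them came back as "validation failed" (the files had been physically deleted), which marks them as expired. The command: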
RMAN> crosscheck archivelog all;
Then remove the records of these expired archived logs from the control file:
RMAN> delete noprompt expired archivelog all;
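The two commands above can be kept in a small command file so the cleanup runs before every backup. This is only a minimal sketch: it assumes rman is on the PATH and that OS authentication works for "target /". The function name and the file path are placeholders, not taken from the original script.

```shell
# Print the RMAN commands that bring the repository in line with the disk.
# print_sync_cmds is a hypothetical helper name.
print_sync_cmds() {
    cat <<'EOF'
crosscheck archivelog all;
delete noprompt expired archivelog all;
EOF
}

# Write the commands to a command file (the path is a placeholder).
print_sync_cmds > /tmp/sync_archivelog.rman

# Run the file only on a host where RMAN is actually installed.
if command -v rman >/dev/null 2>&1; then
    rman target / cmdfile=/tmp/sync_archivelog.rman
fi
```

Keeping the commands in a function makes it easy to review exactly what will be sent to RMAN before anything is deleted.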
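After that I reran the script and the backup went smoothly: the archived logs were backed up first, then the database, and finally the control file and spfile.
Back at the office I ran a test of my own and found that even with archived logs left in the failed (expired) state, the backup still went through. Here is the test:
1. Move the archived logs to another directory (to simulate physical deletion)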
[[email protected] ~]$ cd $ORACLE_BASE/flash_recovery_area/ORA10G/archivelog
[[email protected] archivelog]$ ll
total 8
drwxr-x--- 2 oracle oinstall 4096 Sep 24 11:46 2014_09_24
[[email protected] archivelog]$ mv 2014_09_24/* .
[[email protected] archivelog]$ ll
total 6636
drwxr-x--- 2 oracle oinstall 4096 Sep 24 11:59 2014_09_24
-rw-r----- 1 oracle oinstall 597504 Sep 24 11:46 o1_mf_1_10_b24ho39f_.arc
-rw-r----- 1 oracle oinstall 5473792 Sep 24 11:35 o1_mf_1_3_b24gz52r_.arc
-rw-r----- 1 oracle oinstall 373248 Sep 24 11:35 o1_mf_1_4_b24gz81d_.arc
-rw-r----- 1 oracle oinstall 180224 Sep 24 11:35 o1_mf_1_5_b24gzbb6_.arc
-rw-r----- 1 oracle oinstall 33792 Sep 24 11:35 o1_mf_1_6_b24gzd6y_.arc
-rw-r----- 1 oracle oinstall 26624 Sep 24 11:35 o1_mf_1_7_b24gzky8_.arc
-rw-r----- 1 oracle oinstall 1536 Sep 24 11:35 o1_mf_1_8_b24gzqnt_.arc
-rw-r----- 1 oracle oinstall 57344 Sep 24 11:44 o1_mf_1_9_b24hjflc_.arc
Eight archived log files have now been "deleted".
2. Go back into RMAN and crosscheck the archived logs
[[email protected] archivelog]$ exit
exit
host command complete
RMAN> crosscheck archivelog all;
released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=145 devtype=DISK
validation failed for archived log
archive log filename=/u01/app/oracle/flash_recovery_area/ORA10G/archivelog/2014_09_24/o1_mf_1_3_b24gz52r_.arc recid=208 stamp=859116904
validation failed for archived log
archive log filename=/u01/app/oracle/flash_recovery_area/ORA10G/archivelog/2014_09_24/o1_mf_1_4_b24gz81d_.arc recid=209 stamp=859116904
validation failed for archived log
archive log filename=/u01/app/oracle/flash_recovery_area/ORA10G/archivelog/2014_09_24/o1_mf_1_5_b24gzbb6_.arc recid=210 stamp=859116906
validation failed for archived log
archive log filename=/u01/app/oracle/flash_recovery_area/ORA10G/archivelog/2014_09_24/o1_mf_1_6_b24gzd6y_.arc recid=211 stamp=859116908
validation failed for archived log
archive log filename=/u01/app/oracle/flash_recovery_area/ORA10G/archivelog/2014_09_24/o1_mf_1_7_b24gzky8_.arc recid=212 stamp=859116914
validation failed for archived log
archive log filename=/u01/app/oracle/flash_recovery_area/ORA10G/archivelog/2014_09_24/o1_mf_1_8_b24gzqnt_.arc recid=213 stamp=859116919
validation failed for archived log
archive log filename=/u01/app/oracle/flash_recovery_area/ORA10G/archivelog/2014_09_24/o1_mf_1_9_b24hjflc_.arc recid=214 stamp=859117453
validation failed for archived log
archive log filename=/u01/app/oracle/flash_recovery_area/ORA10G/archivelog/2014_09_24/o1_mf_1_10_b24ho39f_.arc recid=215 stamp=859117603
Crosschecked 8 objects
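With thousands of records, as at the customer site, counting the failures by eye is impractical. Below is a minimal sketch that counts them in saved crosscheck output. The helper name and the sample paths are made up for illustration. Inside RMAN, "list expired archivelog all;" gives a comparable view.

```shell
# Count archived logs that failed validation in saved crosscheck output.
# count_failed is a hypothetical helper; it reads the log text on stdin.
count_failed() {
    grep -c '^validation failed for'
}

# Sample input with two failures (file paths are placeholders).
count_failed <<'EOF'
validation failed for archived log
archive log filename=/path/to/o1_mf_1_3_.arc recid=208 stamp=859116904
validation failed for archived log
archive log filename=/path/to/o1_mf_1_4_.arc recid=209 stamp=859116904
Crosschecked 2 objects
EOF
```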
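3. Run the RMAN backup script without deleting the expired archived logs
This time there was no error about archived log xxx not being found. The backup simply completed, starting with backup set 297, which differs from what I saw at the customer site.
--After the backup completes, look at the backup sets that were generated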
[[email protected] ~]$ cd /u01/orabackup/backupsets/
[[email protected] backupsets]$ ll -lrth
total 1.6G
......other backup sets (omitted)
-rw-r----- 1 oracle oinstall 606K Sep 24 12:00 ora10g-4175411955_20140924_859118422_297.arc
-rw-r----- 1 oracle oinstall 166M Sep 24 12:02 ora10g-4175411955_20140924_859118425_298.db
-rw-r----- 1 oracle oinstall 610K Sep 24 12:02 ora10g-4175411955_20140924_859118562_299.arc
-rw-r----- 1 oracle oinstall 7.3M Sep 24 12:02 ora10g-c-4175411955-20140924-01.ctl
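As you can see, the backup completed in full: two archived-log backup sets (arc), one database backup set (db), and one control file backup set (ctl). One detail is worth noting. Because my script uses the %s substitution variable (the backup set number) in the format string, the timestamps and numbering of the pieces above reveal the order in which RMAN works:
1. Back up the archived log files that are currently available
2. Back up the database
3. Switch the log and back up the archived logs produced since the full database backup finished (even if you did not switch manually with RMAN> sql "alter system archive log current";)
4. Back up the control file (the spfile goes into the same backup set)
The detailed log output makes this order clearer: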
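The steps above correspond to a single "backup ... plus archivelog" command. The sketch below prints a run block of that shape. The tags DB_BAK and ARC_BAK and the backup directory appear in the output shown in this post, but the function name and the exact format strings are assumptions, and the sketch assumes controlfile autobackup has been configured separately.

```shell
# Print an RMAN run block of the kind described above.
# Assumptions (not taken from the original script): the helper name,
# the format strings, and that controlfile autobackup is already on.
print_backup_cmds() {
    backup_dir=$1
    cat <<EOF
run {
  backup as compressed backupset
    database format '${backup_dir}/db_%T_%t_%s.db' tag 'DB_BAK'
    plus archivelog format '${backup_dir}/arc_%T_%t_%s.arc' tag 'ARC_BAK';
}
EOF
}

print_backup_cmds /u01/orabackup/backupsets
```

In the format strings, %T is the date, %t a timestamp and %s the backup set number, which is why consecutive pieces show consecutive numbers.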
Starting backup at 24-914
current log archived
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed archive log backupset
channel ORA_DISK_1: specifying archive log(s) in backup set
input archive log thread=1 sequence=11 recid=216 stamp=859118422
channel ORA_DISK_1: starting piece 1 at 24-914
channel ORA_DISK_1: finished piece 1 at 24-914
piece handle=/u01/orabackup/backupsets/ora10g-4175411955_20140924_859118422_297.arc tag=ARC_BAK comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
Finished backup at 24-914
Starting backup at 24-914
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00001 name=/u01/app/oracle/oradata/ora10g/system01.dbf
input datafile fno=00003 name=/u01/app/oracle/oradata/ora10g/sysaux01.dbf
input datafile fno=00002 name=/u01/app/oracle/oradata/ora10g/undotbs01.dbf
input datafile fno=00005 name=/u01/app/oracle/oradata/ora10g/example01.dbf
input datafile fno=00006 name=/u01/app/oracle/oradata/ora10g/zlm01.dbf
input datafile fno=00004 name=/u01/app/oracle/oradata/ora10g/users01.dbf
channel ORA_DISK_1: starting piece 1 at 24-914
channel ORA_DISK_1: finished piece 1 at 24-914
piece handle=/u01/orabackup/backupsets/ora10g-4175411955_20140924_859118425_298.db tag=DB_BAK comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:02:16
Finished backup at 24-914
Starting backup at 24-914
current log archived
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed archive log backupset
channel ORA_DISK_1: specifying archive log(s) in backup set
input archive log thread=1 sequence=12 recid=217 stamp=859118561
channel ORA_DISK_1: starting piece 1 at 24-914
channel ORA_DISK_1: finished piece 1 at 24-914
piece handle=/u01/orabackup/backupsets/ora10g-4175411955_20140924_859118562_299.arc tag=ARC_BAK comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02
Finished backup at 24-914
Starting Control File and SPFILE Autobackup at 24-914
piece handle=/u01/orabackup/backupsets/ora10g-c-4175411955-20140924-01.ctl comment=NONE
Finished Control File and SPFILE Autobackup at 24-914
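To read the order straight out of a saved log, it is enough to pull the "piece handle=" lines in sequence. A minimal sketch follows. The helper name is made up, and the sample input is abbreviated from the log above.

```shell
# List backup pieces in the order RMAN wrote them; reads a log on stdin.
# list_pieces is a hypothetical helper name.
list_pieces() {
    sed -n 's/^piece handle=\([^ ]*\).*/\1/p'
}

# Abbreviated sample taken from the log above.
list_pieces <<'EOF'
piece handle=/u01/orabackup/backupsets/ora10g-4175411955_20140924_859118422_297.arc tag=ARC_BAK
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
piece handle=/u01/orabackup/backupsets/ora10g-4175411955_20140924_859118425_298.db tag=DB_BAK
piece handle=/u01/orabackup/backupsets/ora10g-4175411955_20140924_859118562_299.arc tag=ARC_BAK
piece handle=/u01/orabackup/backupsets/ora10g-c-4175411955-20140924-01.ctl comment=NONE
EOF
```

The output lists the .arc, .db, .arc and .ctl pieces in that order, matching the four steps.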
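The four steps ran in order and produced four backup sets. The log shows no explicit switch command after the database backup, but RMAN did archive the current log at that point (the "current log archived" line at the start of the third step), and I had not switched logs manually. RMAN switches the log after the database backup and backs up the archived logs again so that as many changes as possible, including those made while the backup was running, end up in archived logs on disk and in the backup, which makes later recovery from the backup sets easier.
DML against the database is protected by writing to the online redo log files. Even changes to undo generate redo, which we call the redo for undo. The change vector for the undo and the change vector for the data are combined into one redo record and written to the online redo log file. The job of archiving is to copy an online redo log file before it is overwritten. Since redo contains all changes to the database, recovery must also apply archived logs to roll the database forward to a consistent state.
An RMAN backup taken while the database is open is an inconsistent backup. The only way to bring an inconsistent backup to a consistent state is to apply archived logs. At the moment RMAN switches the log, the latest changes are written into that archived log and backed up with it, so the archived logs from before the backup point are no longer needed to recover from this backup.
Backing up every archived log wastes a lot of disk space and is unnecessary, so before an RMAN backup we can delete some of the archived logs to keep the archived-log backup sets small. This matters most when disk space is tight. The following commands delete archived logs: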
--Delete archived logs older than 7 days
RMAN> delete noprompt archivelog all completed before 'sysdate-7';
or
RMAN> delete noprompt archivelog until time 'sysdate-7';
--Delete archived logs older than 7 hours
RMAN> delete noprompt archivelog all completed before 'sysdate-7/24';
or
RMAN> delete noprompt archivelog until time 'sysdate-7/24';
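Note the difference in syntax: the first form takes the all keyword and the second does not. The two are not interchangeable, and mixing them up raises an error:
RMAN> delete noprompt archivelog all until time 'sysdate-7';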
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01009: syntax error: found "until": expecting one of: "backed, completed, device, like, ;, tag"
RMAN-01007: at line 1 column 32 file: standard input
RMAN> delete noprompt archivelog completed before 'sysdate-7/24';
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01009: syntax error: found "completed": expecting one of: "all, double-quoted-string, from, high, integer, like, logseq, low, scn, sequence, single-quoted-string, time, until"
RMAN-01007: at line 1 column 28 file: standard input
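Since the keyword placement is easy to get wrong, one option is to generate the command from a small helper instead of typing it each time. A minimal sketch of that idea follows. The helper name is made up, and it emits only the "all completed before" form shown earlier.

```shell
# Print a syntactically valid delete command for logs older than N days.
# print_delete_cmd is a hypothetical helper; it rejects non-numeric input
# so that nothing unexpected ends up inside the quoted date expression.
print_delete_cmd() {
    days=$1
    case $days in
        ''|*[!0-9]*)
            echo "usage: print_delete_cmd DAYS" >&2
            return 1
            ;;
    esac
    printf "delete noprompt archivelog all completed before 'sysdate-%s';\n" "$days"
}

print_delete_cmd 7
```

The printed line can be appended to an RMAN command file. Review it before running, since the command deletes files.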
After deleting, remember to run a crosscheck and then remove the records of the expired archived logs from the control file:
RMAN> crosscheck archivelog all;
RMAN> delete noprompt expired archivelog all;
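During the backup you can also use compression (RMAN> backup as compressed backupset ...) to make the backup sets smaller still. The side effect is that backup and restore may take somewhat longer, so it comes down to whether disk space or time matters more. That gives us the leanest backup sets.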