Oracle DataGuard Study之--DataGuard FailOver案例
Oracle DG(Dataguard)是目前比较常见的数据库HA配置策略。通过实现Physical Standby和Logical Standby,可以实现数据冗余容错机制。防止在主库出现严重故障,不能支持服务的时候,没有快速的后备支持环境。
在DG中,switchover和failover是两个重要的概念,也是DG实现的核心。两者共同点都是Primary和Standby角色切换,差异在于Planned和UnPlanned之分。Switchover关键点在于Planned,这个切换动作是在运维机构规划范围内的动作。比如,进行定期系统软硬件升级、设备维修等动作。而Failover是真正出现严重系统故障,如数据库宕机、软硬件故障导致的Primary不能支持服务,从而进行的切换动作。
根据不同的DG配置,switchover和failover也是有差异的。理论上,Switchover是不会造成数据丢失的,Primary在切换之后也是在DG配置环境中,作为Standby存在的。但是Failover则不同,除了运行在最大保护(Maximum Protection)模式下,Primary突发的故障可能引起一部分Redo Log不能及时的传递到Standby端,切换之后很可能有数据损失的情况。更重要的是,Primary端在发生Failover之后,是不能够直接加入回DG配置的!也就是说,Failover之后,Primary实际上就是被“抛出”了DG环境。
那么,有什么方法实现Primary回到原有的环境呢?这个问题的困难在于保持Primary和Standby一致。在正常情况下,Primary和Standby之间是关联同步的,即使发生了Switchover,也在可控情况下。Failover过程中有数据的缺失,还有Primary修复问题。在目前流行版本(11g)中,有三个方法:
ü
环境重建:一种最简单的方法就是直接删除原来的Primary库,引用DG重建方法,重新搭建Standby端;
ü
RMAN备份恢复:如果Primary端保留过一份Failover之前的备份,则可以强制原来的Primary端恢复到进行Failover的时间点,之后作为Standby接收当前Primary的redo log传递,应用后可以跟上进度;
ü
Flashback
Database恢复:Flashback技术是作为传统备份还原技术的补充,提供了更加便捷的恢复策略。使用flashback,可以将数据库恢复到failover之前的时间点。之后的过程和RMAN备份恢复策略相同;
案例分析:
一、在主库端模拟数据库意外宕机
[email protected]>conn /as sysdba Connected. [email protected]>alter system switch logfile; System altered. [email protected]>shutdown abort ORACLE instance shut down.
二、在备库端
1、查看切换信息
[email protected]>select name,database_role,switchover_status from v$database; NAME DATABASE_ROLE SWITCHOVER_STATUS --------- ---------------- -------------------- TESTDB12 PHYSICAL STANDBY NOT ALLOWED 可以看到此时备库处于无法切换状态
2、直接切换
[email protected]>alter database commit to switchover to primary; alert_log:(告警日志) Fatal NI connect error 12514, connecting to: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=shsrv)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=shdb)(CID=(PROGRAM=oracle)(HOST=bjsrv)(USER=oracle)))) VERSION INFORMATION: TNS for Linux: Version 11.2.0.3.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production Time: 04-MAR-2015 21:25:13 Tracing not turned on. Tns error struct: ns main err code: 12564 TNS-12564: TNS:connection refused ns secondary err code: 0 nt main err code: 0 nt secondary err code: 0 nt OS err code: 0 Error 12514 received logging on to the standby FAL[client, MRP0]: Error 12514 connecting to shdb for fetching gap sequence Wed Mar 04 21:26:00 2015 alter database commit to switchover to primary ALTER DATABASE SWITCHOVER TO PRIMARY (TestDB12) Maximum wait for role transition is 15 minutes. Switchover: Media recovery is still active Database not available for switchover End-Of-REDO archived log file has not been recovered Database not available for switchover End-Of-REDO archived log file has not been recovered Database not available for switchover
3、关闭standby MPR进程
[email protected]>ALTER DATABASE RECOVER managed standby database finish; ALTER DATABASE RECOVER managed standby database finish Terminal Recovery: request posted (TestDB12) Wed Mar 04 21:34:34 2015 Begin: Standby Redo Logfile archival End: Standby Redo Logfile archival Terminal Recovery timestamp is ‘03/04/2015 21:34:34‘ Terminal Recovery: applying standby redo logs. Terminal Recovery: thread 1 seq# 34 redo required Media Recovery Waiting for thread 1 sequence 34 Terminal Recovery: End-Of-Redo log allocation Terminal Recovery: standby redo logfile 4 created ‘/dsk4/arch_bj/arch_1_0_820054583.log‘ This standby redo logfile is being created as part of the failover operation. This standby redo logfile should be deleted after the switchover to primary operation completes. Media Recovery Log /dsk4/arch_bj/arch_1_0_820054583.log Terminal Recovery: log 4 reserved for thread 1 sequence 34 Recovery of Online Redo Log: Thread 1 Group 4 Seq 34 Reading mem 0 Mem# 0: /dsk4/arch_bj/arch_1_0_820054583.log Identified End-Of-Redo (failover) for thread 1 sequence 34 at SCN 0xffff.ffffffff Incomplete Recovery applied until change 1234252 time 03/04/2015 21:23:43 MRP0: Media Recovery Complete (TestDB12) Terminal Recovery: successful completion Wed Mar 04 21:34:35 2015 ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance TestDB12 - Archival Error ORA-16014: log 4 sequence# 34 not archived, no available destinations ORA-00312: online log 4 thread 1: ‘/dsk4/arch_bj/arch_1_0_820054583.log‘ Forcing ARSCN to IRSCN for TR 0:1234252 Attempt to set limbo arscn 0:1234252 irscn 0:1234252 Resetting standby activation ID 2865247982 (0xaac836ee) MRP0: Background Media Recovery process shutdown (TestDB12) Terminal Recovery: completion detected (TestDB12) Completed: ALTER DATABASE RECOVER managed standby database finish
4、切换数据库到Primary
[email protected]>select status from v$instance; STATUS ------------ OPEN [email protected]>select name,database_role,switchover_status from v$database; NAME DATABASE_ROLE SWITCHOVER_STATUS --------- ---------------- -------------------- TESTDB12 PHYSICAL STANDBY TO PRIMARY [email protected]>alter database commit to switchover to primary; Database altered. [email protected]>alter database open; Database altered. 告警日志: alter database commit to switchover to primary ALTER DATABASE SWITCHOVER TO PRIMARY (TestDB12) Maximum wait for role transition is 15 minutes. All dispatchers and shared servers shutdown CLOSE: killing server sessions. CLOSE: all sessions shutdown successfully. Wed Mar 04 21:35:47 2015 SMON: disabling cache recovery Backup controlfile written to trace file /u01/app/oracle/diag/rdbms/bjdb/TestDB12/trace/TestDB12_ora_3146.trc Standby terminal recovery start SCN: 1234251 RESETLOGS after incomplete recovery UNTIL CHANGE 1234252 Online log /dsk2/oradata/bjdb/redo01b.log: Thread 1 Group 1 was previously cleared Online log /dsk1/oradata/bjdb/redo01a.log: Thread 1 Group 1 was previously cleared Online log /dsk2/oradata/bjdb/redo02b.log: Thread 1 Group 2 was previously cleared Online log /dsk1/oradata/bjdb/redo02a.log: Thread 1 Group 2 was previously cleared Online log /dsk2/oradata/bjdb/redo03b.log: Thread 1 Group 3 was previously cleared Online log /dsk1/oradata/bjdb/redo03a.log: Thread 1 Group 3 was previously cleared Standby became primary SCN: 1234250 Wed Mar 04 21:35:47 2015 Setting recovery target incarnation to 3 AUDIT_TRAIL initialization parameter is changed back to its original value as specified in the parameter file. Switchover: Complete - Database mounted as primary Completed: alter database commit to switchover to primary
三、原主库修复后,开机
[email protected]>startup ORACLE instance started. Total System Global Area 442601472 bytes Fixed Size 2229184 bytes Variable Size 281021504 bytes Database Buffers 155189248 bytes Redo Buffers 4161536 bytes Database mounted. Database opened. [email protected]>select name,database_role,switchover_status from v$database; NAME DATABASE_ROLE SWITCHOVER_STATUS --------- ---------------- -------------------- TESTDB12 PRIMARY FAILED DESTINATION
现在原来的主库被修复后,整个DataGuara架构已经被破坏了,所以必须把原来的主库构建成新的备库,重新恢复DataGuard的环境。
四、重新构建DataGuard
[email protected]>select name,database_role from v$database;
NAME DATABASE_ROLE
-------------------------------------------------- ----------------
TESTDB12 PHYSICAL STANDBY