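Background
- Applicable scenario:
OS: Red Hat 6.5
Database: Oracle 11g R2
Problem: after a failover, the old primary database cannot be recovered or started, or it has lost its primary/standby relationship.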
- Advantages
- No downtime is required on the primary database
- Simple to carry out
- Preparation before implementation
1. Test DUPLICATE.
2. In a test environment, rebuild a standby database with DUPLICATE.
Implementation steps
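- Back up the new primary
Note: the backup script should write to the server's local disk, not to a tape library.
rman_backup.sh (local-disk backup script):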
#!/bin/sh
# Oracle environment
export ORACLE_BASE=/data/oracle/app
export ORACLE_HOME=$ORACLE_BASE/oracle/product/11.2.0/dbhome_1
export ORACLE_SID=orcl_stby
export PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/lib
export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
day=$(date -u +%Y%m%d)
# Stop if the local backup directory is missing
cd /data/bak/rman_backup || exit 1
rman target / nocatalog log=/data/bak/rman_backup/rman_backup$day.log <<EOF
crosscheck archivelog all;
crosscheck backup;
delete noprompt expired archivelog all;
delete noprompt expired backup;
run {
allocate channel c1 type disk;
allocate channel c2 type disk;
backup database format '/data/bak/rman_backup/%d_full_%T%s%p.bck';
sql "alter system archive log current";
backup archivelog all format '/data/bak/rman_backup/%d_arc_%T%s%p.bck';
backup current controlfile format '/data/bak/rman_backup/controlfile%T%s%p.bck';
release channel c1;
release channel c2;
}
exit;
EOF
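Before anything on the old primary is dropped, it is worth confirming that this backup actually succeeded. The function below is a minimal sketch, assuming RMAN and Oracle errors appear in the log as `RMAN-nnnnn` / `ORA-nnnnn` codes; the name `check_rman_log` is hypothetical. It also flags RMAN warning codes, which errs on the side of caution.

```shell
#!/bin/bash
# Sketch: scan the log written by rman_backup.sh for error codes.
# Assumption: failures show up as RMAN-nnnnn or ORA-nnnnn lines.
check_rman_log() {
    local log="$1"
    if [ ! -s "$log" ]; then
        echo "missing or empty log: $log" >&2
        return 2
    fi
    # grep prints the offending lines and succeeds if any are found.
    if grep -E '(RMAN|ORA)-[0-9]{5}' "$log"; then
        return 1
    fi
    echo "no RMAN/ORA codes in $log"
    return 0
}

# Usage, following the script's log naming:
# day=$(date -u +%Y%m%d)
# check_rman_log "/data/bak/rman_backup/rman_backup${day}.log"
```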
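- Drop the old primary
From this step on, the old primary is called the "standby" and the new primary the "primary".
1. Shut down the database:
SQL>shutdown immediate;
2. Restart the instance in restricted mode and bring it to MOUNT:
sqlplus / as sysdba
SQL>startup restrict mount; --> # DROP DATABASE requires the database to be mounted in exclusive, restricted mode; restricted mode also keeps ordinary users from connecting while the database is dropped
3. Confirm the database name once more to avoid dropping the wrong one; the database to drop here is orcl:
SQL>select name from v$database;
4. Run DROP DATABASE:
SQL>drop database; --> # (10g and later)
# It removes only the database files (control files, data files, online redo log files, spfile). It does not remove the files under $ORACLE_BASE/admin/$ORACLE_SID, the text init parameter file, the password file, or the archived logs.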
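Since DROP DATABASE leaves those files behind, a quick listing shows what is still on disk: the pfile and password file are reused later, while the admin directory and old archived logs can be cleaned up by hand once no longer needed. This is a sketch; `list_leftovers` is a hypothetical helper, `orapw<SID>` is the usual Unix password-file name, and the archive directory is passed in because the article does not give it.

```shell
#!/bin/bash
# Sketch: print which of the files DROP DATABASE does not remove still exist.
# Arguments: ORACLE_BASE, ORACLE_HOME, SID, archive-log directory (hypothetical).
list_leftovers() {
    local oracle_base="$1" oracle_home="$2" sid="$3" arch_dir="$4" p
    for p in "$oracle_base/admin/$sid" \
             "$oracle_home/dbs/init${sid}.ora" \
             "$oracle_home/dbs/orapw${sid}" \
             "$arch_dir"; do
        # Report only paths that are still present.
        [ -e "$p" ] && echo "$p"
    done
    return 0
}

# Usage:
# list_leftovers /data/oracle/app /data/oracle/app/oracle/product/11.2.0/dbhome_1 orcl /path/to/archivelogs
```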
SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Wed Aug 23 14:52:03 2017
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup restrict mount;
ORACLE instance started.
Total System Global Area 6747725824 bytes
Fixed Size 2213976 bytes
Variable Size 5100275624 bytes
Database Buffers 1610612736 bytes
Redo Buffers 34623488 bytes
Database mounted.
SQL> select name from v$database;
NAME
---------
ORCL
SQL> drop database;
Database dropped.
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> exit
$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Wed Aug 23 14:56:20 2017
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SQL>
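In the transcript above, step 3 is checked by eye. If the procedure is scripted, the same guard can be automated. This is a sketch, assuming the query output is captured as text; `db_name_matches` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: succeed only if "select name from v$database" output names the
# database we intend to drop. Output is passed in as text, so no DB is needed.
db_name_matches() {
    local expected="$1" query_output="$2" actual
    # Drop the NAME header, dashed separator and blank lines; keep the value.
    actual=$(printf '%s\n' "$query_output" \
        | grep -v -E '^[[:space:]]*(NAME|-*)[[:space:]]*$' \
        | tail -n 1 | tr -d '[:space:]')
    # Compare case-insensitively (v$database reports upper case).
    [ "$(printf '%s' "$actual" | tr '[:lower:]' '[:upper:]')" = \
      "$(printf '%s' "$expected" | tr '[:lower:]' '[:upper:]')" ]
}

# Usage (quoted 'EOF' stops the shell from expanding $database):
# out=$(sqlplus -s / as sysdba <<'EOF'
# select name from v$database;
# EOF
# )
# db_name_matches orcl "$out" || { echo "wrong database, aborting" >&2; exit 1; }
```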
- Prepare the standby and start it in NOMOUNT
Prepare a pfile, preferably the one created when Data Guard was originally built.
Rename the pfile to init$ORACLE_SID.ora (here initorcl.ora) and place it in /data/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/, then:
SQL>startup nomount;
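The copy-and-rename step can be sketched as a small function. It assumes `ORACLE_HOME` and `ORACLE_SID` are set as in the backup script; `install_pfile` and the source path in the usage line are hypothetical. With no spfile present, `startup nomount` falls back to `$ORACLE_HOME/dbs/init$ORACLE_SID.ora`.

```shell
#!/bin/bash
# Sketch: copy a saved pfile to $ORACLE_HOME/dbs/init$ORACLE_SID.ora.
install_pfile() {
    local src="$1" dest
    if [ -z "$ORACLE_HOME" ] || [ -z "$ORACLE_SID" ]; then
        echo "ORACLE_HOME and ORACLE_SID must be set" >&2
        return 1
    fi
    [ -f "$src" ] || { echo "pfile not found: $src" >&2; return 1; }
    dest="$ORACLE_HOME/dbs/init${ORACLE_SID}.ora"
    # Keep a copy of any existing file instead of silently overwriting it.
    if [ -e "$dest" ]; then
        cp -p "$dest" "$dest.bak.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    cp -p "$src" "$dest" && echo "$dest"
}

# Usage (source path is hypothetical):
# export ORACLE_SID=orcl
# install_pfile /home/oracle/dg_build/initorcl.ora
```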
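- Connect RMAN to the primary and the standby
Before connecting with RMAN, confirm the following:
1. The firewall is disabled.
2. tnsnames.ora: each server must be able to reach the other's listener.
3. The SYS passwords should be identical on both servers (active duplication connects to both instances as SYS).
4. db_file_name_convert and log_file_name_convert: if the directory layouts differ, the pfile must specify both parameters.
Because Data Guard had been built here before, none of these needed changes in production.
rman target sys/<password>@orcl_stby auxiliary sys/<password>@orcl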
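Check 4 can be sketched as a pfile scan. This assumes parameters are written as `name=`, `*.name=` or `sid.name=`; the two directory arguments and the name `check_convert_params` are hypothetical, and only the parameter names come from the checklist.

```shell
#!/bin/bash
# Sketch: if primary and standby directories differ, require both
# *_file_name_convert parameters in the standby pfile.
check_convert_params() {
    local pfile="$1" primary_dir="$2" standby_dir="$3" p missing=0
    # Identical layouts (as in this article) need no conversion.
    [ "$primary_dir" = "$standby_dir" ] && return 0
    for p in db_file_name_convert log_file_name_convert; do
        if ! grep -Eqi "^[[:space:]]*([A-Za-z0-9_*]+\.)?${p}[[:space:]]*=" "$pfile"; then
            echo "missing $p in $pfile" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (directories are hypothetical):
# check_convert_params "$ORACLE_HOME/dbs/initorcl.ora" /u01/oradata /u02/oradata
```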
Rebuild the standby database with the DUPLICATE command
Because the primary and the standby use the same paths, use:
RMAN>duplicate target database for standby from active database nofilenamecheck;
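Active duplication can run for a long time, so one option is to put the command in an RMAN command file and run it under `nohup`, so that a dropped terminal does not kill it. A sketch: the connect identifiers match the article, while `write_dup_cmdfile` and the `/tmp` file names are hypothetical.

```shell
#!/bin/bash
# Sketch: write the DUPLICATE command to an RMAN command file.
write_dup_cmdfile() {
    local cmdfile="$1"
    cat > "$cmdfile" <<'EOF'
duplicate target database for standby from active database nofilenamecheck;
exit;
EOF
}

# Usage (the passwords are visible in ps output while RMAN runs):
# write_dup_cmdfile /tmp/dup_standby.rman
# nohup rman target sys/<password>@orcl_stby auxiliary sys/<password>@orcl \
#     cmdfile=/tmp/dup_standby.rman log=/tmp/dup_standby.log &
```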
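- Verify the databases
Open the standby:
SQL>alter database open; # this may raise an error; ignore it for now and check at the end whether the database can be opened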
SQL>CREATE SPFILE FROM PFILE='/data/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/initorcl.ora';
SQL>select status from v$instance;
SQL>select open_mode from v$database;
Check the primary:
SQL>select status from v$instance;
SQL>select open_mode from v$database;
Check GAP_STATUS:
SQL>SELECT STATUS, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS WHERE DEST_ID = 2;
If the status is DEFER:
SQL>ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2='ENABLE' SCOPE=BOTH;
Start real-time apply (this requires standby redo logs on the standby):
SQL>alter database recover managed standby database using current logfile disconnect from session;
SQL>select process,thread#,status from v$managed_standby;
SQL>SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG;
SQL>SELECT SWITCHOVER_STATUS FROM V$DATABASE;
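The `SEQUENCE#, APPLIED` output can be condensed into a few numbers. This sketch is meant for output captured on the standby and assumes two whitespace-separated columns per row. Header and separator lines are skipped because they do not start with a digit, and any value other than `YES` counts as pending. `apply_summary` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: summarise "SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG" output
# read from stdin into highest sequence, highest applied sequence, pending count.
apply_summary() {
    awk '
        BEGIN { max_seq = 0; max_applied = 0; pending = 0 }
        $1 ~ /^[0-9]+$/ {
            seq = $1 + 0
            if (seq > max_seq) max_seq = seq
            if ($2 == "YES") { if (seq > max_applied) max_applied = seq }
            else pending++
        }
        END { printf "max_seq=%d max_applied=%d pending=%d\n", max_seq, max_applied, pending }
    '
}

# Usage: paste or pipe the sqlplus output in, e.g.
# apply_summary < archived_log_output.txt
```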
- Restore the DGMGRL (broker) relationship
DGMGRL>show database verbose orcl;
The database status still shows Database Status: SHUTDOWN.
Log in to the standby and start the broker:
SQL> show parameter dg_broker_start;
NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
dg_broker_start boolean FALSE
SQL> alter system set dg_broker_start = true scope=both;
System altered.
SQL>!ps -ef|grep dmon
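The broker's monitor process is named `ora_dmon_<ORACLE_SID>`, so the `ps` check can be narrowed to one instance. A sketch reading `ps` output from stdin; `dmon_running` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: succeed if ps output (stdin) contains ora_dmon_<SID>.
dmon_running() {
    # -w: the name must not be followed by a word character, so "orcl"
    # does not match ora_dmon_orcl_stby.
    grep -qw "ora_dmon_$1"
}

# Usage:
# ps -ef | dmon_running orcl && echo "broker (DMON) is running"
```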
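- Open question
The test lasted only a little over 3 hours and generated 15 new archived logs. After DUPLICATE finished and LOG_ARCHIVE_DEST_STATE_2 was enabled, only 6 of them were applied. Every log-related check looked fine and the database could be opened, but could the data still be inconsistent?
In production one archived log is generated per hour and the whole procedure takes about 3 hours, so missing logs are not a concern there.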
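One partial check for this question is to confirm that the sequence numbers known to the standby have no holes. The sketch below assumes a single redo thread (single instance) and one sequence number per input line, for example the `SEQUENCE#` column from `V$ARCHIVED_LOG` on the standby. A gap-free run does not prove consistency, but a gap is a reason to investigate. `missing_sequences` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: print sequence numbers missing from a list (stdin, any order).
missing_sequences() {
    sort -n -u | awk '
        { cur = $1 + 0 }
        NR > 1 { for (s = prev + 1; s < cur; s++) print s }
        { prev = cur }
    '
}

# Usage:
# missing_sequences < standby_sequences.txt
```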
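- Issues found and resolved during the production rollout
1. During the production rollout, log_archive_dest_2 on the primary showed status INACTIVE. The previous failover had probably not completed fully, so the primary had lost its log_archive_dest_2 setting: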
SQL> SELECT STATUS, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS WHERE DEST_ID = 2;
STATUS GAP_STATUS
--------- ------------------------
INACTIVE
Restoring the log_archive_dest_2 parameter with the following SQL fixed it:
alter system set log_archive_dest_2='SERVICE=orcl LGWR SYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=orcl' scope=both;
The gap status changed to RESOLVABLE GAP, and after a log switch it became NO GAP.
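Copying this statement from a web page can turn its quotes into typographic ones, which SQL*Plus will not accept. A small generator that always emits plain ASCII quotes avoids that. In this sketch, the service name and DB_UNIQUE_NAME are parameters and the attribute list is the one used above; `dest2_sql` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: print the log_archive_dest_2 statement with ASCII quotes.
dest2_sql() {
    local service="$1" unique_name="$2"
    printf "alter system set log_archive_dest_2='SERVICE=%s LGWR SYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=%s' scope=both;\n" \
        "$service" "$unique_name"
}

# Usage (on the primary):
# dest2_sql orcl orcl | sqlplus -s / as sysdba
```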
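2. The broker's settings for both the primary and the standby were wrong, so the broker configuration had to be rebuilt.
a. Remove the old configuration
DISABLE FAST_START FAILOVER FORCE;
(1) On the observer: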
disable configuration;
remove database orcl;
remove database orcl_stby;
remove configuration;
(2) On both databases:
alter system set dg_broker_start = false scope=both;
show parameter broker;
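Rename dr1orcl_stby.dat and dr2orcl_stby.dat under /data/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/ (on the orcl server the corresponding files are dr1orcl.dat and dr2orcl.dat).
(3) On both databases: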
alter system set dg_broker_start = true scope=both;
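The rename in step (2) can be sketched as moving the files aside with a timestamp suffix instead of deleting them. It assumes the files are named `dr1<name>.dat` / `dr2<name>.dat` as above, and it should run only while `dg_broker_start` is FALSE. `park_broker_files` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: rename dr1<name>.dat and dr2<name>.dat in a directory to *.bak.<timestamp>.
park_broker_files() {
    local dir="$1" name="$2" ts f
    ts=$(date +%Y%m%d%H%M%S)
    for f in "dr1${name}.dat" "dr2${name}.dat"; do
        if [ -e "$dir/$f" ]; then
            mv "$dir/$f" "$dir/$f.bak.$ts" && echo "$dir/$f.bak.$ts"
        fi
    done
}

# Usage (on the orcl_stby server; use "orcl" on the other one):
# park_broker_files /data/oracle/app/oracle/product/11.2.0/dbhome_1/dbs orcl_stby
```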
b. Rebuild the configuration
DGMGRL> create configuration DG_orcl as primary database is orcl_stby connect identifier is orcl_stby;
DGMGRL> add database orcl as connect identifier is orcl maintained as physical;
DGMGRL> show database orcl_stby;
DGMGRL> show database orcl;
DGMGRL> show database verbose orcl_stby;
DGMGRL> edit database 'orcl' set property 'ArchiveLagTarget'='0';
DGMGRL> edit database 'orcl' set property 'LogArchiveMinSucceedDest'='1';
DGMGRL> edit database 'orcl_stby' set property 'DelayMins'='0';
DGMGRL> edit database 'orcl' set property 'DelayMins'='0';
DGMGRL> enable configuration;
DGMGRL> show configuration;
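To script the final check, the `show configuration` text can be tested for SUCCESS. This is a sketch assuming the 11g layout, where the status word follows "Configuration Status:" on the next line (a same-line status is also accepted). The output could be captured with `dgmgrl sys/<password>@orcl_stby "show configuration"`. `config_ok` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: succeed if "show configuration" output (stdin) reports SUCCESS.
config_ok() {
    awk '
        found && NF { status = $1; found = 0 }
        /Configuration Status:/ { if (NF > 2) status = $3; else found = 1 }
        END { exit (status == "SUCCESS" ? 0 : 1) }
    '
}

# Usage:
# dgmgrl sys/<password>@orcl_stby "show configuration" | config_ok && echo OK
```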
c. Enable fast-start failover
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverLagLimit=1800;
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 15;
DGMGRL> EDIT DATABASE orcl_stby SET PROPERTY FastStartFailoverTarget='orcl';
Property "faststartfailovertarget" updated
DGMGRL> EDIT DATABASE orcl SET PROPERTY FastStartFailoverTarget='orcl_stby';
Property "faststartfailovertarget" updated
DGMGRL> SHOW DATABASE ORCL LOGXPTMODE;
DGMGRL> SHOW DATABASE ORCL_STBY LOGXPTMODE;
DGMGRL> EDIT DATABASE ORCL SET PROPERTY LOGXPTMODE='SYNC';
DGMGRL> EDIT DATABASE ORCL_STBY SET PROPERTY LOGXPTMODE='SYNC';
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
DGMGRL> ENABLE FAST_START FAILOVER;
DGMGRL> SHOW FAST_START FAILOVER;
DGMGRL> SHOW CONFIGURATION VERBOSE;
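As a last scripted check, the `SHOW FAST_START FAILOVER` output can be tested for the ENABLED state and the threshold set above. This is a sketch: the `Threshold:  N seconds` line format is an assumption about DGMGRL's output and may need adjusting. `fsfo_ok` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: succeed if FSFO is ENABLED and the threshold equals the argument.
fsfo_ok() {
    awk -v want="$1" '
        /Fast-Start Failover: ENABLED/ { enabled = 1 }
        $1 == "Threshold:" { threshold = $2 }
        END { exit ((enabled && threshold == want) ? 0 : 1) }
    '
}

# Usage:
# dgmgrl sys/<password>@orcl_stby "show fast_start failover" | fsfo_ok 15
```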