How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems

(Doc ID 1062983.1)

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.1.0 to 11.2.0.4 [Release 11.2]
Information in this document applies to any platform.

Goal

It is not possible to directly restore a manual or automatic OCR
backup if the OCR is located in an ASM disk group: the command
'ocrconfig -restore' requires ASM to be up and running in order to
restore an OCR backup to an ASM disk group, but for ASM to be
available, the CRS stack must have been started successfully. For the
restore to succeed, the OCR must also not be in use (r/w), i.e. the
CRS daemon must not be running while the OCR is being restored.

A description of the general procedure to restore the OCR can be found in the documentation.
This document explains how to recover from a complete loss of the ASM
disk group that held the OCR and Voting files in an 11gR2 Grid
environment.

Solution

When using an ASM disk group for CRS, there are typically three different
types of files located in the disk group that potentially need to be
restored or recreated:

  • the Oracle Cluster Registry file (OCR)
  • the Voting file(s)
  • the shared SPFILE for the ASM instances

The following example assumes that the OCR was located in a single
disk group used exclusively for CRS. The disk group has just one disk
using external redundancy.

Since the CRS disk group has been lost, the CRS stack will not be available on any node.

The following settings used in the example would need to be replaced according to the actual configuration:

GRID user:                       oragrid
GRID home:                       /u01/app/11.2.0/grid ($CRS_HOME)
ASM disk group name for OCR:     CRS
ASM/ASMLIB disk name:            ASMD40
Linux device name for ASM disk:  /dev/sdh1
Cluster name:                    rac_cluster1
Nodes:                           racnode1, racnode2

This document assumes that the name of the OCR diskgroup remains
unchanged. If a different diskgroup name has to be used, the name of
the OCR diskgroup must be modified in /etc/oracle/ocr.loc on all nodes
before executing the following steps.
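For illustration, ocr.loc on Linux is a small key/value file; with the example configuration above its content might look like this (a minimal sketch, the exact content varies per installation):

# cat /etc/oracle/ocr.loc
ocrconfig_loc=+CRS
local_only=FALSE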

1. Locate the latest automatic OCR backup

When using a non-shared CRS home, automatic OCR backups can be located
on any node of the cluster, consequently all nodes need to be checked
for the most recent backup:

$ ls -lrt $CRS_HOME/cdata/rac_cluster1/
-rw------- 1 root root 7331840 Mar 10 18:52 week.ocr
-rw------- 1 root root 7651328 Mar 26 01:33 week_.ocr
-rw------- 1 root root 7651328 Mar 29 01:33 day.ocr
-rw------- 1 root root 7651328 Mar 30 01:33 day_.ocr
-rw------- 1 root root 7651328 Mar 30 01:33 backup02.ocr
-rw------- 1 root root 7651328 Mar 30 05:33 backup01.ocr
-rw------- 1 root root 7651328 Mar 30 09:33 backup00.ocr
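If root ssh equivalence between the nodes is set up, this check can be scripted; a minimal sketch, assuming passwordless ssh and the example paths used in this document:

# for node in racnode1 racnode2; do echo "== $node =="; ssh $node 'ls -lrt /u01/app/11.2.0/grid/cdata/rac_cluster1/'; done

('ocrconfig -showbackup' also lists automatic backups, but it may not be reliable while the OCR itself is inaccessible, so inspecting the cdata directories directly is the safer approach here.)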

2. Make sure the Grid Infrastructure is shut down on all nodes

Given that the OCR diskgroup is missing, the GI stack will not be
functional on any node; however, various daemon processes may still be
running. On each node, shut down the GI stack using the force
(-f) option:

# $CRS_HOME/bin/crsctl stop crs -f
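To verify that the stack is really down on a node, check for leftover GI daemon processes; a hedged sketch (binary names as in a standard 11.2 installation):

# ps -ef | egrep 'ohasd|ocssd|crsd|evmd|octssd|gpnpd|gipcd|mdnsd' | grep -v grep

No output means no GI daemons are left on that node.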

3. Start the CRS stack in exclusive mode

On the node that has the most recent OCR backup, log on as root and
start CRS in exclusive mode. This mode allows ASM to start and
stay up without the presence of a Voting disk and without the CRS daemon
process (crsd.bin) running.

11.2.0.1:

# $CRS_HOME/bin/crsctl start crs -excl
...
CRS-2672: Attempting to start 'ora.asm' on 'racnode1'
CRS-2676: Start of 'ora.asm' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'racnode1'
CRS-2676: Start of 'ora.crsd' on 'racnode1' succeeded

Please note:
This document assumes that the CRS diskgroup was completely lost, in
which case the CRS daemon (resource ora.crsd) will terminate again due
to the inaccessibility of the OCR, even if the above message indicates
that the start succeeded.
If this is not the case, i.e. if the CRS diskgroup is still present
(but corrupt or incorrect), the CRS daemon needs to be shut down
manually using:

# $CRS_HOME/bin/crsctl stop res ora.crsd -init

otherwise the subsequent OCR restore will fail.
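Whether crsd is down while the lower stack stays up can be confirmed via the init resources; a hedged check:

# $CRS_HOME/bin/crsctl stat res ora.crsd -init

The resource ora.crsd should be reported as OFFLINE before proceeding with the restore.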

11.2.0.2 and above:

# $CRS_HOME/bin/crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
...
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-2672: Attempting to start 'ora.ctssd' on 'racnode1'
CRS-2676: Start of 'ora.drivers.acfs' on 'racnode1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'racnode1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'racnode1'
CRS-2676: Start of 'ora.asm' on 'racnode1' succeeded

IMPORTANT:
A new option '-nocrs' has been introduced with 11.2.0.2, which prevents
the start of the ora.crsd resource. It is vital that this option is
specified; otherwise the failure to start the ora.crsd resource will
tear down ora.cluster_interconnect.haip, which in turn will cause ASM
to crash.

4. Label the CRS disk for ASMLIB use

If using ASMLIB, the disk to be used for the CRS disk group needs to be stamped first; as the root user do:

# /usr/sbin/oracleasm createdisk ASMD40 /dev/sdh1
Writing disk header: done
Instantiating disk: done
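The result can be cross-checked with the standard ASMLIB query commands before creating the disk group; a hedged sketch:

# /usr/sbin/oracleasm listdisks
# /usr/sbin/oracleasm querydisk /dev/sdh1

Both commands should report the new ASMD40 label.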

5. Create the CRS diskgroup via sqlplus

The disk group can now be (re-)created via sqlplus as the grid user. The compatible.asm attribute must be set to 11.2 for the disk group to be usable by CRS:

$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.1.0 Production on Tue Mar 30 11:47:24 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> create diskgroup CRS external redundancy disk 'ORCL:ASMD40' attribute 'COMPATIBLE.ASM' = '11.2';

Diskgroup created.
SQL> exit
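The state and compatibility of the new disk group can be verified before the restore; a minimal sketch run as the grid user:

$ sqlplus -S / as sysasm <<'EOF'
select name, state, compatibility from v$asm_diskgroup;
EOF

The CRS disk group should show state MOUNTED and a compatibility of 11.2.0.0.0.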

6. Restore the latest OCR backup

Now that the CRS disk group is created and mounted, the OCR can be restored; this must be done as the root user:

# cd $CRS_HOME/cdata/rac_cluster1/
# $CRS_HOME/bin/ocrconfig -restore backup00.ocr
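The restore can be validated with ocrcheck, which runs an OCR integrity check; as the root user (output varies per system):

# $CRS_HOME/bin/ocrcheck

The output should show +CRS as the OCR device/file name and report a successful cluster registry integrity check.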

7. Start the CRS daemon on the current node (11.2.0.1 only!)

Now that the OCR has been restored, the CRS daemon can be started; this
is needed to recreate the Voting file. Skip this step on 11.2.0.2 and above.

# $CRS_HOME/bin/crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'racnode1'
CRS-2676: Start of 'ora.crsd' on 'racnode1' succeeded

8. Recreate the Voting file

The Voting file needs to be initialized in the CRS disk group:

# $CRS_HOME/bin/crsctl replace votedisk +CRS
Successful addition of voting disk 00caa5b9c0f54f3abf5bd2a2609f09a9.
Successfully replaced voting disk group with +CRS.
CRS-4266: Voting file(s) successfully replaced
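The new voting file can be verified right away; a hedged check:

# $CRS_HOME/bin/crsctl query css votedisk

This should list exactly one online voting file located in the +CRS disk group.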

9. Recreate the SPFILE for ASM (optional)

Please note:

Starting with 11gR2, ASM can start without a PFILE or SPFILE, so this step can be skipped if you are
- not using an SPFILE for ASM
- not using a shared SPFILE for ASM
- using a shared SPFILE not stored in ASM (e.g. on a cluster file system)

Also use extra care with regard to the asm_diskstring parameter, as it impacts the discovery of the voting disks.

Please verify the previous settings using the ASM alert log.

Prepare a pfile (e.g. /tmp/asm_pfile.ora) with the ASM startup
parameters; these may vary from the example below. If in doubt, consult
the ASM alert log, as the ASM instance startup should list all
non-default parameter values. Please note that the last startup of ASM
(in step 3 via the CRS start) will not have used an SPFILE, so a startup
prior to the loss of the CRS disk group would need to be located.

*.asm_power_limit=1
*.diagnostic_dest='/u01/app/oragrid'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
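To recover the non-default parameter values of an earlier startup, the ASM alert log can be searched; a hedged sketch, assuming the diagnostic_dest shown above and an instance name of +ASM1:

$ grep -A 20 'Starting ORACLE instance' /u01/app/oragrid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log

The lines following each startup banner list the parameter settings in effect at that time.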

Now the SPFILE can be created using this PFILE:

$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.1.0 Production on Tue Mar 30 11:52:39 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> create spfile='+CRS' from pfile='/tmp/asm_pfile.ora';

File created.
SQL> exit
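Whether the new SPFILE has been registered for the next ASM startup can be cross-checked from the shell as the grid user; a hedged check using asmcmd:

$ asmcmd spget

This should report an SPFILE location inside the +CRS disk group.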

10. Shut down CRS

Since CRS is running in exclusive mode, it needs to be shut down to
allow CRS to run on all nodes again. Use of the force (-f) option may
be required:

# $CRS_HOME/bin/crsctl stop crs -f
...
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'racnode1' has completed
CRS-4133: Oracle High Availability Services has been stopped.


11. Rescan ASM disks

If using ASMLIB, rescan all ASM disks on each node as the root user:

# /usr/sbin/oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "ASMD40"

12. Start CRS

As the root user, start CRS on all cluster nodes:

# $CRS_HOME/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

13. Verify CRS

To verify that CRS is fully functional again:

# $CRS_HOME/bin/crsctl check cluster -all
**************************************************************
racnode1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************

# $CRS_HOME/bin/crsctl status resource -t
