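Scenario:
In an OCR backup-and-recovery experiment, the OCR had four automatic backups. I moved the OCR from +DATA to +OCR2 (/dev/raw/raw4), took a manual OCR backup with ocrconfig -manualbackup, and then ran dd against /dev/raw/raw4 to overwrite it. After shutting the cluster down and starting it again, the cluster failed to start.
Problem analysis (pretending we don't yet know the cause, so we start by diagnosing):
1. Check the cluster services: CRS and CSS have not started properly.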
crsctl check crs
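A scripted version of this check can be handy when repeating the experiment. The sketch below is built around a hypothetical helper named crs_stack_healthy. It reads the output of crsctl check crs and treats the stack as healthy only if all four daemons report that they are online. The four message codes it looks for (CRS-4638, CRS-4537, CRS-4529, CRS-4533) are assumed from 11.2 output, so check them against your own release.

```shell
#!/bin/sh
# crs_stack_healthy (hypothetical helper): read `crsctl check crs` output on
# stdin and succeed only if all four daemons report "online".
# Assumed 11.2 message codes:
#   CRS-4638 Oracle High Availability Services is online
#   CRS-4537 Cluster Ready Services is online
#   CRS-4529 Cluster Synchronization Services is online
#   CRS-4533 Event Manager is online
crs_stack_healthy() {
  out=$(cat)
  for code in CRS-4638 CRS-4537 CRS-4529 CRS-4533; do
    # A missing "online" code means that daemon is down or unreachable.
    if ! printf '%s\n' "$out" | grep -q "^${code}:"; then
      echo "missing ${code}" >&2
      return 1
    fi
  done
  return 0
}

# Usage on a cluster node:
#   crsctl check crs | crs_stack_healthy && echo "clusterware stack is up"
```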
2. Check the CRS and CSS logs. They show that the OCR disk is faulty.
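The following may help pull the relevant lines out of the logs quickly. It is a sketch under two assumptions:

- The logs follow the 11.2 layout under $GRID_HOME/log/<hostname>/ (alert<hostname>.log, crsd/crsd.log, cssd/ocssd.log).
- OCR problems show up as PROC- error codes or as lines mentioning OCR.

The function name scan_ocr_errors is hypothetical. Running ocrcheck as root is another quick way to confirm that the OCR cannot be read.

```shell
#!/bin/sh
# scan_ocr_errors (hypothetical helper): print log lines that mention a
# PROC- error code or the string "OCR".
# Assumes the 11.2 log layout $GRID_HOME/log/<host>/...; 12c and later keep
# these logs under the ADR instead, so the paths would need changing there.
scan_ocr_errors() {
  grid_home=$1
  host=$2
  for f in "$grid_home/log/$host/alert$host.log" \
           "$grid_home/log/$host/crsd/crsd.log" \
           "$grid_home/log/$host/cssd/ocssd.log"; do
    [ -r "$f" ] || continue   # skip logs that do not exist on this node
    # -H prefixes each match with its file name; no match is not an error
    grep -HE 'PROC-[0-9]+|OCR' "$f" || true
  done
  return 0
}

# Usage (as root or the grid owner):
#   scan_ocr_errors /u01/11.2.0/grid "$(hostname -s)" | tail -n 50
```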
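3. Recover the OCR. In practice this means rebuilding the OCR with root.sh. After the rebuild, resources such as the listener and database have to be registered again.
Clear the cluster configuration on all nodes: as root, run $GRID_HOME/crs/install/rootcrs.pl
Node 1
[root@node1 install]# ./rootcrs.pl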
Using configuration parameter file: ./crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
Node 2
[root@node2 install]# ./rootcrs.pl
Using configuration parameter file: ./crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
Deconfigure the cluster on all nodes
Node 1
[root@node1 install]# ./rootcrs.pl -deconfig -force
Using configuration parameter file: ./crsconfig_params
PRCR-1119 : Failed to look up CRS resources of ora.cluster_vip_net1.type type
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'
CRS-2673: Attempting to stop 'ora.crf' on 'node1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
CRS-2673: Attempting to stop 'ora.evmd' on 'node1'
CRS-2673: Attempting to stop 'ora.asm' on 'node1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node1'
CRS-2677: Stop of 'ora.evmd' on 'node1' succeeded
CRS-2677: Stop of 'ora.crf' on 'node1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'node1' succeeded
CRS-2677: Stop of 'ora.asm' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node1'
CRS-2677: Stop of 'ora.cssd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'
CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Removing Trace File Analyzer
Successfully deconfigured Oracle clusterware stack on this node
Node 2
[root@node2 install]# ./rootcrs.pl -deconfig -force -lastnode
Using configuration parameter file: ./crsconfig_params
CRS-5702: Resource 'ora.cssd' is already running on 'node2'
CRS-4000: Command Start failed, or completed with errors.
CSS startup failed with return code 1
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1119 : Failed to look up CRS resources of ora.cluster_vip_net1.type type
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Delete failed, or completed with errors.
CRS-2673: Attempting to stop 'ora.ctssd' on 'node2'
CRS-2673: Attempting to stop 'ora.evmd' on 'node2'
CRS-2673: Attempting to stop 'ora.asm' on 'node2'
CRS-2677: Stop of 'ora.evmd' on 'node2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node2' succeeded
CRS-2677: Stop of 'ora.asm' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node2'
CRS-2677: Stop of 'ora.cssd' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node2'
CRS-2676: Start of 'ora.cssdmonitor' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node2'
CRS-2672: Attempting to start 'ora.diskmon' on 'node2'
CRS-2676: Start of 'ora.diskmon' on 'node2' succeeded
CRS-2676: Start of 'ora.cssd' on 'node2' succeeded
CRS-4611: Successful deletion of voting disk +DATA.
ASM de-configuration trace file location: /tmp/asmcadc_clean2016-10-31_02-02-22-PM.log
ASM Clean Configuration START
ASM Clean Configuration END
ASM with SID +ASM1 deleted successfully. Check /tmp/asmcadc_clean2016-10-31_02-02-22-PM.log for details.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node2'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node2'
CRS-2673: Attempting to stop 'ora.asm' on 'node2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node2'
CRS-2677: Stop of 'ora.mdnsd' on 'node2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node2' succeeded
CRS-2677: Stop of 'ora.asm' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node2'
CRS-2677: Stop of 'ora.cssd' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'node2'
CRS-2677: Stop of 'ora.crf' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node2'
CRS-2677: Stop of 'ora.gipcd' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node2'
CRS-2677: Stop of 'ora.gpnpd' on 'node2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Removing Trace File Analyzer
Successfully deconfigured Oracle clusterware stack on this node
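The order matters here. Every node except the last is deconfigured with -deconfig -force, and only the final node gets -lastnode. In the transcript above, the -lastnode run is also the one that deletes the voting disk and cleans the ASM configuration. The hypothetical helper below just prints that plan for a given node list so it can be reviewed before anything is run. It does not execute rootcrs.pl itself.

```shell
#!/bin/sh
# print_deconfig_plan (hypothetical helper): print, in order, the rootcrs.pl
# command to run as root on each node. Every node except the last gets
# "-deconfig -force"; the last node also gets "-lastnode", matching the
# node1/node2 transcript above.
print_deconfig_plan() {
  if [ "$#" -lt 2 ]; then
    echo "usage: print_deconfig_plan GRID_HOME NODE..." >&2
    return 2
  fi
  script="$1/crs/install/rootcrs.pl"
  shift
  while [ "$#" -gt 1 ]; do
    echo "$1: $script -deconfig -force"
    shift
  done
  echo "$1: $script -deconfig -force -lastnode"
}
```

For this two-node cluster, print_deconfig_plan /u01/11.2.0/grid node1 node2 prints two lines. The node1 command gets -deconfig -force, and the node2 command gets -deconfig -force -lastnode.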
Rebuild the OCR and OLR with root.sh. This is the same script that is run during the RAC installation, and by default it is located in $GRID_HOME.
Node 1
[root@node1 grid]# ./root.sh
Performing root user operation for Oracle 11g
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to upstart
CRS-2672: Attempting to start 'ora.mdnsd' on 'node1'
CRS-2676: Start of 'ora.mdnsd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node1'
CRS-2676: Start of 'ora.gpnpd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
CRS-2672: Attempting to start 'ora.gipcd' on 'node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node1'
CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
ASM created and started successfully.
Disk Group DATA created successfully.
clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Successful addition of voting disk 4331dad495c14f71bfdb6d4f1a82d2f9.
Successfully replaced voting disk group with +DATA.
CRS-4266: Voting file(s) successfully replaced
STATE    File Universal Id                File Name        Disk group
ONLINE   4331dad495c14f71bfdb6d4f1a82d2f9 (/dev/raw/raw1)  [DATA]
Located 1 voting disk(s).
CRS-2672: Attempting to start 'ora.asm' on 'node1'
CRS-2676: Start of 'ora.asm' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.DATA.dg' on 'node1'
CRS-2676: Start of 'ora.DATA.dg' on 'node1' succeeded
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
Node 2
[root@node2 grid]# ./root.sh
Performing root user operation for Oracle 11g
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to upstart
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node node1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
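root.sh has to finish on node1 before it is started on node2. Node1 creates the OCR, the voting disk and the DATA disk group, while node2 finds node1's active CSS daemon (CRS-4402) and joins the existing cluster. A small check like the hypothetical root_sh_succeeded below can confirm that a saved root.sh transcript reached the final success line before you move on.

Where the transcript comes from is an assumption. Some 11.2 releases also write a copy under $GRID_HOME/install/ as root_<host>_<timestamp>.log, but you can just as well capture the output yourself.

```shell
#!/bin/sh
# root_sh_succeeded (hypothetical helper): succeed if a saved root.sh
# transcript contains the final success line seen in the output above.
root_sh_succeeded() {
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 2
  fi
  # -F: match the literal string, since "..." would otherwise be regex dots
  grep -qF 'Configure Oracle Grid Infrastructure for a Cluster ... succeeded' "$1"
}

# Usage sketch: only move on to node2 once node1 has finished cleanly, e.g.
#   root_sh_succeeded /path/to/node1_root_sh.log && echo "run root.sh on node2"
```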
Check the resource status
Node 1
[root@node1 grid]# crs_stat -t
Name           Type           Target    State     Host
ora.DATA.dg    ora....up.type ONLINE    ONLINE    node1
ora....N1.lsnr ora....er.type ONLINE    ONLINE    node1
ora.asm        ora.asm.type   ONLINE    ONLINE    node1
ora.cvu        ora.cvu.type   ONLINE    ONLINE    node1
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    node1
ora....SM1.asm application    ONLINE    ONLINE    node1
ora.node1.gsd  application    OFFLINE   OFFLINE
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  ora....t1.type ONLINE    ONLINE    node1
ora....SM2.asm application    ONLINE    ONLINE    node2
ora.node2.gsd  application    OFFLINE   OFFLINE
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  ora....t1.type ONLINE    ONLINE    node2
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    node1
ora.ons        ora.ons.type   ONLINE    ONLINE    node1
ora....ry.acfs ora....fs.type ONLINE    ONLINE    node1
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    node1
[root@node1 grid]# crsctl stat res -t
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
Local Resources
ora.DATA.dg
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.asm
               ONLINE  ONLINE       node1                    Started
               ONLINE  ONLINE       node2                    Started
ora.gsd
               OFFLINE OFFLINE      node1
               OFFLINE OFFLINE      node2
ora.net1.network
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.ons
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.registry.acfs
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
Cluster Resources
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       node1
ora.cvu
      1        ONLINE  ONLINE       node1
ora.node1.vip
      1        ONLINE  ONLINE       node1
ora.node2.vip
      1        ONLINE  ONLINE       node2
ora.oc4j
      1        ONLINE  ONLINE       node1
ora.scan1.vip
      1        ONLINE  ONLINE       node1
Node 2
[root@node2 grid]# crs_stat -t
Name           Type           Target    State     Host
ora.DATA.dg    ora....up.type ONLINE    ONLINE    node1
ora....N1.lsnr ora....er.type ONLINE    ONLINE    node1
ora.asm        ora.asm.type   ONLINE    ONLINE    node1
ora.cvu        ora.cvu.type   ONLINE    ONLINE    node1
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    node1
ora....SM1.asm application    ONLINE    ONLINE    node1
ora.node1.gsd  application    OFFLINE   OFFLINE
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  ora....t1.type ONLINE    ONLINE    node1
ora....SM2.asm application    ONLINE    ONLINE    node2
ora.node2.gsd  application    OFFLINE   OFFLINE
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  ora....t1.type ONLINE    ONLINE    node2
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    node1
ora.ons        ora.ons.type   ONLINE    ONLINE    node1
ora....ry.acfs ora....fs.type ONLINE    ONLINE    node1
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    node1
[root@node2 grid]# crsctl stat res -t
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
Local Resources
ora.DATA.dg
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.asm
               ONLINE  ONLINE       node1                    Started
               ONLINE  ONLINE       node2                    Started
ora.gsd
               OFFLINE OFFLINE      node1
               OFFLINE OFFLINE      node2
ora.net1.network
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.ons
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.registry.acfs
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
Cluster Resources
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       node1
ora.cvu
      1        ONLINE  ONLINE       node1
ora.node1.vip
      1        ONLINE  ONLINE       node1
ora.node2.vip
      1        ONLINE  ONLINE       node2
ora.oc4j
      1        ONLINE  ONLINE       node1
ora.scan1.vip
      1        ONLINE  ONLINE       node1
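Rather than reading the whole listing by eye, you can filter the resource list for resources whose State differs from their Target. The sketch below parses crs_stat -t because it prints one resource per line. Note that crs_stat is deprecated in 11.2 in favour of crsctl stat res. The helper name crs_mismatches is hypothetical. For the output above it prints nothing, since ora.gsd is OFFLINE by design (Target and State are both OFFLINE).

```shell
#!/bin/sh
# crs_mismatches (hypothetical helper): read `crs_stat -t` output on stdin and
# print every resource whose State differs from its Target. Resources that
# are OFFLINE on purpose (Target and State both OFFLINE, like ora.gsd above)
# are not reported.
crs_mismatches() {
  awk '
    $1 == "Name" || $1 ~ /^-+$/ { next }          # header and separator lines
    NF >= 4 && $3 != $4 { print $1 ": target=" $3 ", state=" $4 }
  '
}

# Usage:
#   crs_stat -t | crs_mismatches
```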
Check the disk group status. If a disk group is not mounted, mount it manually:
SQL> select name,state from v$asm_diskgroup;
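If the query shows a disk group as DISMOUNTED, you can mount it from the ASM instance with ALTER DISKGROUP <name> MOUNT. The hypothetical helper below turns the query output into those statements. It assumes the query runs through sqlplus with headings and feedback turned off, so each line is just a name and a state. It only prints the statements, so run them yourself as SYSASM after checking them.

```shell
#!/bin/sh
# mount_statements (hypothetical helper): read "NAME STATE" pairs, one disk
# group per line, and print an ALTER DISKGROUP ... MOUNT statement for each
# group whose state is DISMOUNTED. Other unusual states (e.g. BROKEN) are
# left alone because they need investigating, not just mounting.
mount_statements() {
  awk 'NF >= 2 && toupper($2) == "DISMOUNTED" { print "ALTER DISKGROUP " $1 " MOUNT;" }'
}

# Usage (as the grid owner, with ORACLE_SID set to the local ASM instance):
#   sqlplus -s "/ as sysasm" <<'EOF' | mount_statements
#   set heading off feedback off pagesize 0
#   select name, state from v$asm_diskgroup;
#   EOF
```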
4. Add the resources (listener, database, instances, etc.)
Add the listener:
$ srvctl add listener -l listener
Check the listener:
$ srvctl config listener
Add the database and instances:
$ srvctl add database -h
$ srvctl add database -d orcl -o /u01/app/oracle/product/11.2.0/db_1 -c RAC
$ srvctl add instance -h
$ srvctl add instance -d orcl -i orcl1 -n node1
$ srvctl add instance -d orcl -i orcl2 -n node2
$ srvctl config database -d orcl
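The registration commands above follow a simple pattern: one add database, then one add instance per node. The hypothetical helper below prints that sequence for a given database name, Oracle home and node list. It assumes instance names follow the <db_name><n> convention used here (orcl1 on node1, orcl2 on node2). Given orcl, the db_1 home, node1 and node2 as arguments, it prints the add and config commands shown above.

```shell
#!/bin/sh
# print_srvctl_registration (hypothetical helper): print the srvctl commands
# that re-register a RAC database and one instance per node. Instance names
# are assumed to follow the <db_name><n> convention (orcl1, orcl2, ...) used
# above, numbered in the order the nodes are given.
print_srvctl_registration() {
  if [ "$#" -lt 3 ]; then
    echo "usage: print_srvctl_registration DB ORACLE_HOME NODE..." >&2
    return 2
  fi
  db=$1
  oracle_home=$2
  shift 2
  echo "srvctl add database -d $db -o $oracle_home -c RAC"
  n=1
  for node in "$@"; do
    echo "srvctl add instance -d $db -i ${db}${n} -n $node"
    n=$((n + 1))
  done
  echo "srvctl config database -d $db"
}

# Usage: review the printed commands, then run them as the database owner.
#   print_srvctl_registration orcl /u01/app/oracle/product/11.2.0/db_1 node1 node2
```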
5. With the resources added, restart the cluster:
# crsctl stop cluster -all
# crsctl start cluster -all
After the restart, the database may not start automatically. If that happens, try the following commands:
$ srvctl enable database -d orcl
$ srvctl enable instance -d orcl -i orcl1
$ srvctl enable instance -d orcl -i orcl2
$ srvctl start database -d orcl
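To confirm that both instances actually came up after srvctl start database, you can check the status output per instance. The sketch assumes 11.2-style srvctl status database lines of the form "Instance orcl1 is running on node node1". The helper name instances_not_running is hypothetical. It prints nothing when every expected instance is running.

```shell
#!/bin/sh
# instances_not_running (hypothetical helper): read `srvctl status database`
# output on stdin and print each expected instance that is not reported as
# running. Assumes 11.2-style lines such as
#   Instance orcl1 is running on node node1
instances_not_running() {
  status=$(cat)
  for inst in "$@"; do
    # "is not running" lines do not match this pattern, so they are reported
    if ! printf '%s\n' "$status" | grep -q "^Instance $inst is running on node "; then
      echo "$inst"
    fi
  done
}

# Usage:
#   srvctl status database -d orcl | instances_not_running orcl1 orcl2
```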
Original article: http://blog.51cto.com/lyzbg/2090815