11G RAC 11.2.0.1.0实例evict故障处理 / 憋错料

Aix 7.1

参考文档：

https://blogs.oracle.com/database4cn/rac

Resolving ORA-481 and "terminating the instance due to error 481" (Doc ID 1950963.1)

ORA-00481 After "The instance eviction reason is 0x2" due to Lack of Ticket (Doc ID 1644015.1)

归根结底是RAC节点之间消息传输流量控制的问题，除网络、硬件原因外，需要打补丁解决。

以下为第一次处理前具体的alertlog。

◆node1
Wed May 24 01:59:24 2017
Remote instance 2 kill is issued with system inc 228
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x2
The instance eviction map is 2
Reconfiguration started (old inc 228, new inc 230)
List of instances:
1 (myinst: 1)
Wed May 24 01:59:27 2017
Trace dumping is performing id=[cdmp_20170524015904]
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed May 24 01:59:28 2017
LMS 1: 14 GCS shadows cancelled, 1 closed, 0 Xw survived
Wed May 24 01:59:28 2017
LMS 0: 23 GCS shadows cancelled, 0 closed, 0 Xw survived
Wed May 24 01:59:28 2017
LMS 2: 21 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Wed May 24 01:59:32 2017
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
Wed May 24 01:59:48 2017
parallel recovery started with 32 processes
Started redo scan
Wed May 24 01:59:49 2017
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
Wed May 24 02:00:00 2017
Completed redo scan
read 2094379 KB redo, 237345 data blocks need recovery
Wed May 24 02:00:03 2017
Reconfiguration started (old inc 230, new inc 232)
List of instances:
1 2 (myinst: 1)
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed May 24 02:00:04 2017
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Wed May 24 02:00:04 2017
Wed May 24 02:00:04 2017
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Wed May 24 02:02:27 2017
，
，此处大量12170
，
Wed May 24 02:05:34 2017
LMON (ospid: 35979900): terminating the instance due to error 481
Wed May 24 02:05:34 2017
System state dump is made for local instance
System State dumped to trace file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_40894656.trc
Instance terminated by LMON, pid = 35979900
Wed May 24 08:12:27 2017
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 en1 192.168.0.0 configured from GPnP Profile for use as a cluster interconnect
Interface type 1 en0 10.209.199.0 configured from GPnP Profile for use as a public interface
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
Using parameter settings in server-side pfile /u01/app/11.2.0/oracle/product/11.2.0/db_1/dbs/initorcl1.ora
System parameters with non-default values:
processes = 2000
timed_statistics = TRUE
sga_max_size = 70G
spfile = "+DATA1/orcl/spfileorcl.ora"
sga_target = 50G
control_files = "+DATA1/orcl/controlfile/current.261.832429113"
control_files = "+ARCDG/orcl/controlfile/current.259.832429113"
db_block_size = 8192
compatible = "11.2.0.0.0"
log_archive_format = "%t_%s_%r.dbf"
cluster_database = TRUE
db_create_file_dest = "+DATA1"
db_recovery_file_dest = "+ARCDG"
db_recovery_file_dest_size= 2048G
thread = 1
undo_tablespace = "UNDOTBS1"
undo_retention = 10800
instance_number = 1
remote_login_passwordfile= "NONE"
db_domain = ""
service_names = "orcl"
dispatchers = "(PROTOCOL=TCP) (SERVICE=orclXDB)"
local_listener = "(ADDRESS = (PROTOCOL = TCP)(HOST = 10.209.199.4)(PORT = 1521))"
remote_listener = "rac-scan:1521"
result_cache_max_size = 268736K
audit_file_dest = "/u01/app/11.2.0/oracle/admin/orcl/adump"
audit_trail = "DB"
db_name = "orcl"
open_cursors = 1000
sql_trace = FALSE
optimizer_index_caching = 90
pga_aggregate_target = 20G
deferred_segment_creation= FALSE
aq_tm_processes = 5
diagnostic_dest = "/u01/app/11.2.0/oracle"
Deprecated system parameters with specified values:
sql_trace
End of deprecated system parameter listing
Cluster communication is configured to use the following interface(s) for this instance
192.168.0.1
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Wed May 24 08:12:32 2017
PMON started with pid=2, OS id=35652038
Wed May 24 08:12:32 2017
VKTM started with pid=3, OS id=33489120 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Wed May 24 08:12:32 2017
GEN0 started with pid=4, OS id=34209898
Wed May 24 08:12:32 2017
DIAG started with pid=5, OS id=36438538
Wed May 24 08:12:33 2017
DBRM started with pid=6, OS id=24576446
Wed May 24 08:12:33 2017
PING started with pid=7, OS id=29687920
Wed May 24 08:12:33 2017
PSP0 started with pid=8, OS id=36635296
Wed May 24 08:12:33 2017
ACMS started with pid=9, OS id=34341462
Wed May 24 08:12:33 2017
DIA0 started with pid=10, OS id=32047870
Wed May 24 08:12:33 2017
LMON started with pid=11, OS id=36045304
Wed May 24 08:12:35 2017
LMD0 started with pid=12, OS id=31391860
Wed May 24 08:12:35 2017
LMS0 started with pid=13, OS id=35979918 at elevated priority
Wed May 24 08:12:35 2017
LMS1 started with pid=14, OS id=22741294 at elevated priority
Wed May 24 08:12:36 2017
LMS2 started with pid=15, OS id=29229354 at elevated priority
Wed May 24 08:12:36 2017
RMS0 started with pid=16, OS id=20185780
Wed May 24 08:12:36 2017
LMHB started with pid=17, OS id=29425998
Wed May 24 08:12:36 2017
MMAN started with pid=18, OS id=9699954
Wed May 24 08:12:36 2017
DBW0 started with pid=19, OS id=31850686
Wed May 24 08:12:36 2017
DBW1 started with pid=20, OS id=36045442
Wed May 24 08:12:36 2017
DBW2 started with pid=21, OS id=40894536
Wed May 24 08:12:36 2017
DBW3 started with pid=22, OS id=29819564
Wed May 24 08:12:36 2017
DBW4 started with pid=23, OS id=36634890
Wed May 24 08:12:36 2017
LGWR started with pid=24, OS id=35783370
Wed May 24 08:12:37 2017
CKPT started with pid=25, OS id=35914258
Wed May 24 08:12:37 2017
SMON started with pid=26, OS id=32374796
Wed May 24 08:12:37 2017
RECO started with pid=27, OS id=35652280
Wed May 24 08:12:37 2017
RBAL started with pid=28, OS id=36438322
Wed May 24 08:12:37 2017
ASMB started with pid=29, OS id=39387144
Wed May 24 08:12:37 2017
MMON started with pid=30, OS id=2621900
NOTE: initiating MARK startup
Wed May 24 08:12:37 2017
Starting background process MARKMMNL started with pid=31, OS id=35979732

starting up 1 dispatcher(s) for network address ‘(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))‘...
Wed May 24 08:12:37 2017
MARK started with pid=32, OS id=35717688
NOTE: MARK has subscribed
starting up 1 shared server(s) ...
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 236)
List of instances:
1 2 (myinst: 1)
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 valid according to instance 2
* domain 0 valid = 1 according to instance 2
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Wed May 24 08:12:45 2017
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Wed May 24 08:12:52 2017
LCK0 started with pid=34, OS id=24379778
Wed May 24 08:12:52 2017
Starting background process RSMN
Wed May 24 08:12:52 2017
RSMN started with pid=36, OS id=30802026
ORACLE_BASE from environment = /u01/app/11.2.0/oracle
Wed May 24 08:12:53 2017
ALTER DATABASE MOUNT
Wed May 24 08:12:53 2017
NOTE: Loaded library: System
Wed May 24 08:12:53 2017
SUCCESS: diskgroup DATA1 was mounted
SUCCESS: diskgroup ARCDG was mounted
Wed May 24 08:12:53 2017
NOTE: dependency between database orcl and diskgroup resource ora.DATA1.dg is established
NOTE: dependency between database orcl and diskgroup resource ora.ARCDG.dg is established
Wed May 24 08:12:57 2017
Successful mount of redo thread 1, with mount id 1472569957
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: ALTER DATABASE MOUNT
Wed May 24 08:12:58 2017
ALTER DATABASE OPEN
Block change tracking file is current.
Picked broadcast on commit scheme to generate SCNs
Wed May 24 08:12:58 2017
SUCCESS: diskgroup DATA2 was mounted
NOTE: dependency between database orcl and diskgroup resource ora.DATA2.dg is established
SUCCESS: diskgroup DATA3 was mounted
NOTE: dependency between database orcl and diskgroup resource ora.DATA3.dg is established
Thread 1 advanced to log sequence 255944 (thread open)
Thread 1 opened at log sequence 255944
Current log# 2 seq# 255944 mem# 0: +ARCDG/orcl/onlinelog/group_2.261.840376035
Current log# 2 seq# 255944 mem# 1: +ARCDG/orcl/onlinelog/group_2.2113.840376039
Current log# 2 seq# 255944 mem# 2: +ARCDG/orcl/onlinelog/group_2.2114.840376041
Current log# 2 seq# 255944 mem# 3: +ARCDG/orcl/onlinelog/group_2.2116.840376043
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Starting background process CTWR
Wed May 24 08:12:59 2017
CTWR started with pid=39, OS id=32768642
Block change tracking service is active.
Wed May 24 08:13:00 2017
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
Starting background process GTX0
Wed May 24 08:13:07 2017
GTX0 started with pid=42, OS id=35062526
Starting background process RCBG
Wed May 24 08:13:07 2017
RCBG started with pid=43, OS id=27721944
replication_dependency_tracking turned off (no async multimaster replication found)
Wed May 24 08:13:08 2017
Starting background process QMNC
Wed May 24 08:13:08 2017
QMNC started with pid=44, OS id=34013894
Completed: ALTER DATABASE OPEN
Wed May 24 08:13:17 2017
Starting background process CJQ0
Wed May 24 08:13:17 2017
CJQ0 started with pid=41, OS id=11469004
Wed May 24 08:13:43 2017
Starting background process SMCO
Wed May 24 08:13:43 2017
SMCO started with pid=120, OS id=42336480
，
，
，
Wed May 24 09:08:16 2017
Errors in file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j001_33161436.trc:
ORA-12012: 鑎仟錨狮鎊骚琛^啼絕茕竈239 錨呛閊証
ORA-00001: 杩^蒎^蟐湾^辕涓^犁害鎊較哝^荨浠(CM.UK_LTE_ZY_CHECK_HISTORY)
ORA-06512: 錨塄 "CM.JOB_LTE_ZY_CHECK", line 27
ORA-06512: 錨塄 line 1

◆node2
Wed May 24 01:59:01 2017
LMS1 (ospid: 26411086) received an instance eviction notification from instance 1 [2]
Wed May 24 01:59:01 2017
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x2
The instance eviction map is 2
Wed May 24 01:59:04 2017
PMON (ospid: 50725354): terminating the instance due to error 481
Wed May 24 01:59:04 2017
System state dump is made for local instance
System State dumped to trace file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_42336686.trc
Wed May 24 01:59:04 2017
ORA-1092 : opitsk aborting process
Wed May 24 01:59:04 2017
License high water mark = 516
Instance terminated by PMON, pid = 50725354
USER (ospid: 50987430): terminating the instance
Instance terminated by USER, pid = 50987430
Wed May 24 01:59:29 2017
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 en1 192.168.0.0 configured from GPnP Profile for use as a cluster interconnect
Interface type 1 en0 10.209.199.0 configured from GPnP Profile for use as a public interface
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
Using parameter settings in server-side pfile /u01/app/11.2.0/oracle/product/11.2.0/db_1/dbs/initorcl2.ora
System parameters with non-default values:
processes = 2000
timed_statistics = TRUE
sga_max_size = 70G
spfile = "+DATA1/orcl/spfileorcl.ora"
sga_target = 50G
control_files = "+DATA1/orcl/controlfile/current.261.832429113"
control_files = "+ARCDG/orcl/controlfile/current.259.832429113"
db_block_size = 8192
compatible = "11.2.0.0.0"
log_archive_format = "%t_%s_%r.dbf"
cluster_database = TRUE
db_create_file_dest = "+DATA1"
db_recovery_file_dest = "+ARCDG"
db_recovery_file_dest_size= 2048G
thread = 2
undo_tablespace = "UNDOTBS2"
undo_retention = 10800
instance_number = 2
remote_login_passwordfile= "NONE"
db_domain = ""
dispatchers = "(PROTOCOL=TCP) (SERVICE=orclXDB)"
local_listener = "(ADDRESS = (PROTOCOL = TCP)(HOST = 10.209.199.5)(PORT = 1521))"
remote_listener = "rac-scan:1521"
result_cache_max_size = 268736K
audit_file_dest = "/u01/app/11.2.0/oracle/admin/orcl/adump"
audit_trail = "DB"
db_name = "orcl"
open_cursors = 1000
sql_trace = FALSE
optimizer_index_caching = 90
pga_aggregate_target = 20G
deferred_segment_creation= FALSE
aq_tm_processes = 5
diagnostic_dest = "/u01/app/11.2.0/oracle"
Deprecated system parameters with specified values:
sql_trace
End of deprecated system parameter listing
Cluster communication is configured to use the following interface(s) for this instance
192.168.0.2
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Wed May 24 01:59:34 2017
PMON started with pid=2, OS id=66584656
Wed May 24 01:59:34 2017
VKTM started with pid=3, OS id=66846744 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Wed May 24 01:59:34 2017
GEN0 started with pid=4, OS id=26608090
Wed May 24 01:59:34 2017
DIAG started with pid=5, OS id=26083716
Wed May 24 01:59:34 2017
DBRM started with pid=6, OS id=24510972
Wed May 24 01:59:34 2017
PING started with pid=7, OS id=65077306
Wed May 24 01:59:34 2017
PSP0 started with pid=8, OS id=66781402
Wed May 24 01:59:34 2017
ACMS started with pid=9, OS id=66978040
Wed May 24 01:59:34 2017
DIA0 started with pid=10, OS id=66519050
Wed May 24 01:59:34 2017
LMON started with pid=11, OS id=66453694
Wed May 24 01:59:37 2017
LMD0 started with pid=12, OS id=23658758
Wed May 24 01:59:37 2017
LMS0 started with pid=13, OS id=66322614 at elevated priority
Wed May 24 01:59:37 2017
LMS1 started with pid=14, OS id=65798210 at elevated priority
Wed May 24 01:59:37 2017
LMS2 started with pid=15, OS id=15204812 at elevated priority
Wed May 24 01:59:38 2017
RMS0 started with pid=16, OS id=65732820
Wed May 24 01:59:38 2017
LMHB started with pid=17, OS id=65339438
Wed May 24 01:59:38 2017
MMAN started with pid=18, OS id=52298036
Wed May 24 01:59:38 2017
DBW0 started with pid=19, OS id=52232458
Wed May 24 01:59:38 2017
DBW1 started with pid=20, OS id=65273872
Wed May 24 01:59:38 2017
DBW2 started with pid=21, OS id=65208354
Wed May 24 01:59:38 2017
DBW3 started with pid=22, OS id=65994980
Wed May 24 01:59:38 2017
DBW4 started with pid=23, OS id=61210752
Wed May 24 01:59:38 2017
LGWR started with pid=24, OS id=6095298
Wed May 24 01:59:38 2017
CKPT started with pid=25, OS id=65470648
Wed May 24 01:59:38 2017
SMON started with pid=26, OS id=64946374
Wed May 24 01:59:38 2017
RECO started with pid=27, OS id=51642854
Wed May 24 01:59:39 2017
RBAL started with pid=28, OS id=64618728
Wed May 24 01:59:39 2017
ASMB started with pid=29, OS id=64553208
Wed May 24 01:59:39 2017
MMON started with pid=30, OS id=64422054
NOTE: initiating MARK startup
Wed May 24 01:59:39 2017
MMNL started with pid=31, OS id=42336716
Starting background process MARK
starting up 1 dispatcher(s) for network address ‘(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))‘...
Wed May 24 01:59:39 2017
MARK started with pid=32, OS id=51577168
NOTE: MARK has subscribed
Wed May 24 01:59:39 2017
starting up 1 shared server(s) ...
lmon registered with NM - instance number 2 (internal mem no 1)
Reconfiguration started (old inc 0, new inc 232)
List of instances:
1 2 (myinst: 2)
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 valid = 0 according to instance 1
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Wed May 24 01:59:48 2017
Submitted all GCS remote-cache requests
Wed May 24 02:05:10 2017
Trace dumping is performing id=[cdmp_20170524020534]
Wed May 24 02:05:18 2017
Reconfiguration started (old inc 232, new inc 234)
List of instances:
2 (myinst: 2)
Nested reconfiguration detected.
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed May 24 02:05:19 2017
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Wed May 24 02:05:19 2017
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Wed May 24 02:05:19 2017
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
Wed May 24 02:05:25 2017
LCK0 started with pid=34, OS id=51118396
Wed May 24 02:05:25 2017
Starting background process RSMN
Wed May 24 02:05:25 2017
RSMN started with pid=36, OS id=51773786
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /u01/app/11.2.0/oracle
Wed May 24 02:05:25 2017
ALTER DATABASE MOUNT
This instance was first to mount
Wed May 24 02:05:26 2017
NOTE: Loaded library: System
Wed May 24 02:05:26 2017
SUCCESS: diskgroup DATA1 was mounted
Wed May 24 02:05:26 2017
NOTE: dependency between database orcl and diskgroup resource ora.DATA1.dg is established
SUCCESS: diskgroup ARCDG was mounted
NOTE: dependency between database orcl and diskgroup resource ora.ARCDG.dg is established
Wed May 24 02:05:30 2017
Successful mount of redo thread 2, with mount id 1472569957
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: ALTER DATABASE MOUNT
ALTER DATABASE OPEN
This instance was first to open
Wed May 24 02:05:30 2017
SUCCESS: diskgroup DATA2 was mounted
NOTE: dependency between database orcl and diskgroup resource ora.DATA2.dg is established
SUCCESS: diskgroup DATA3 was mounted
NOTE: dependency between database orcl and diskgroup resource ora.DATA3.dg is established
Block change tracking file is current.
Beginning crash recovery of 2 threads
parallel recovery started with 32 processes
Started redo scan
Wed May 24 02:05:44 2017
Completed redo scan
read 2158203 KB redo, 230150 data blocks need recovery
Started redo application at
Thread 1: logseq 255942, block 163388
Thread 2: logseq 236286, block 2067920
Recovery of Online Redo Log: Thread 1 Group 3 Seq 255942 Reading mem 0
Mem# 0: +ARCDG/orcl/onlinelog/group_3.2085.840375427
Mem# 1: +ARCDG/orcl/onlinelog/group_3.2084.840375445
Mem# 2: +ARCDG/orcl/onlinelog/group_3.263.840375447
Mem# 3: +ARCDG/orcl/onlinelog/group_3.2089.840375449
Recovery of Online Redo Log: Thread 2 Group 7 Seq 236286 Reading mem 0
Mem# 0: +ARCDG/orcl/onlinelog/group_7.2049.840374587
Mem# 1: +ARCDG/orcl/onlinelog/group_7.2052.840374591
Mem# 2: +ARCDG/orcl/onlinelog/group_7.2057.840374593
Mem# 3: +ARCDG/orcl/onlinelog/group_7.2058.840374597
Recovery of Online Redo Log: Thread 2 Group 5 Seq 236287 Reading mem 0
Mem# 0: +ARCDG/orcl/onlinelog/group_5.2025.840374445
Mem# 1: +ARCDG/orcl/onlinelog/group_5.2026.840374483
Mem# 2: +ARCDG/orcl/onlinelog/group_5.2029.840374489
Mem# 3: +ARCDG/orcl/onlinelog/group_5.2034.840374493
Wed May 24 02:06:05 2017
Completed redo application of 1702.07MB
Completed crash recovery at
Thread 1: logseq 255942, block 291036, scn 12495715727030
Thread 2: logseq 236287, block 2742038, scn 12495715600520
230150 data blocks read, 229989 data blocks written, 2158203 redo k-bytes read
Thread 1 advanced to log sequence 255943 (thread recovery)
Picked broadcast on commit scheme to generate SCNs
Wed May 24 02:06:06 2017
Thread 2 advanced to log sequence 236288 (thread open)
Thread 2 opened at log sequence 236288
Current log# 6 seq# 236288 mem# 0: +ARCDG/orcl/onlinelog/group_6.2040.840374575
Current log# 6 seq# 236288 mem# 1: +ARCDG/orcl/onlinelog/group_6.2041.840374579
Current log# 6 seq# 236288 mem# 2: +ARCDG/orcl/onlinelog/group_6.2042.840374581
Current log# 6 seq# 236288 mem# 3: +ARCDG/orcl/onlinelog/group_6.2048.840374585
Successful open of redo thread 2
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Starting background process CTWR
Wed May 24 02:06:07 2017
CTWR started with pid=89, OS id=49349114
Block change tracking service is active.
Wed May 24 02:06:07 2017
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 4.
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
Starting background process GTX0
Wed May 24 02:06:11 2017
GTX0 started with pid=98, OS id=42992042
Starting background process RCBG
Wed May 24 02:06:11 2017
RCBG started with pid=52, OS id=40370648
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed May 24 02:06:13 2017
QMNC started with pid=119, OS id=47514086
Completed: ALTER DATABASE OPEN
SMON: Parallel transaction recovery tried
Starting background process SMCO
Wed May 24 02:06:17 2017
SMCO started with pid=153, OS id=50724962
Wed May 24 02:06:17 2017
db_recovery_file_dest_size of 2097152 MB is 4.35% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Wed May 24 02:06:19 2017
Starting background process CJQ0
Wed May 24 02:06:19 2017
CJQ0 started with pid=46, OS id=43581760
Wed May 24 02:07:35 2017
Thread 2 advanced to log sequence 236289 (LGWR switch)
Current log# 7 seq# 236289 mem# 0: +ARCDG/orcl/onlinelog/group_7.2049.840374587
Current log# 7 seq# 236289 mem# 1: +ARCDG/orcl/onlinelog/group_7.2052.840374591
Current log# 7 seq# 236289 mem# 2: +ARCDG/orcl/onlinelog/group_7.2057.840374593
Current log# 7 seq# 236289 mem# 3: +ARCDG/orcl/onlinelog/group_7.2058.840374597
Wed May 24 02:08:09 2017
Thread 2 advanced to log sequence 236290 (LGWR switch)
Current log# 5 seq# 236290 mem# 0: +ARCDG/orcl/onlinelog/group_5.2025.840374445
Current log# 5 seq# 236290 mem# 1: +ARCDG/orcl/onlinelog/group_5.2026.840374483
Current log# 5 seq# 236290 mem# 2: +ARCDG/orcl/onlinelog/group_5.2029.840374489
Current log# 5 seq# 236290 mem# 3: +ARCDG/orcl/onlinelog/group_5.2034.840374493
Wed May 24 02:08:51 2017
Thread 2 advanced to log sequence 236291 (LGWR switch)
Current log# 6 seq# 236291 mem# 0: +ARCDG/orcl/onlinelog/group_6.2040.840374575
Current log# 6 seq# 236291 mem# 1: +ARCDG/orcl/onlinelog/group_6.2041.840374579
Current log# 6 seq# 236291 mem# 2: +ARCDG/orcl/onlinelog/group_6.2042.840374581
Current log# 6 seq# 236291 mem# 3: +ARCDG/orcl/onlinelog/group_6.2048.840374585
Wed May 24 02:09:30 2017
Thread 2 advanced to log sequence 236292 (LGWR switch)
Current log# 7 seq# 236292 mem# 0: +ARCDG/orcl/onlinelog/group_7.2049.840374587
Current log# 7 seq# 236292 mem# 1: +ARCDG/orcl/onlinelog/group_7.2052.840374591
Current log# 7 seq# 236292 mem# 2: +ARCDG/orcl/onlinelog/group_7.2057.840374593
Current log# 7 seq# 236292 mem# 3: +ARCDG/orcl/onlinelog/group_7.2058.840374597

针对这个RAC集群，做了加多的配置修正、优化，主要是如下所述。

◆数据库配置优化

0、前期大量的日志切换无法完成，导致数据库挂起，为此新增了loggroup，并删掉无用多余的logmember。

1、增大了ASM实例的内存，由350M--->2G

2、RAC并行进程强制在本地实例分配

3、为减少对性能的影响，关闭audit审计

4、AWR 资料保存时间由7天增加到30天

5、SGA增大10G至60G

◆修改网络相关的系统核心参数

/usr/sbin/no -p -o tcp_sendspace=4194304

/usr/sbin/no -p -o tcp_recvspace=4194304

/usr/sbin/no -p -o rfc1323=1

/usr/sbin/no -p -o sb_max=8388608

/usr/sbin/no -p -o udp_ephemeral_low=9000

/usr/sbin/no -p -o tcp_ephemeral_low=9000

◆修改内联网卡的参数

chdev -l en1 -a tcp_sendspace=1048576 -a rfc1323=1 -a tcp_recvspace=1048576

◆SWAP

物理内存大于 16GB，但 SWAP 空间低于 16 GB，不符合 Oracle 安装的标准建议。增大PS。chps -s 192 hd6

◆时区及时间确认，安全起见，暂不修改。

◆ntp确认，使用GI集群自身的时间同步，确保各个节点的时间同步，目前是几十秒的偏差，不是问题。

◆HA集群配置修改，当初安装集群时部分内容没有按照官方文档实施，本次做了补救，但不全面，可能仍存在未知的风险。

5月25日重启后，经多一段时间的观察，有所改善，但于6月7日9点又发生一次节点2当。

但此次已经不是节点的驱逐，说明上次的修正与配置优化起到了作用，初步断定与网络参数配置修改相关。日志如下，无evict日志。

Wed Jun 07 09:02:57 2017
Errors in file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lms0_7930030.trc  (incident=1023882):
ORA-00600: internal error code, arguments: [kjbrref:pkey], [3881577], [6], [7493495], [0], [], [], [], [], [], [], []
Incident details in: /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/incident/incdir_1023882/orcl2_lms0_7930030_i1023882.trc
Wed Jun 07 09:02:59 2017
Trace dumping is performing id=[cdmp_20170607090259]
Errors in file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lms0_7930030.trc:
ORA-00600: internal error code, arguments: [kjbrref:pkey], [3881577], [6], [7493495], [0], [], [], [], [], [], [], []
LMS0 (ospid: 7930030): terminating the instance due to error 484
Instance terminated by LMS0, pid = 7930030
Wed Jun 07 09:14:01 2017
Starting ORACLE instance (normal)

针对此次的600内部错误，初步结论：

5月25日优化重启后，6月7日又发生过1次节点down的情况，经排查，基本确认是oracle BUG，在某些情况下会发生节点down。目前数据库PSU版本11.2.0.1.0，需要打补丁到最新版本11.2.0.1.6。后续需择机打补丁。

时间： 2024-10-20 14:38:38

11G RAC 11.2.0.1.0实例evict故障处理

11G RAC 11.2.0.1.0实例evict故障处理的相关文章

Oracle 11g RAC自动打GI PSU补丁(11.2.0.4.8)

rhel7.4安装oracle 11G 11.2.0.4.0 RAC

Oracle Database 11g Release 2(11.2.0.3.0) RAC On Redhat Linux 5.8 Using Vmware Workstation 9.0

Oracle Linux 6.3下安装Oracle 11g R2(11.2.0.3)

Oracle Study之--Oracle 单实例11.2.0.1.0升级到11.2.0.3.0

ORACLE RAC 11.2.0.4 CentOS release 6.9 静默安装1.0版本

?完整记录一则Oracle 11.2.0.4单实例打PSU补丁的过程

oracle 11.2.0.4单实例文件系统安装与补丁

11g RAC完全安装图解(VM Workstation8.0+Centos5.5)01-Grid安装