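While installing Oracle RAC on AIX 7100-02-03-1334, the grid and oracle software installed without problems, but the database crashed when dbca tried to create it. Below is the alert.log from the creation attempt: the database reports ORA-07445, and the dbca log shows that the failure occurs during the Create database step.
No matching document turned up on MOS, so I tried other approaches.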
/oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/trace/alert_rmbtodb1.log
MMNL started with pid=26, OS id=7733452
Exception [type: SIGILL, Illegal opcode] [ADDR:0x103E2AFA0] [PC:0x103E2AFA0, {empty}] [flags: 0x0, count: 1]
Errors in file /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/trace/rmbtodb1_asmb_6357148.trc (incident=105793):
ORA-07445: exception encountered: core dump [PC:0x103E2AFA0] [SIGILL] [ADDR:0x103E2AFA0] [PC:0x103E2AFA0] [Illegal opcode] []
Incident details in: /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/incident/incdir_105793/rmbtodb1_asmb_6357148_i105793.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 2)
List of instances:
1 (myinst: 1)
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
Thu Dec 11 11:19:18 2014
LCK0 started with pid=27, OS id=10420304
Starting background process RSMN
Thu Dec 11 11:19:18 2014
RSMN started with pid=28, OS id=9306256
ORACLE_BASE from environment = /oraapp/oracle
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x496568BB8] [PC:0x10029B4D0, {empty}] [flags: 0x8, count: 3]
Errors in file /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/trace/rmbtodb1_asmb_6357148.trc (incident=105794):
ORA-07445: exception encountered: core dump [PC:0x10029B4D0] [SIGSEGV] [ADDR:0x496568BB8] [PC:0x10029B4D0] [Address not mapped to object] []
ORA-07445: exception encountered: core dump [PC:0x103E2AFA0] [SIGILL] [ADDR:0x103E2AFA0] [PC:0x103E2AFA0] [Illegal opcode] []
Incident details in: /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/incident/incdir_105794/rmbtodb1_asmb_6357148_i105794.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Dec 11 11:19:21 2014
Sweep [inc][105794]: completed
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sweep [inc][105793]: completed
Sweep [inc2][93794]: completed
Sweep [inc2][105794]: completed
PMON (ospid: 16318602): terminating the instance due to error 486
System state dump requested by (instance=1, osid=16318602 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/trace/rmbtodb1_diag_14352568.trc
Dumping diagnostic data in directory=[cdmp_20141211111922], requested by (instance=1, osid=16318602 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 16318602
[email protected]:/oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/trace>1/incident/incdir_105794/rmbtodb1_asmb_6357148_i105794.trc <
"/oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/incident/incdir_105794/rmbtodb1_asmb_6357148_i105794.trc" 2832 lines, 161159 characters
Dump file /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/incident/incdir_105794/rmbtodb1_asmb_6357148_i105794.trc
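A quick way to see what the alert log is saying is to pull out the ORA-07445 lines and the trace file names next to them. The following is only a minimal sketch in Python, assuming the line formats shown in the excerpt above; the function name `summarize_alert_log` and the two regular expressions are illustrative, not part of any Oracle tooling.

```python
import re

# Matches the ORA-07445 summary lines in the form they take in the excerpt above.
ORA_07445 = re.compile(
    r"ORA-07445: exception encountered: core dump "
    r"\[PC:(0x[0-9A-Fa-f]+)\] \[(SIG[A-Z]+)\] \[ADDR:(0x[0-9A-Fa-f]+)\]"
)
# Assumption: trace files are named <instance>_<process>_<ospid>.trc, as above.
TRACE_FILE = re.compile(r"/trace/(\w+?)_([a-z0-9]+)_(\d+)\.trc")


def summarize_alert_log(text):
    """Return (crashes, processes).

    crashes   -- unique (signal, pc, addr) tuples in order of first appearance
    processes -- process names taken from 'Errors in file' trace file names
    """
    crashes = []
    processes = set()
    for line in text.splitlines():
        match = ORA_07445.search(line)
        if match:
            pc, signal, addr = match.groups()
            entry = (signal, pc, addr)
            if entry not in crashes:
                crashes.append(entry)
        if line.startswith("Errors in file"):
            trace = TRACE_FILE.search(line)
            if trace:
                processes.add(trace.group(2))
    return crashes, processes


# Lines copied from the alert log excerpt above.
SAMPLE = "\n".join([
    "Errors in file /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/trace/rmbtodb1_asmb_6357148.trc (incident=105793):",
    "ORA-07445: exception encountered: core dump [PC:0x103E2AFA0] [SIGILL] [ADDR:0x103E2AFA0] [PC:0x103E2AFA0] [Illegal opcode] []",
    "Errors in file /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/trace/rmbtodb1_asmb_6357148.trc (incident=105794):",
    "ORA-07445: exception encountered: core dump [PC:0x10029B4D0] [SIGSEGV] [ADDR:0x496568BB8] [PC:0x10029B4D0] [Address not mapped to object] []",
    "ORA-07445: exception encountered: core dump [PC:0x103E2AFA0] [SIGILL] [ADDR:0x103E2AFA0] [PC:0x103E2AFA0] [Illegal opcode] []",
])

crashes, processes = summarize_alert_log(SAMPLE)
print(crashes)
print(processes)
```

On this excerpt the sketch reports two distinct exceptions (a SIGILL followed by a SIGSEGV), both written to trace files whose names contain `asmb`, so the same background process raised both before PMON terminated the instance.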
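My first suspicion was that the oracle user had no write permission on the ASM disks, so I tried creating an spfile on ASM as oracle, and that succeeded. I also checked the oracle executable under CRS_HOME and ORACLE_HOME and found no permission problems.
1. First I tried creating the database manually on node 1: I wrote a pfile and tried to start the instance in nomount. The instance reached nomount and then crashed immediately.
2. Next I ran dbca on node 2, which reported the following errors: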
/oraapp/oracle/cfgtoollogs/dbca/rmbtodb/trace.log
[Thread-178] [ 2014-12-11 12:47:49.813 CST ] [PostDBCreationStep.executeImpl:889] Starting Database HA Resource
[Thread-178] [ 2014-12-11 12:48:16.318 CST ] [CRSNative.internalStartResource:389] Failed to start resource: Name: ora.rmbtodb.db, node: null, filter: null,
msg CRS-5017: The resource action "ora.rmbtodb.db start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 14287060
Session ID: 126 Serial number: 1
. For details refer to "(:CLSN00107:)" in "/oraapp/grid/gridhome/log/urmbtodb1/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-2674: Start of 'ora.rmbtodb.db' on 'urmbtodb1' failed
CRS-2632: There are no more servers to try to place resource 'ora.rmbtodb.db' on that would satisfy its placement policy
[Thread-178] [ 2014-12-11 12:48:16.319 CST ] [PostDBCreationStep.executeImpl:897] Exception while Starting with HA Database Resource PRCR-1079 : Failed to start resource ora.rmbtodb.db
CRS-5017: The resource action "ora.rmbtodb.db start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 14287060
Session ID: 126 Serial number: 1
. For details refer to "(:CLSN00107:)" in "/oraapp/grid/gridhome/log/urmbtodb1/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-2674: Start of 'ora.rmbtodb.db' on 'urmbtodb1' failed
CRS-2632: There are no more servers to try to place resource 'ora.rmbtodb.db' on that would satisfy its placement policy
The ora.rmbtodb.db resource failed to start on node 1 (urmbtodb1), but the database itself was created successfully on node 2.
The details are in oraagent_oracle.log:
/oraapp/grid/gridhome/log/urmbtodb1/agent/crsd/oraagent_oracle/oraagent_oracle.log
2014-12-10 22:48:11.505: [ USRTHRD][1800] {2:52141:473} Value of LOCAL_LISTENER is
2014-12-10 22:48:11.549: [ USRTHRD][1800] {2:52141:473} ORA-01405: fetched column value is NULL
2014-12-10 22:48:11.549: [ USRTHRD][1800] {2:52141:473} Value of LISTENER_NETWORKS is
2014-12-10 22:48:11.549: [ USRTHRD][1800] {2:52141:473} sqlStmt = ALTER SYSTEM SET LOCAL_LISTENER=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=200.31.155.225)(PORT=1521))))' SCOPE=MEMORY SID='rmbtodb1' /* db agent *//* {2:52141:473} */
2014-12-10 22:48:13.011: [ USRTHRD][1800] {2:52141:473} ORA-03113: end-of-file on communication channel
Process ID: 14287060
Session ID: 126 Serial number: 1
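The database crashed while LOCAL_LISTENER was being set. That makes the cause fairly clear: it has to be something on the network side.
The AIX administrator then mentioned that the NIC bonding mode on node 1 had been changed earlier.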
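Finding the statement that was running when the session died can be done mechanically. Here is a minimal sketch, assuming the oraagent log keeps the `sqlStmt = ...` and `ORA-03113` line shapes shown above; the helper name `last_statement_before_disconnect` is hypothetical, and the sample lines are copied from the log with the quote characters normalized to straight quotes.

```python
import re

# Assumption: the agent logs each statement as "sqlStmt = <text>" on one line.
SQL_STMT = re.compile(r"sqlStmt = (.*)")
# Pulls the parameter name out of an ALTER SYSTEM SET statement.
PARAMETER = re.compile(r"ALTER SYSTEM SET (\w+)=")


def last_statement_before_disconnect(lines):
    """Return the last 'sqlStmt' text logged before the first ORA-03113 line.

    Returns None if no ORA-03113 line is found, or if no statement was
    logged before it.
    """
    last_stmt = None
    for line in lines:
        match = SQL_STMT.search(line)
        if match:
            last_stmt = match.group(1).strip()
        elif "ORA-03113" in line:
            return last_stmt
    return None


SAMPLE_LINES = [
    "2014-12-10 22:48:11.549: [ USRTHRD][1800] {2:52141:473} Value of LISTENER_NETWORKS is",
    "2014-12-10 22:48:11.549: [ USRTHRD][1800] {2:52141:473} sqlStmt = ALTER SYSTEM SET LOCAL_LISTENER=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=200.31.155.225)(PORT=1521))))' SCOPE=MEMORY SID='rmbtodb1' /* db agent */",
    "2014-12-10 22:48:13.011: [ USRTHRD][1800] {2:52141:473} ORA-03113: end-of-file on communication channel",
]

statement = last_statement_before_disconnect(SAMPLE_LINES)
parameter = PARAMETER.search(statement).group(1)
print(parameter)
```

This only shows which statement was in flight when the connection dropped. It does not prove that the statement caused the crash, so it is best read together with the alert log.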
$ oifcfg getif -global
en10 192.168.4.0 global cluster_interconnect
en9 200.31.155.0 global public
The public and private network entries show nothing abnormal. I tried resetting the public network anyway.
Delete the en9 entry:
$ oifcfg -delif -global en9
$ oifcfg getif -global
en10 192.168.4.0 global cluster_interconnect
Set the public network again:
$ oifcfg -setif -global en9/200.31.155.0:public
$ oifcfg getif -global
en10 192.168.4.0 global cluster_interconnect
en9 200.31.155.0 global public
After that I restarted CRS and ran dbca on node 1 again. The earlier problem did not recur.
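As a sanity check, the `oifcfg getif -global` output can be compared with the host address that the agent tried to put into LOCAL_LISTENER (200.31.155.225 in the oraagent log). The sketch below is only illustrative: the function names are hypothetical, and the /24 prefix length is an assumption, since the oifcfg output shows the subnet but not the netmask.

```python
import ipaddress


def parse_oifcfg(output):
    """Parse 'oifcfg getif -global' output into a list of dicts.

    Each line is expected to look like: <interface> <subnet> <scope> <role>
    Lines with a different number of fields are skipped.
    """
    interfaces = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        name, subnet, scope, role = parts
        interfaces.append(
            {"name": name, "subnet": subnet, "scope": scope, "role": role}
        )
    return interfaces


def public_interfaces_for(host, interfaces, prefix_len=24):
    """Return names of public interfaces whose subnet contains host.

    prefix_len=24 is an ASSUMPTION; use the real netmask of the network.
    """
    address = ipaddress.ip_address(host)
    matches = []
    for entry in interfaces:
        network = ipaddress.ip_network(f"{entry['subnet']}/{prefix_len}")
        if entry["role"] == "public" and address in network:
            matches.append(entry["name"])
    return matches


# Output copied from the oifcfg transcript above.
OIFCFG_OUTPUT = """\
en10 192.168.4.0 global cluster_interconnect
en9 200.31.155.0 global public
"""

interfaces = parse_oifcfg(OIFCFG_OUTPUT)
print(public_interfaces_for("200.31.155.225", interfaces))
```

Under the /24 assumption the listener address falls inside the en9 public subnet. That matches the observation that the registered entries looked normal both before and after they were deleted and set again.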