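Background:
Oracle 11g RAC was installed for the first time on a test machine and worked normally after installation. Some time later, node 1 was rebooted to test whether the cluster stack would start automatically. It did not start automatically, and a manual start failed as well.
Process:
Run on node 1: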
# pwd
/u01/grid/bin
# ./crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
Check the log on node 1:
# pwd
/u01/grid/log/nodea/client
# cat crsctl_grid.log
Oracle Database 11g Clusterware Release 11.2.0.3.0 - Production Copyright
1996, 2011 Oracle. All rights reserved.
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
2014-04-09 22:45:20.882: [ CRSCTL][1]File
/u01/grid/oc4j/j2ee/home/OC4J_DBWLM_config/system-jazn-data.xml was not
modified, OCR key was empty
[ CLWAL][1]clsw_Initialize: OLR initlevel [30000]
2014-04-17 07:27:27.517: [ CRSCTL][1]File
/u01/grid/oc4j/j2ee/home/OC4J_DBWLM_config/system-jazn-data.xml was not
modified, OCR key was empty
2014-04-19 02:24:13.609: [ CRSCTL][1]File
/u01/grid/oc4j/j2ee/home/OC4J_DBWLM_config/system-jazn-data.xml was not
modified, OCR key was empty
2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: failuring during
clsaauthmsg ret clsaretOSD (8), endp 1110bdd70 [0000000000000018] { gipcEndpoint
: localAddr
'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=32b4238c-0bc8efcf-12779694))',
remoteAddr
'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_nodea_)(GIPCID=0bc8efcf-32b4238c-7078108))',
numPend 5, numReady 0, numDone 2, numDead 0, numTransfer 0, objFlags 0x0,
pidPeer 7078108, flags 0x2ca712, usrFlags 0x34000 }
2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos op :
write
2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos dep : No
space left on device (28)
2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos loc :
authrespset5
2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos info: len
-1 != expected 4
2014-04-30 02:19:51.493: [ CSSCLNT][1]clssscConnect: gipc request failed with
22 (12)
2014-04-30 02:19:51.493: [ CSSCLNT][1]clsssInitNative: connect to
(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_nodea_)) failed, rc 22
The key line in the log:
2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos dep : No
space left on device (28)
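Scanning a long log by eye for this message is easy to get wrong, so a small filter can help. The following is a minimal sketch, not part of the original procedure: `find_enospc` is a hypothetical helper name, and it assumes the message text "space left on device" appears on a single line in the real log file.

```shell
# Hypothetical helper: print (with line numbers) every log line that
# reports errno 28, "No space left on device".
# Usage: find_enospc /u01/grid/log/nodea/client/crsctl_grid.log
find_enospc() {
    # Match only the tail of the message so the search still works
    # when the leading "No" has been wrapped onto the previous line.
    grep -n 'space left on device' "$1"
}
```

If the function prints nothing, the log contains no such message and the startup failure probably has a different cause.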
Check the disk space on node 1. The /u01 filesystem is indeed full:
# df -g
Filesystem      GB blocks   Free  %Used   Iused  %Iused  Mounted on
/dev/hd4             0.25   0.05    79%   10134     44%  /
/dev/hd2             2.06   0.13    94%   44051     57%  /usr
/dev/hd9var          0.44   0.15    67%    6196     15%  /var
/dev/hd3            10.00   2.08    80%    4367      1%  /tmp
/dev/hd1             0.06   0.00   100%      73     46%  /home
/dev/hd11admin       0.12   0.12     1%       5      1%  /admin
/proc                   -      -      -       -       -  /proc
/dev/hd10opt         0.38   0.18    51%    7044     14%  /opt
/dev/livedump        0.25   0.25     1%       4      1%  /var/adm/ras/livedump
/dev/fslv00         30.00   0.00   100%   54756     90%  /u01
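To pick out the full filesystems from this kind of output automatically, something like the following could be used. It is a sketch only: `full_filesystems` is a hypothetical helper, the default threshold of 90 is an arbitrary choice, and it assumes the AIX `df -g` column layout shown above (usage percentage in the fourth column, mount point in the seventh, no spaces in mount points).

```shell
# Hypothetical helper: read "df -g"-style output on stdin and print the
# mount point and usage of every filesystem at or above a threshold.
# Usage: df -g | full_filesystems 90
full_filesystems() {
    threshold=${1:-90}
    awk -v t="$threshold" '
        # Skip the header line and pseudo filesystems whose usage is "-".
        NR > 1 && $4 ~ /^[0-9]+%$/ {
            used = $4
            sub(/%/, "", used)
            if (used + 0 >= t) print $7, $4
        }'
}
```

With the output above and a threshold of 90, this would list /usr, /home and /u01.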
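The suspicion was that the database had been raising alerts continuously and that the growing logs had filled the disk, so go to the Oracle database alert log directory: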
$ pwd
/u01/base/diag/rdbms/test/test1/trace
$ du -sg /u01/base/diag/rdbms/test/test1/trace
- /u01/base/diag/rdbms/test/test1/trace
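Delete all the alert logs. Because this is a test database, the cause of the continuous alerts was not investigated. The disk on the node 2 server was not full.
Start CRS again as root. The command reports that CRS is already active, but crs_stat shows no processes. The cause will be investigated later.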
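Before deleting anything, it can be useful to see which files actually take up the space. The sketch below is one way to do that and is not what was run in the original session: `largest_files` is a hypothetical helper, and it assumes a `find` that supports the POSIX `-exec ... {} +` form.

```shell
# Hypothetical helper: list the largest regular files under a directory,
# biggest first, with sizes in KB.
# Usage: largest_files /u01/base/diag/rdbms/test/test1/trace 10
largest_files() {
    dir=$1
    count=${2:-10}
    # du -k prints "<size in KB> <path>"; sort numerically, largest first.
    find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$count"
}
```

On a production system the listed files should be reviewed before removal, since they may hold the only record of why the database was raising alerts.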
# id
uid=0(root) gid=0(system)
groups=2(bin),3(sys),7(security),8(cron),10(audit),11(lp)
# pwd
/u01/grid/bin
# ./crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
Environment: Oracle 11.2.0.3.0 RAC on AIX 7100-02-02-1316
Resolving the CRS-4124 and CRS-4000 errors