Case study: ohasd fails to start when running root.sh during a 12c installation on AIX 7.1

I. Installation Environment

Operating system version: IBM AIX 7100-03-05-1524

Oracle version: Oracle Database 12c 12.1.0.2.0 64-bit RAC

II. Installation Error

The environment is a 12c RAC installation. Running the root.sh script on node 1 produced the following output:

2018/03/29 17:11:43 CLSRSC-330: Adding Clusterware entries to file '/etc/inittab'
2018/03/29 17:13:49 CLSRSC-214: Failed to start the resource 'ohasd'
Failed to start the Clusterware. Last 20 lines of the alert log follow:  -- no further output

The ohasd service could not be started. The installation logs under $ORACLE_HOME/cfgtoollogs/oui/ contained no useful error information. The relevant log entries are:

2018-03-29 17:11:48: Done updating /etc/inittab.tmp
2018-03-29 17:11:48: Saved /etc/inittab.crs
2018-03-29 17:11:48: Installed new /etc/inittab
2018-03-29 17:11:48: Executing /usr/sbin/init q
2018-03-29 17:11:48: Executing cmd: /usr/sbin/init q
2018-03-29 17:11:48: Executing cmd: /oracle/app/12.1.0/grid/bin/crsctl start has
2018-03-29 17:13:49: Command output:
> CRS-4124: Oracle High Availability Services startup failed. 
> CRS-4000: Command Start failed, or completed with errors.
>End Command output
2018-03-29 17:13:49: Executing  /etc/ohasd install
2018-03-29 17:13:49: Executing cmd: /etc/ohasd install
2018-03-29 17:13:49: Executing cmd: /oracle/app/12.1.0/grid/bin/clsecho -p has -f clsrsc -m 214
2018-03-29 17:13:49: Command output:
> CLSRSC-214: Failed to start the resource 'ohasd'
>End Command output
2018-03-29 17:13:49: Executing cmd: /oracle/app/12.1.0/grid/bin/clsecho -p has -f clsrsc -m 214
2018-03-29 17:13:49: Command output:
> CLSRSC-214: Failed to start the resource 'ohasd'
>End Command output
2018-03-29 17:13:49: CLSRSC-214: Failed to start the resource 'ohasd'
2018-03-29 17:13:49: ohasd failed to start
2018-03-29 17:13:49: Alert log is /oracle/app/12.1.0/grid/log/node1/alertnode1.log
2018-03-29 17:13:49: Failed to start service 'ohasd'
2018-03-29 17:13:49: Checking the status of ohasd
2018-03-29 17:13:49: Configured CRS Home: /oracle/app/12.1.0/grid
2018-03-29 17:13:49: Executing cmd: /oracle/app/12.1.0/grid/bin/crsctl check has
2018-03-29 17:13:49: Checking the status of ohasd
2018-03-29 17:13:49: Executing cmd: /oracle/app/12.1.0/grid/bin/crsctl check has
2018-03-29 17:13:49: Checking the status of ohasd

The alert log /oracle/app/12.1.0/grid/log/node1/alertnode1.log was also empty, and so were the other logs checked next, such as /oracle/app/12.1.0/grid/log/node1/ohasd/ohasd.log.

An attempt to start the ohasd process manually also failed:

# ps -ef|grep d.bin
root 1245784 1 0 21:33:04 - 0:00 /oracle/app/12.1.0/grid/bin/ohasd.bin reboot
root 1311392 16394110 0 21:37:31 pts/2 0:00 grep d.bin

# /oracle/app/12.1.0/grid/bin/crsctl start has
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.

Next, the configuration registered by root.sh was removed with the rootcrs.pl script:

# /oracle/app/12.1.0/grid/crs/install/rootcrs.pl -deconfig -verbose -force

Running root.sh again still failed:

# /oracle/app/12.1.0/grid/root.sh

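III. Error Analysis

Because none of the logs contained a usable error message, the failure was effectively silent. That left the following hypotheses: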

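1. Was the Oracle installation package damaged, or were files lost, during download or extraction?

- Comparing the file sizes against those on the official site, and reviewing the output recorded during extraction, makes this unlikely.

2. Is 12.1.0.2 not certified on this operating system (IBM AIX on POWER Systems (64-bit) 7.1)?

- For example, 11.2.0.2 is not certified on Red Hat 6.x and 11.2.0.3 is not certified on Red Hat 7.x. Certified operating system versions for 11g and earlier can be looked up in MetaLink (My Oracle Support) document ID 169706.1, and for 12c in document ID 587357.1 (ID 2226599.1 for the Chinese version). Oracle Database 12.1.0.2.0 is certified on IBM AIX on POWER Systems (64-bit) 7.1, so this cause is ruled out.

3. Were installation requirements flagged by the prerequisite checks, such as OS packages or kernel parameter settings, ignored?

- Some non-essential requirements were indeed ignored during installation, so the system parameter settings and installed packages were checked again, and no problem was found. Besides, if this were the cause, an error message would be expected, so this is also very unlikely.

4. Is an OS-level process or setting blocking the ohasd service?

- In an Oracle database, sessions frequently block each other, and the usual remedy is to find and deal with the session at the head of the blocking chain. Processes at the operating system level can block each other in the same way. On that basis, this was judged the most likely cause.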

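IV. Verifying the Hypothesis

If the most likely cause is blocking between processes at the OS level, how can that be verified?

truss is a very useful tool for this on AIX: it traces the system calls a process makes and the signals it receives, which often pinpoints a problem quickly.

truss was used to trace the system calls made while runcluvfy.sh ran the pre-installation checks: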

truss -o truss_runcluvfy.out ./runcluvfy.sh stage -pre crsinst -n node1,node2 -fixup -verbose

2163566:psargs:/bin/sh ./runcluvfy.sh stage -pre crsinst -n node1,node2 -fixup -verbose
Thu Mar 29 21:24:36 2018
2163566: 4588341: 0.0000: kwaitpid(0x00000000, 0, 0, 0x00000000, 0x00000000) (sleeping...)
# Key lines below: the ohasd process is blocked, waiting
1245784: 3998277: 0.0003:  _nsleep(0x0FFFFFFFFFFFF540, 0x0FFFFFFFFFFFF610) = 0
1245784: 3998277: 1.0003: kopen("/tmp/.oracle/opohasd", O_WRONLY|O_NONBLOCK) Err#6 ENXIO
1245784: 3998277: 0.0003:  _nsleep(0x0FFFFFFFFFFFF540, 0x0FFFFFFFFFFFF610) = 0
1245784: 3998277: 1.0003: kopen("/tmp/.oracle/opohasd", O_WRONLY|O_NONBLOCK) Err#6 ENXIO
1245784: 3998277: 0.0004:  _nsleep(0x0FFFFFFFFFFFF540, 0x0FFFFFFFFFFFF610) = 0
1245784: 3998277: 1.0004: kopen("/tmp/.oracle/opohasd", O_WRONLY|O_NONBLOCK) Err#6 ENXIO
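
Long truss traces are easier to read once repeated failures are collapsed. The following is a minimal sketch, not part of the original troubleshooting session: a hypothetical helper named summarize_truss_errors that counts failed calls in a saved truss output file, grouped by error name and first quoted path argument. It assumes the line layout shown above, where a failed call ends with Err#N followed by the error name.

```shell
# Hypothetical helper: summarize failed system calls in a saved truss output file.
# Assumes failed calls end with "Err#<number> <NAME>", as in the trace above.
summarize_truss_errors() {
    # $1: path to the truss output file
    awk '
        /Err#[0-9]+/ {
            errname = $NF                          # last field, e.g. ENXIO
            path = "-"                             # placeholder when no quoted argument exists
            if (match($0, /"[^"]*"/)) {
                # strip the surrounding double quotes from the first quoted argument
                path = substr($0, RSTART + 1, RLENGTH - 2)
            }
            count[errname " " path]++
        }
        END {
            for (key in count) print count[key], key
        }
    ' "$1" | sort -rn
}
```

Called as summarize_truss_errors truss_runcluvfy.out, it prints one line per distinct failure, most frequent first. For the six ohasd lines in the excerpt above the result is the single line "3 ENXIO /tmp/.oracle/opohasd".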

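The trace shows ohasd.bin (PID 1245784) looping: it sleeps, retries a non-blocking open of /tmp/.oracle/opohasd, and fails with ENXIO every time. Searching MetaLink (My Oracle Support) with this information eventually turned up the matching document: OHASD FAILED TO START: A SPECIFIED FILE DOES NOT SUPPORT THE IOCTL SYSTEM CALL (Doc ID 1537338.1)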

Checking /etc/inittab on node 1 and node 2 confirmed that the entry described in that document was present:

# grep install /etc/inittab
install_assist:2:wait:/usr/sbin/install_assist </dev/console >/dev/console 2>&1

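install_assist is the AIX installation assistant, an interactive tool. Its inittab action is wait, so init starts it and waits for it to finish. If nobody responds on the console it waits indefinitely, and the entries after that line are never executed. In other words, the services under rc2.d (the default run level is 2) are never started. This is the real reason ohasd could not start.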
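
This kind of blocking entry can be looked for mechanically. Below is a minimal sketch of a hypothetical helper, list_wait_entries, that prints the active entries using the wait action for a given run level, together with their line numbers, so their position relative to later entries can be judged. Treating lines that start with ':' or '#' as disabled is an assumption of this sketch. Most wait entries are ordinary boot scripts that exit on their own. Only an entry that waits for console input, like install_assist, is a concern.

```shell
# Hypothetical helper: list active "wait" entries for a run level in an inittab-format file.
# Assumption: lines starting with ':' or '#' are disabled entries or comments.
list_wait_entries() {
    # $1: inittab-format file, $2: run level (for example 2)
    awk -F: -v level="$2" '
        /^[:#]/ || NF < 4 { next }                 # skip comments and malformed lines
        $3 == "wait" && index($2, level) > 0 {     # action is wait and run level matches
            print NR ": " $1                       # line number and entry identifier
        }
    ' "$1"
}
```

For example, list_wait_entries /etc/inittab 2 prints one "line number: identifier" pair per matching entry. Any interactive program in that list deserves a closer look before root.sh is run.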

V. Resolution

Comment out or remove the install_assist line in /etc/inittab, reboot the system, and run root.sh again. The installation then completed successfully.

# grep install /etc/inittab
#install_assist:2:wait:/usr/sbin/install_assist </dev/console >/dev/console 2>&1
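
The removal variant of this fix can be sketched as a small shell function. disable_inittab_entry is a hypothetical helper, not something from the original article or from Oracle: it keeps a backup copy next to the file (the .bak suffix is an arbitrary choice) and rewrites the file without the line whose identifier matches. AIX also ships its own commands for inittab records (lsitab, rmitab), which may be preferable on a real system.

```shell
# Hypothetical helper: remove one entry from an inittab-format file, keeping a backup.
disable_inittab_entry() {
    # $1: inittab-format file, $2: identifier of the entry to remove
    file=$1
    id=$2
    # Keep a backup so the change can be reverted.
    cp -p "$file" "$file.bak" || return 1
    # Copy every line except the one whose first colon-separated field equals the identifier.
    awk -F: -v id="$id" '$1 != id' "$file.bak" > "$file"
}
```

Run as root, for example disable_inittab_entry /etc/inittab install_assist, followed by the reboot and the root.sh rerun described above. Trying it on a copy of the file first is a sensible precaution.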

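VI. Summary

The operating system had been installed in a hurry to go live (the database was required just as urgently), and install_assist was not disabled after the OS installation. The ohasd entry comes after install_assist in /etc/inittab, so it kept waiting and never started. The takeaway: before installing Oracle on AIX, check /etc/inittab and disable install_assist first.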

Original article: http://blog.51cto.com/wyzwl/2104189

Time: 2024-09-28 06:29:50
