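1. What is ONS
ONS (Oracle Notification Service) is the foundation on which Oracle Clusterware implements the FAN event push model.
In the traditional model, a client has to poll the server periodically to learn the server's state, which is essentially a PULL model. Oracle 10g
introduced a new PUSH mechanism, FAN (Fast Application Notification): when certain events occur on the server side, the server
proactively notifies the clients of the change, so clients learn about server-side changes as early as possible. This mechanism is built on ONS.
ONS is usually managed with the onsctl command. Before onsctl can be used, the ONS service must be configured.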
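The push model described above can be illustrated with a minimal publish/subscribe sketch. This is not the ONS API: the `NotificationHub` class and the event fields (`service`, `status`) are hypothetical names chosen only to show that subscribers are called back when an event occurs, instead of polling for state.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


class NotificationHub:
    """Hypothetical stand-in for a notification service (not the real ONS API)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        # A client registers once; it does not need to poll afterwards.
        self._subscribers.append(callback)

    def publish(self, event: Event) -> None:
        # The server side pushes the event to every registered client.
        for callback in self._subscribers:
            callback(event)


received: List[Event] = []
hub = NotificationHub()
hub.subscribe(received.append)

# Assumed event payload, for illustration only.
hub.publish({"service": "racdb", "status": "down"})
print(received)
```

In a pull model the client would instead call a status function in a loop and would only notice the change at its next poll.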
2. ONS configuration
Note that in a RAC environment the ONS under $CRS_HOME is used, not the one under $ORACLE_HOME.
The configuration file is $CRS_HOME/opmn/conf/ons.config.
[[email protected] conf]# pwd
/opt/ora10g/product/10.2.0/crs_1/opmn/conf
[[email protected] conf]# ls
ons.config
[[email protected] conf]# cat ons.config
localport=6100
remoteport=6200
loglevel=3
useocr=on
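The parameters in this file are explained below:
<1>localport: the local listening port. "Local" here specifically means the loopback address 127.0.0.1. The port is used to communicate with clients running on the same host.
<2>remoteport: the remote listening port, bound on the host's IP addresses other than 127.0.0.1 and used to communicate with remote clients.
<3>loglevel: Oracle can trace the ONS process and write the trace to a local log file. This parameter sets the log level of the ONS process, from 1 to 9. The default is 3.
<4>logfile: used together with loglevel, it sets the location of the ONS log file. The default is $CRS_HOME/opmn/logs/opmn.log.
<5>nodes and useocr: together these two parameters determine which nodes' ONS daemons the local ONS daemon communicates with.
Of these parameters, localport and remoteport are mandatory. The netstat command shows how the two ports are used: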
[[email protected] bin]# netstat -ano|grep 6100
tcp 0 0 127.0.0.1:6100 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 127.0.0.1:6100 127.0.0.1:32852 ESTABLISHED off (0.00/0/0)
tcp 0 0 127.0.0.1:32840 127.0.0.1:6100 ESTABLISHED keepalive (7063.32/0/0)
tcp 0 0 127.0.0.1:32852 127.0.0.1:6100 ESTABLISHED keepalive (7188.42/0/0)
tcp 0 0 127.0.0.1:6100 127.0.0.1:32840 ESTABLISHED off (0.00/0/0)
udp 0 0 192.168.2.103:61008 0.0.0.0:* off (0.00/0/0)
[[email protected] bin]# netstat -ano|grep 6200
tcp 0 0 0.0.0.0:6200 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 192.168.1.103:32836 192.168.1.104:6200 ESTABLISHED off (0.00/0/0)
The comparison shows that Oracle listens on port 6100 at 127.0.0.1 and on port 6200 at 0.0.0.0 (that is, all local addresses), which matches the settings in /opt/ora10g/product/10.2.0/crs_1/opmn/conf/ons.config.
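As a rough illustration of the parameter rules above, the following sketch reads the `key=value` lines of an ons.config file and reports which mandatory parameters are missing. The function names are hypothetical, and skipping lines that start with `#` is an assumption of this sketch, not something the source states about ons.config.

```python
from typing import Dict, List

# The text above states that these two parameters are mandatory.
REQUIRED = ("localport", "remoteport")


def parse_ons_config(text: str) -> Dict[str, str]:
    """Parse key=value lines into a dict with lower-cased keys."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' lines carry no settings.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        params[key.strip().lower()] = value.strip()
    return params


def missing_required(params: Dict[str, str]) -> List[str]:
    """Return the mandatory parameter names that are absent."""
    return [name for name in REQUIRED if name not in params]


sample = "localport=6100\nremoteport=6200\nloglevel=3\nuseocr=on\n"
config = parse_ons_config(sample)
print(config)
print(missing_required(config))
```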
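The useocr parameter also deserves attention. It takes the value ON or OFF. With useocr=ON, the information about the remote nodes that ONS communicates with is stored in the OCR. With useocr=OFF, it is taken from the nodes parameter.
Format of the nodes value: hostname/ip:port[,hostname/ip:port], for example: nodes=dbs:6200,dbp:6200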
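To make the nodes format concrete, here is a small sketch that splits such a value into (host, port) pairs. `parse_nodes` is a hypothetical helper written for this illustration, and it only handles the plain `host:port` form shown in the example.

```python
from typing import List, Tuple


def parse_nodes(value: str) -> List[Tuple[str, int]]:
    """Split 'host:port[,host:port]' into a list of (host, port) tuples."""
    nodes: List[Tuple[str, int]] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon so the port is always the final part.
        host, sep, port = item.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {item!r}")
        nodes.append((host, int(port)))
    return nodes


print(parse_nodes("dbs:6200,dbp:6200"))
```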
When useocr is ON, this information is stored in the OCR under the key DATABASE.ONS_HOSTS.
This key can be dumped:
[[email protected] bin]# ./ocrdump -xml /home/oracle/ons_info.xml -keyname DATABASE.ONS_HOSTS
[[email protected] bin]# cat /home/oracle/ons_info.xml
<OCRDUMP>
<TIMESTAMP>01/28/2015 10:46:35</TIMESTAMP>
<COMMAND>./ocrdump.bin -xml /home/oracle/ons_info.xml -keyname DATABASE.ONS_HOSTS </COMMAND>
<KEY>
<NAME>DATABASE.ONS_HOSTS</NAME>
<VALUE_TYPE>UNDEF</VALUE_TYPE>
<VALUE><![CDATA[]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
  <KEY>
  <NAME>DATABASE.ONS_HOSTS.rac3</NAME> -- node
  <VALUE_TYPE>ORATEXT</VALUE_TYPE>
  <VALUE><![CDATA[rac3]]></VALUE>
  <USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
  <GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
  <OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
  <USER_NAME>oracle</USER_NAME>
  <GROUP_NAME>oinstall</GROUP_NAME>
    <KEY>
    <NAME>DATABASE.ONS_HOSTS.rac3.PORT</NAME> -- port for this node
    <VALUE_TYPE>ORATEXT</VALUE_TYPE>
    <VALUE><![CDATA[6200]]></VALUE>
    <USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
    <GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
    <OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
    <USER_NAME>oracle</USER_NAME>
    <GROUP_NAME>oinstall</GROUP_NAME>
    </KEY>
  </KEY>
  <KEY>
  <NAME>DATABASE.ONS_HOSTS.rac4</NAME> -- node
  <VALUE_TYPE>ORATEXT</VALUE_TYPE>
  <VALUE><![CDATA[rac4]]></VALUE>
  <USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
  <GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
  <OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
  <USER_NAME>oracle</USER_NAME>
  <GROUP_NAME>oinstall</GROUP_NAME>
    <KEY>
    <NAME>DATABASE.ONS_HOSTS.rac4.PORT</NAME> -- port
    <VALUE_TYPE>ORATEXT</VALUE_TYPE>
    <VALUE><![CDATA[6200]]></VALUE>
    <USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
    <GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
    <OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
    <USER_NAME>oracle</USER_NAME>
    <GROUP_NAME>oinstall</GROUP_NAME>
    </KEY>
  </KEY>
</KEY>
</OCRDUMP>
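The dump above nests one KEY per node under DATABASE.ONS_HOSTS, each with a child KEY whose name ends in .PORT. A sketch of how such a dump could be summarised with Python's standard `xml.etree.ElementTree` follows. The function name and the trimmed sample are assumptions made for illustration, and treating the port value as a whitespace-separated list is also an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

PREFIX = "DATABASE.ONS_HOSTS."
SUFFIX = ".PORT"


def ons_hosts_from_dump(xml_text: str) -> Dict[str, List[int]]:
    """Map each node name to the ports listed under its .PORT key."""
    root = ET.fromstring(xml_text)
    hosts: Dict[str, List[int]] = {}
    for key in root.iter("KEY"):
        # findtext looks only at the direct NAME child of this KEY.
        name = (key.findtext("NAME") or "").strip()
        if name.startswith(PREFIX) and name.endswith(SUFFIX):
            host = name[len(PREFIX):-len(SUFFIX)]
            value = key.findtext("VALUE") or ""
            # Assumption: several ports are separated by whitespace.
            hosts[host] = [int(port) for port in value.split()]
    return hosts


# Trimmed, hand-written sample in the same shape as the dump above.
sample = """<OCRDUMP>
<KEY><NAME>DATABASE.ONS_HOSTS</NAME><VALUE><![CDATA[]]></VALUE>
  <KEY><NAME>DATABASE.ONS_HOSTS.rac3</NAME><VALUE><![CDATA[rac3]]></VALUE>
    <KEY><NAME>DATABASE.ONS_HOSTS.rac3.PORT</NAME><VALUE><![CDATA[6200]]></VALUE></KEY>
  </KEY>
  <KEY><NAME>DATABASE.ONS_HOSTS.rac4</NAME><VALUE><![CDATA[rac4]]></VALUE>
    <KEY><NAME>DATABASE.ONS_HOSTS.rac4.PORT</NAME><VALUE><![CDATA[6200 6300]]></VALUE></KEY>
  </KEY>
</KEY>
</OCRDUMP>"""

print(ons_hosts_from_dump(sample))
```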
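3. Configuring ONS
When useocr=OFF, ONS can be configured by editing the ONS configuration file directly. When the node communication settings are stored in the OCR (useocr=ON), use the racgons command as root.
Note: racgons must be run as root. If it is run as the oracle user, it prints no error message but also changes nothing.
--- Adding configuration: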
[[email protected] bin]# ./racgons add_config rac3:6300 rac4:6300
[[email protected] bin]# ./ocrdump -xml /home/oracle/ons_info2.xml -keyname DATABASE.ONS_HOSTS
[[email protected] bin]# cat /home/oracle/ons_info2.xml
<OCRDUMP>
<TIMESTAMP>01/28/2015 10:56:30</TIMESTAMP>
<COMMAND>./ocrdump.bin -xml /home/oracle/ons_info2.xml -keyname DATABASE.ONS_HOSTS </COMMAND>
<KEY>
<NAME>DATABASE.ONS_HOSTS</NAME>
<VALUE_TYPE>UNDEF</VALUE_TYPE>
<VALUE><![CDATA[]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
  <KEY>
  <NAME>DATABASE.ONS_HOSTS.rac3</NAME>
  <VALUE_TYPE>ORATEXT</VALUE_TYPE>
  <VALUE><![CDATA[rac3]]></VALUE>
  <USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
  <GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
  <OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
  <USER_NAME>oracle</USER_NAME>
  <GROUP_NAME>oinstall</GROUP_NAME>
    <KEY>
    <NAME>DATABASE.ONS_HOSTS.rac3.PORT</NAME>
    <VALUE_TYPE>ORATEXT</VALUE_TYPE>
    <VALUE><![CDATA[6200 6300]]></VALUE> -- port 6300 has been added
    <USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
    <GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
    <OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
    <USER_NAME>oracle</USER_NAME>
    <GROUP_NAME>oinstall</GROUP_NAME>
    </KEY>
  </KEY>
  <KEY>
  <NAME>DATABASE.ONS_HOSTS.rac4</NAME>
  <VALUE_TYPE>ORATEXT</VALUE_TYPE>
  <VALUE><![CDATA[rac4]]></VALUE>
  <USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
  <GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
  <OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
  <USER_NAME>oracle</USER_NAME>
  <GROUP_NAME>oinstall</GROUP_NAME>
    <KEY>
    <NAME>DATABASE.ONS_HOSTS.rac4.PORT</NAME>
    <VALUE_TYPE>ORATEXT</VALUE_TYPE>
    <VALUE><![CDATA[6200 6300]]></VALUE> -- port 6300 has been added
    <USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
    <GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
    <OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
    <USER_NAME>oracle</USER_NAME>
    <GROUP_NAME>oinstall</GROUP_NAME>
    </KEY>
  </KEY>
</KEY>
</OCRDUMP>
---- Removing configuration:
[[email protected] bin]# ./racgons remove_config rac3:6300 rac4:6300
racgons: Existing key value on rac3 = 6200 6300.
racgons: rac3:6300 removed from OCR.
racgons: Existing key value on rac4 = 6200 6300.
racgons: rac4:6300 removed from OCR.
[[email protected] bin]# ./ocrdump -xml /home/oracle/ons_info3.xml -keyname DATABASE.ONS_HOSTS
[[email protected] bin]# cat /home/oracle/ons_info3.xml
<OCRDUMP>
<TIMESTAMP>01/28/2015 11:01:13</TIMESTAMP>
<COMMAND>./ocrdump.bin -xml /home/oracle/ons_info3.xml -keyname DATABASE.ONS_HOSTS </COMMAND>
<KEY>
<NAME>DATABASE.ONS_HOSTS</NAME>
<VALUE_TYPE>UNDEF</VALUE_TYPE>
<VALUE><![CDATA[]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME>
  <KEY>
  <NAME>DATABASE.ONS_HOSTS.rac3</NAME>
  <VALUE_TYPE>ORATEXT</VALUE_TYPE>
  <VALUE><![CDATA[rac3]]></VALUE>
  <USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
  <GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
  <OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
  <USER_NAME>oracle</USER_NAME>
  <GROUP_NAME>oinstall</GROUP_NAME>
    <KEY>
    <NAME>DATABASE.ONS_HOSTS.rac3.PORT</NAME>
    <VALUE_TYPE>ORATEXT</VALUE_TYPE>
    <VALUE><![CDATA[6200 ]]></VALUE> -- port 6300 has been removed
    <USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
    <GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
    <OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
    <USER_NAME>oracle</USER_NAME>
    <GROUP_NAME>oinstall</GROUP_NAME>
    </KEY>
  </KEY>
  <KEY>
  <NAME>DATABASE.ONS_HOSTS.rac4</NAME>
  <VALUE_TYPE>ORATEXT</VALUE_TYPE>
  <VALUE><![CDATA[rac4]]></VALUE>
  <USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
  <GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
  <OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
  <USER_NAME>oracle</USER_NAME>
  <GROUP_NAME>oinstall</GROUP_NAME>
    <KEY>
    <NAME>DATABASE.ONS_HOSTS.rac4.PORT</NAME>
    <VALUE_TYPE>ORATEXT</VALUE_TYPE>
    <VALUE><![CDATA[6200 ]]></VALUE> -- port 6300 has been removed
    <USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
    <GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
    <OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
    <USER_NAME>oracle</USER_NAME>
    <GROUP_NAME>oinstall</GROUP_NAME>
    </KEY>
  </KEY>
</KEY>
</OCRDUMP>
4. The onsctl command
The onsctl command can start, stop, and debug ONS and reload its configuration file. Its usage is as follows:
[[email protected] bin]# ./onsctl -help
usage: ./onsctl start|stop|ping|reconfig|debug
start    - Start opmn only.
stop     - Stop ons daemon
ping     - Test to see if ons daemon is running
debug    - Display debug information for the ons daemon
reconfig - Reload the ons configuration
help     - Print a short syntax description (this).
detailed - Print a verbose syntax description.
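Note: a running ONS process does not necessarily mean that ONS is working properly. Use the ping command to confirm.
<1>Check the process status at the OS level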
[[email protected] bin]# ps -ef|grep ons |grep -v grep
oracle 27813 1 0 10:31 ? 00:00:00 /opt/ora10g/product/10.2.0/crs_1/opmn/bin/ons -d
oracle 27814 27813 0 10:31 ? 00:00:00 /opt/ora10g/product/10.2.0/crs_1/opmn/bin/ons -d
The output shows that the ONS processes are running.
<2>Confirm the ONS service status
[[email protected] bin]# ./onsctl ping
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
{node = rac4, port = 6200}
Adding remote host rac4:6200
ons is running ...
The output shows that the ONS service is responding normally.
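Because a live process is not proof of a working service, a monitoring script would look at the ping result and not at `ps`. The sketch below classifies the text printed by `onsctl ping` using the two status lines seen in this document ("ons is running ..." and "ons is not running ..."). The helper names are hypothetical, and the `ping_ons` wrapper assumes that the path to onsctl is passed in by the caller.

```python
import subprocess


def ons_is_running(ping_output: str) -> bool:
    """Return True only if a line of the output starts with 'ons is running'."""
    lines = [line.strip() for line in ping_output.splitlines()]
    return any(line.startswith("ons is running") for line in lines)


def ping_ons(onsctl_path: str) -> bool:
    """Run '<onsctl_path> ping' and classify whatever it prints."""
    result = subprocess.run(
        [onsctl_path, "ping"],
        capture_output=True,
        text=True,
        check=False,  # decide from the text, not from the exit code
    )
    return ons_is_running(result.stdout + result.stderr)


# Status lines copied from the transcripts in this document.
up = "Adding remote host rac4:6200\nons is running ...\n"
down = "Adding remote host rac4:6200\nons is not running ...\n"
print(ons_is_running(up), ons_is_running(down))
```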
<3>Stop the ONS service
[[email protected] bin]# ./onsctl stop
onsctl: shutting down ons daemon ...
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
{node = rac4, port = 6200}
Adding remote host rac4:6200
[[email protected] bin]#
[[email protected] bin]# ./onsctl ping
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
{node = rac4, port = 6200}
Adding remote host rac4:6200
ons is not running ... --- this confirms that the stop succeeded
<4>Start the ONS service
[[email protected] bin]# ./onsctl start
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
{node = rac4, port = 6200}
Adding remote host rac4:6200
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
{node = rac4, port = 6200}
Adding remote host rac4:6200
onsctl: ons started -- started successfully
[[email protected] bin]#
[[email protected] bin]# ./onsctl ping
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
{node = rac4, port = 6200}
Adding remote host rac4:6200
ons is running ... -- this confirms that the start succeeded
<5>Show detailed information with the debug option
[[email protected] bin]# ./onsctl debug
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
{node = rac4, port = 6200}
Adding remote host rac4:6200
HTTP/1.1 200 OK
Content-Length: 1355
Content-Type: text/html
Response:
======== ONS ========
Listeners:
NAME BIND ADDRESS PORT FLAGS SOCKET
------- --------------- ----- -------- ------
Local 127.000.000.001 6100 00000142 7
Remote 192.168.001.103 6200 00000101 8
Request No listener
Server connections: ----- the most useful part of this command is that it lists all connections
ID IP PORT FLAGS SENDQ WORKER BUSY SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
1 192.168.001.104 6200 00010005 0 1 0
Client connections:
ID IP PORT FLAGS SENDQ WORKER BUSY SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
Pending connections:
ID IP PORT FLAGS SENDQ WORKER BUSY SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
0 127.000.000.001 6100 00000812 0 1 0
0 127.000.000.001 6100 00000812 0 1 0
0 127.000.000.001 6100 00020812 0 1 0
Worker Ticket: 0/0, Idle: 360
THREAD FLAGS
-------- --------
f7f86ba0 00000012
f6dd1ba0 00000012
f63d0ba0 00000012
Resources:
Notifications:
Received: 0, in Receive Q: 0, Processed: 0, in Process Q: 0
Pools:
Message: 24/25 (1), Link: 25/25 (1), Subscription: 0/0 (0)
##===========================================================
Further notes:
After the ONS configuration tests above, crs_stat -t showed that ONS would not start on one node of the cluster.
[[email protected] ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    rac3
ora....C3.lsnr application    ONLINE    ONLINE    rac3
ora.rac3.gsd   application    ONLINE    ONLINE    rac3
ora.rac3.ons   application    ONLINE    OFFLINE
ora.rac3.vip   application    ONLINE    ONLINE    rac3
ora....SM2.asm application    ONLINE    ONLINE    rac4
ora....C4.lsnr application    ONLINE    ONLINE    rac4
ora.rac4.gsd   application    ONLINE    ONLINE    rac4
ora.rac4.ons   application    ONLINE    ONLINE    rac4
ora.rac4.vip   application    ONLINE    ONLINE    rac4
ora.racdb.db   application    ONLINE    ONLINE    rac4
ora....b1.inst application    ONLINE    ONLINE    rac3
ora....b2.inst application    ONLINE    ONLINE    rac4
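Spotting the failed resource by eye works for a two-node cluster, but the same check is easy to script. The sketch below scans `crs_stat -t` style text for resources whose Target is ONLINE while their State is not. `offline_resources` is a hypothetical helper, and it assumes the whitespace-separated column layout shown in the listing above.

```python
from typing import List


def offline_resources(crs_stat_output: str) -> List[str]:
    """Return resource names with Target ONLINE but State not ONLINE."""
    problems: List[str] = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        # Skip the header, the separator line, and anything that is not a resource row.
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue
        name, _type, target, state = fields[:4]
        if target == "ONLINE" and state != "ONLINE":
            problems.append(name)
    return problems


# Shortened sample in the same layout as the crs_stat -t listing.
sample = """Name           Type           Target    State     Host
------------------------------------------------------------
ora.rac3.gsd   application    ONLINE    ONLINE    rac3
ora.rac3.ons   application    ONLINE    OFFLINE
ora.rac3.vip   application    ONLINE    ONLINE    rac3
"""

print(offline_resources(sample))
```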
-- Check the log
[[email protected] racg]$ tail -f ora.rac3.ons.log
..........................................
RCV: Permission denied
Communication error with the OPMN server local port.
Check the OPMN log files
RCV: Permission denied
Communication error with the OPMN server loca
2015-01-28 13:34:25.867: [ RACG][2540408064] [29681][2540408064][ora.rac3.ons]: l port.
Check the OPMN log files
RCV: Permission denied ----- "Permission denied" is reported over and over
Communication error with the OPMN server local port.
Check the OPMN log files
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = rac3, port = 6200}
Adding remote host rac3:6200
o
2015-01-28 13:34:25.867: [ RACG][2540408064] [29681][2540408064][ora.rac3.ons]: nscfg[1]
{node = rac4, port = 6200}
Adding remote host rac4:6200
onsctl: ons failed to start -- ONS fails to start, yet onsctl ping reports that ONS is running
2015-01-28 13:34:26.077: [ RACG][2540408064] [29681][2540408064][ora.rac3.ons]: RCV: Permission denied
Communication error with the OPMN server local port.
Check the OPMN log files
-- However, the ONS service was confirmed to have started
[[email protected] bin]# ./onsctl ping
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = rac3, port = 6200}
Adding remote host rac3:6200
onscfg[1]
{node = rac4, port = 6
2015-01-28 13:34:26.077: [ RACG][2540408064] [29681][2540408064][ora.rac3.ons]: 200}
Adding remote host rac4:6200
ons is not running ...
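Running ./onsctl stop and then ./onsctl start again also stops and starts ONS normally, but the log still shows only startup failures.
-- Starting the resource on its own: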
[[email protected] ~]$ crs_start ora.rac3.ons
Attempting to start `ora.rac1.ons` on member `rac3`
Start of `ora.rac3.ons` on member `rac3` failed.
rac4 : CRS-1019: Resource ora.rac3.ons (application) cannot run on rac4
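The permissions of the ONS configuration were checked and no problem was found. After the virtual machines were rebooted, ONS started normally on both nodes and the problem was resolved.
The current suspicion is either a permission problem that the checks missed, or a hung ons process: a new process could be started, yet the log kept showing errors.
(In general, temporarily stopping and starting the ONS resource has little impact on the system, because the resource is mainly related to load balancing and failover.)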
[[email protected] ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    rac3
ora....C3.lsnr application    ONLINE    ONLINE    rac3
ora.rac3.gsd   application    ONLINE    ONLINE    rac3
ora.rac3.ons   application    ONLINE    ONLINE    rac3
ora.rac3.vip   application    ONLINE    ONLINE    rac3
ora....SM2.asm application    ONLINE    ONLINE    rac4
ora....C4.lsnr application    ONLINE    ONLINE    rac4
ora.rac4.gsd   application    ONLINE    ONLINE    rac4
ora.rac4.ons   application    ONLINE    ONLINE    rac4
ora.rac4.vip   application    ONLINE    ONLINE    rac4
ora.racdb.db   application    ONLINE    ONLINE    rac4
ora....b1.inst application    ONLINE    ONLINE    rac3
ora....b2.inst application    ONLINE    ONLINE    rac4
A similar problem is discussed in this itpub thread: http://www.itpub.net/thread-1283253-1-1.html
ps -ef|grep ons
Acknowledgement: this document draws on Zhang Xiaoming's <<大话Oracle RAC>>.