GI Cluster Startup Failure Caused by the DNS Service Not Being Started

1. A sudden power outage at the site forced the Grid cluster to restart, after which the RAC databases would not start normally. A check of the cluster gives the results below: four database instances are in the Instance Shutdown state.
[root@node1 ~]# su - grid
[grid@node1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS      
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.FLASH.dg
               ONLINE  ONLINE       node1                                       
               ONLINE  ONLINE       node2                                       
               ONLINE  ONLINE       node3                                       
               ONLINE  ONLINE       node4                                       
               ONLINE  ONLINE       node5                                       
               ONLINE  ONLINE       node6                                       
ora.GRIDDG.dg
               ONLINE  ONLINE       node1                                       
               ONLINE  ONLINE       node2                                       
               ONLINE  ONLINE       node3                                       
               ONLINE  ONLINE       node4                                       
               ONLINE  ONLINE       node5                                       
               ONLINE  ONLINE       node6                                       
ora.LISTENER.lsnr
               ONLINE  ONLINE       node1                                       
               ONLINE  ONLINE       node2                                       
               ONLINE  ONLINE       node3                                       
               ONLINE  ONLINE       node4                                       
               ONLINE  ONLINE       node5                                       
               ONLINE  ONLINE       node6                                       
ora.LTDG.dg
               ONLINE  ONLINE       node1                                       
               ONLINE  ONLINE       node2                                       
               ONLINE  ONLINE       node3                                       
               ONLINE  ONLINE       node4                                       
               ONLINE  ONLINE       node5                                       
               ONLINE  ONLINE       node6                                       
ora.ORADG.dg
               ONLINE  ONLINE       node1                                       
               ONLINE  ONLINE       node2                                       
               ONLINE  ONLINE       node3                                       
               ONLINE  ONLINE       node4                                       
               ONLINE  ONLINE       node5                                       
               ONLINE  ONLINE       node6                                       
ora.asm
               ONLINE  ONLINE       node1                    Started            
               ONLINE  ONLINE       node2                    Started            
               ONLINE  ONLINE       node3                    Started            
               ONLINE  ONLINE       node4                    Started            
               ONLINE  ONLINE       node5                    Started            
               ONLINE  ONLINE       node6                    Started            
ora.gsd
               OFFLINE OFFLINE      node1                                       
               OFFLINE OFFLINE      node2                                       
               OFFLINE OFFLINE      node3                                       
               OFFLINE OFFLINE      node4                                       
               OFFLINE OFFLINE      node5                                       
               OFFLINE OFFLINE      node6                                       
ora.net1.network
               ONLINE  ONLINE       node1                                       
               ONLINE  ONLINE       node2                                       
               ONLINE  ONLINE       node3                                       
               ONLINE  ONLINE       node4                                       
               ONLINE  ONLINE       node5                                       
               ONLINE  ONLINE       node6                                       
ora.ons
               ONLINE  ONLINE       node1                                       
               ONLINE  ONLINE       node2                                       
               ONLINE  ONLINE       node3                                       
               ONLINE  ONLINE       node4                                       
               ONLINE  ONLINE       node5                                       
               ONLINE  ONLINE       node6                                       
ora.registry.acfs
               ONLINE  ONLINE       node1                                       
               ONLINE  ONLINE       node2                                       
               ONLINE  ONLINE       node3                                       
               ONLINE  ONLINE       node4                                       
               ONLINE  ONLINE       node5                                       
               ONLINE  ONLINE       node6                                       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       node2                                       
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       node3                                       
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       node1                                       
ora.cvu
      1        ONLINE  ONLINE       node3                                       
ora.efmisdb.db
      1        ONLINE  OFFLINE                               Instance Shutdown  
ora.efmisdb.efmissrv.svc
      1        ONLINE  OFFLINE                                                  
ora.faspdb.db
      1        ONLINE  OFFLINE                               Instance Shutdown  
ora.faspdb.faspsvc.svc
      1        ONLINE  OFFLINE                                                  
ora.ltdb.db
      1        ONLINE  OFFLINE                               Instance Shutdown  
      2        ONLINE  OFFLINE                               Instance Shutdown  
ora.node1.vip
      1        ONLINE  ONLINE       node1                                       
ora.node2.vip
      1        ONLINE  ONLINE       node2                                       
ora.node3.vip
      1        ONLINE  ONLINE       node3                                       
ora.node4.vip
      1        ONLINE  ONLINE       node4                                       
ora.node5.vip
      1        ONLINE  ONLINE       node5                                       
ora.node6.vip
      1        ONLINE  ONLINE       node6                                       
ora.oadb.db
      1        ONLINE  ONLINE       node6                    Open               
ora.oc4j
      1        ONLINE  ONLINE       node3                                       
ora.orcl.db
      1        ONLINE  ONLINE       node5                    Open               
ora.scan1.vip
      1        ONLINE  ONLINE       node2                                       
ora.scan2.vip
      1        ONLINE  ONLINE       node3                                       
ora.scan3.vip
      1        ONLINE  ONLINE       node1
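
With this many resources, the database resources alone can be listed with a status filter. This is a convenience command, not part of the original session; the -w filter is standard crsctl syntax in 11.2:

[grid@node1 ~]$ crsctl status resource -t -w "TYPE = ora.database.type"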

2. Use asmcmd to check whether the ASM disk groups can be accessed normally; ASM is confirmed to be healthy:
[grid@node1 ~]$ asmcmd -p
ASMCMD [+] > ls
FLASH/
GRIDDG/
LTDG/
ORADG/
ASMCMD [+] > ls ltdg
ASMCMD [+] > cd oradg
ASMCMD [+oradg] > ls
EFMISDB/
FASPDB/
LTDB/
OADB/
ORCL/
ASMCMD [+oradg] > cd ltdb
ASMCMD [+oradg/ltdb] > ls
CONTROLFILE/
DATAFILE/
ONLINELOG/
PARAMETERFILE/
TEMPFILE/
spfileltdb.ora
ASMCMD [+oradg/ltdb] > cd datafile
ASMCMD [+oradg/ltdb/datafile] > ls
BSIP_JLPT.335.923932111
EFMIS.291.920050869
EFMIS.292.920050843
EFMIS.293.920050823
EFMIS.294.920050793
EFMIS.338.926285547
EFMIS.339.926285561
EFMIS_YS.327.922529787
EFMIS_YS.DBF
FASP2.290.920053427
FASP2.325.922112969
FASP2.326.922112977
FASP2.328.923068707
FASP2.329.923571833
FASP2.330.923571847
FASP2.340.926285581
FASP2TEST.334.923932089
SYSAUX.257.919523823
SYSTEM.256.919523823
UNDOTBS1.258.919523825
UNDOTBS2.264.919523987
USERS.259.919523825
ASMCMD [+oradg/ltdb/datafile] > exit
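
The same conclusion can be cross-checked from SQL on the local ASM instance. A minimal sketch, assuming the local ASM SID is +ASM1:

[grid@node1 ~]$ export ORACLE_SID=+ASM1
[grid@node1 ~]$ sqlplus / as sysasm
SQL> select name, state, total_mb, free_mb from v$asm_diskgroup;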

3. Log in to the database and try to start it, watching the alert log at the same time:
[oracle@node1 ~]$ export ORACLE_SID=ltdb1
[oracle@node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Nov 1 12:32:15 2016

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORA-00119: invalid specification for system parameter REMOTE_LISTENER
ORA-00132: syntax error or unresolved network name 'lt-cluster:1521'
SQL> exit

Startup fails with ORA-00119; the accompanying ORA-00132 shows that the REMOTE_LISTENER value 'lt-cluster:1521' (the SCAN name) cannot be resolved. The alert log below confirms that the instance was terminated because of error 119:

Tue Nov 01 12:32:18 2016
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 32 KB
 
Total Shared Global Region in Large Pages = 0 KB (0%)
 
Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB
 
RECOMMENDATION:
  Total System Global Area size is 3010 MB. For optimal performance,
  prior to the next instance restart:
  1. Increase the number of unused large pages by
at least 1505 (page size 2048 KB, total size 3010 MB) system wide to
  get 100% of the System Global Area allocated with large pages
  2. Large pages are automatically locked into physical memory.
Increase the per process memlock (soft) limit to at least 3018 MB to lock
100% System Global Area's large pages into physical memory
********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 4
Number of processor cores in the system is 4
Number of processor sockets in the system is 2
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
  [name='eth1:1', type=1, ip=169.254.149.234, mac=00-50-56-80-52-8e, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth0' configured from GPnP for use as a public interface.
  [name='eth0', type=1, ip=192.168.100.61, mac=00-50-56-80-2b-89, net=192.168.100.0/24, mask=255.255.255.0, use=public/1]
Public Interface 'eth0:1' configured from GPnP for use as a public interface.
  [name='eth0:1', type=1, ip=192.168.100.70, mac=00-50-56-80-2b-89, net=192.168.100.0/24, mask=255.255.255.0, use=public/1]
Public Interface 'eth0:6' configured from GPnP for use as a public interface.
  [name='eth0:6', type=1, ip=192.168.100.81, mac=00-50-56-80-2b-89, net=192.168.100.0/24, mask=255.255.255.0, use=public/1]
CELL communication is configured to use 0 interface(s):
CELL IP affinity details:
    NUMA status: non-NUMA system
    cellaffinity.ora status: N/A
CELL communication will use 1 IP group(s):
    Grp 0:
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/dbhome_1
System name:    Linux
Node name:    node1
Release:    2.6.18-308.el5
Version:    #1 SMP Fri Jan 27 17:17:51 EST 2012
Machine:    x86_64
VM name:    VMWare Version: 6
Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/initltdb1.ora
System parameters with non-default values:
  processes                = 800
  sessions                 = 1224
  spfile                   = "+ORADG/ltdb/spfileltdb.ora"
  sga_target               = 3008M
  control_files            = "+ORADG/ltdb/controlfile/current.260.919523907"
  control_files            = "+FLASH/ltdb/controlfile/current.256.919523907"
  db_block_size            = 8192
  compatible               = "11.2.0.4.0"
  cluster_database         = TRUE
  db_create_file_dest      = "+ORADG"
  db_recovery_file_dest    = "+FLASH"
  db_recovery_file_dest_size= 4407M
  thread                   = 1
  undo_tablespace          = "UNDOTBS1"
  _partition_large_extents = "FALSE"
  _index_partition_large_extents= "FALSE"
  instance_number          = 1
  remote_login_passwordfile= "EXCLUSIVE"
  db_domain                = ""
  dispatchers              = "(PROTOCOL=TCP) (SERVICE=ltdbXDB)"
  remote_listener          = "lt-cluster:1521"
  audit_file_dest          = "/u01/app/oracle/admin/ltdb/adump"
  audit_trail              = "DB"
  db_name                  = "ltdb"
  open_cursors             = 300
  pga_aggregate_target     = 998M
  diagnostic_dest          = "/u01/app/oracle"
Cluster communication is configured to use the following interface(s) for this instance
  169.254.149.234
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Tue Nov 01 12:33:04 2016
USER (ospid: 31080): terminating the instance due to error 119
Instance terminated by USER, pid = 31080
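
As an aside, the error itself suggests a stopgap if DNS could not be fixed quickly (a sketch only, not used in this case; the /tmp pfile path is hypothetical): start the instance from an edited pfile with the remote_listener line removed, so the unresolvable SCAN name is bypassed.

[oracle@node1 ~]$ sqlplus / as sysdba
SQL> create pfile='/tmp/initltdb1.ora' from spfile='+ORADG/ltdb/spfileltdb.ora';
SQL> exit
(delete the remote_listener line from /tmp/initltdb1.ora, then)
[oracle@node1 ~]$ sqlplus / as sysdba
SQL> startup pfile='/tmp/initltdb1.ora'

Once DNS is back, the instance should be restarted with the spfile so REMOTE_LISTENER registers against the SCAN again.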

4. At this point the GI cluster itself has started normally, but the databases cannot open, failing with error 119, and reviewing the parameter file reveals nothing abnormal. Checking SCAN IP resolution with nslookup shows that DNS resolution is failing; on the DNS server, the DNS service turns out not to have been started. After starting the DNS service and rebooting the database servers, the databases open normally.
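
Sketched below is the check-and-fix sequence. The SCAN name lt-cluster is from this environment; the DNS host name dnssrv, a BIND setup managed by the named service, and starting the databases with srvctl (instead of rebooting the servers, as was actually done here) are assumptions:

[oracle@node1 ~]$ nslookup lt-cluster
;; connection timed out; no servers could be reached

[root@dnssrv ~]# service named start
[root@dnssrv ~]# chkconfig named on

[oracle@node1 ~]$ nslookup lt-cluster      (now returns the three SCAN IPs)
[oracle@node1 ~]$ srvctl start database -d ltdb
[oracle@node1 ~]$ srvctl start database -d efmisdb
[oracle@node1 ~]$ srvctl start database -d faspdb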
