direct path write temp引起的数据库重启以及ogg进程停止

集中客户系统在2016年11月8日17点业务反馈系统链接不上,是不是有问题。当即查看系统

[[email protected] hwbackup3]$crs_stat -t -v

Name           Type           R/RA   F/FT   Target    State     Host

----------------------------------------------------------------------

ora....D1.inst application    1/5    0/0    ONLINE    ONLINE    siddb01

ora....D2.inst application    0/5    0/0    ONLINE    ONLINE    esbdb01

ora....ID2.srv application    0/0    0/0    ONLINE    ONLINE    esbdb01

ora....Bsrv.cs application    0/0    0/1    ONLINE    ONLINE    esbdb01

ora....ID1.srv application    0/0    0/0    ONLINE    ONLINE    siddb01

ora....Dsrv.cs application    0/0    0/1    ONLINE    ONLINE    siddb01

ora.ESBSID.db  application    0/0    0/1    ONLINE    ONLINE    siddb01

ora....01.lsnr application    0/5    0/0    ONLINE    ONLINE    esbdb01

ora....b01.gsd application    2/5    0/0    ONLINE    OFFLINE

ora....b01.ons application    0/3    0/0    ONLINE    ONLINE    esbdb01

ora....b01.vip application    0/0    0/0    ONLINE    ONLINE    esbdb01

ora....01.lsnr application    0/5    0/0    ONLINE    ONLINE    siddb01

ora....b01.gsd application    5/5    0/0    ONLINE    OFFLINE

ora....b01.ons application    1/3    0/0    ONLINE    ONLINE    siddb01

ora....b01.vip application    0/0    0/0    ONLINE    ONLINE    siddb01

因为监控5分钟自动采集还没来得及嗅探,数据库自启动完成,所以运维人员都不知道数据库发生了什么事情。查看一下CRS日志情况,看看是否有切换动作;

2016-11-02 16:26:21.226: [  CRSAPP][2990760] CheckResource error for ora.siddb01.vip error code = -1

2016-11-08 17:08:15.846: [  CRSEVT][3027957] CAAMonitorHandler :: 0:Could not join /oracle/crs/bin/racgwrap(check)

category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child

2016-11-08 17:08:25.193: [  CRSEVT][3027957] CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.siddb01.vip! (timeout=60

)

2016-11-08 17:08:25.194: [  CRSAPP][3027957] CheckResource error for ora.siddb01.vip error code = -2

2016-11-08 17:09:53.467: [  CRSEVT][3027958] CAAMonitorHandler :: 0:Could not join /oracle/product/10.2.0/bin/racgwrap(check)

category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child

2016-11-08 17:10:07.582: [  CRSAPP][3027958] CheckResource error for ora.siddb01.LISTENER_SIDDB01.lsnr error code = -1

2016-11-08 17:10:14.102: [  CRSEVT][3027962] CAAMonitorHandler :: 0:Could not join /oracle/crs/bin/racgwrap(check)

category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child

2016-11-08 17:10:19.909: [  CRSAPP][3027962] CheckResource error for ora.ESBSID.db error code = -1

2016-11-08 17:10:39.084: [  CRSRES][3027965] In stateChanged, ora.ESBSID.ESBSID1.inst target is ONLINE

2016-11-08 17:10:41.191: [  CRSRES][3027965] ora.ESBSID.ESBSID1.inst on siddb01 went OFFLINE unexpectedly

2016-11-08 17:10:43.326: [  CRSRES][3027965] StopResource: setting CLI values

2016-11-08 17:10:43.723: [  CRSRES][3027965] Attempting to stop `ora.ESBSID.ESBSID1.inst` on member `siddb01`

2016-11-08 17:11:09.197: [  CRSRES][3027977] In stateChanged, ora.ESBSID.SIDsrv.ESBSID1.srv target is ONLINE

2016-11-08 17:11:09.198: [  CRSRES][3027977] ora.ESBSID.SIDsrv.ESBSID1.srv on siddb01 went OFFLINE unexpectedly

2016-11-08 17:11:09.198: [  CRSRES][3027977] StopResource: setting CLI values

2016-11-08 17:11:09.885: [  CRSRES][3027977] Attempting to stop `ora.ESBSID.SIDsrv.ESBSID1.srv` on member `siddb01`

2016-11-08 17:11:10.543: [  CRSRES][3027977] Stop of `ora.ESBSID.SIDsrv.ESBSID1.srv` on member `siddb01` succeeded.

2016-11-08 17:11:10.614: [  CRSRES][3027977] ora.ESBSID.SIDsrv.ESBSID1.srv RESTART_COUNT=0 RESTART_ATTEMPTS=0

2016-11-08 17:11:10.730: [  CRSRES][3027977] ora.ESBSID.SIDsrv.ESBSID1.srv failed on siddb01 relocating.

2016-11-08 17:11:13.721: [  CRSRES][3027977] Attempting to start `ora.ESBSID.SIDsrv.ESBSID1.srv` on member `esbdb01`

2016-11-08 17:11:26.264: [  CRSAPP][3027988] CheckResource error for ora.ESBSID.SIDsrv.cs error code = 1

2016-11-08 17:11:26.267: [  CRSRES][3027988] In stateChanged, ora.ESBSID.SIDsrv.cs target is ONLINE

2016-11-08 17:11:26.267: [  CRSRES][3027988] ora.ESBSID.SIDsrv.cs on siddb01 went OFFLINE unexpectedly

2016-11-08 17:11:26.268: [  CRSRES][3027988] StopResource: setting CLI values

2016-11-08 17:11:26.300: [  CRSRES][3027988] Attempting to stop `ora.ESBSID.SIDsrv.cs` on member `siddb01`

2016-11-08 17:11:26.564: [  CRSRES][3027988] Stop of `ora.ESBSID.SIDsrv.cs` on member `siddb01` succeeded.

2016-11-08 17:11:26.564: [  CRSRES][3027988] ora.ESBSID.SIDsrv.cs RESTART_COUNT=0 RESTART_ATTEMPTS=0

2016-11-08 17:11:27.639: [  CRSRES][3027988] ora.ESBSID.SIDsrv.cs exceeded it‘s failure threshold.  Stopping it and its dependents!

2016-11-08 17:11:27.656: [  CRSRES][3027988] ora.ESBSID.SIDsrv.cs target set to OFFLINE before stop action

2016-11-08 17:11:27.656: [  CRSRES][3027988] StopResource: setting CLI values

2016-11-08 17:11:27.665: [  CRSRES][3027988] Target set to OFFLINE for `ora.ESBSID.SIDsrv.cs`

2016-11-08 17:11:28.419: [  CRSRES][3027965] Stop of `ora.ESBSID.ESBSID1.inst` on member `siddb01` succeeded.

2016-11-08 17:11:28.420: [  CRSRES][3027965] ora.ESBSID.ESBSID1.inst RESTART_COUNT=1 RESTART_ATTEMPTS=5

2016-11-08 17:11:28.420: [  CRSRES][3027965] ora.ESBSID.ESBSID1.inst Uptime exceeds uptime_threshold, resetting RC

2016-11-08 17:11:28.420: [  CRSRES][3027965] Restarting ora.ESBSID.ESBSID1.inst on siddb01

2016-11-08 17:11:28.426: [  CRSRES][3027965] startRunnable: setting CLI values

2016-11-08 17:11:28.426: [  CRSRES][3027965] Attempting to start `ora.ESBSID.ESBSID1.inst` on member `siddb01`

2016-11-08 17:11:34.766: [  CRSRES][3027977] Start of `ora.ESBSID.SIDsrv.ESBSID1.srv` on member `esbdb01` succeeded.

2016-11-08 17:11:46.009: [  CRSRES][3027999] startRunnable: setting CLI values

2016-11-08 17:12:22.967: [  CRSRES][3027965] Start of `ora.ESBSID.ESBSID1.inst` on member `siddb01` succeeded.

2016-11-08 17:12:22.968: [  CRSRES][3027965] Successfully restarted ora.ESBSID.ESBSID1.inst on siddb01, RESTART_COUNT=1

2016-11-08 17:12:22.997: [  CRSRES][3027965] ora.ESBSID.ESBSID1.inst Updated LAST_RESTART time in ocr

打算从会话连接数入手,找到连接数超高时间点,然后以此来查询当时数据库等待事件,然后找到对应的SQL进行分析;

INSTANCE_NUMBER TO_CHAR(TRUNC(SAMPLE_TIME,‘MI‘),‘YYYY-MM-DDHH24:MI‘)                          COUNT(*)
--------------- --------------------------------------------------------------------------- ----------
              1 2016-11-08 16:55                                                                   270
              1 2016-11-08 16:56                                                                   951
              1 2016-11-08 16:57                                                                  1917
              1 2016-11-08 16:58                                                                  2000
              1 2016-11-08 16:59                                                                  1152
              1 2016-11-08 17:12                                                                   379
              1 2016-11-08 17:13                                                                   560
              1 2016-11-08 17:14                                                                   531
              1 2016-11-08 17:15                                                                   458
              1 2016-11-08 17:16                                                                   125
              1 2016-11-08 17:17                                                                   106

查看一下当时时间点的等待事件

SQL> SELECT event_id, event, COUNT (*) cnt                                               
  2      FROM dba_hist_active_sess_history                                               
  3     WHERE snap_id  between 64368 and 64369 AND wait_class_id = 1740759767            
  4  GROUP BY event_id, event   ORDER BY 3 desc;                                         
                                                                                         
  EVENT_ID EVENT                                                                   CNT   
---------- ---------------------------------------------------------------- ----------   
  38438084 direct path write temp                                                 2867   
3926164927 direct path read                                                       2326   
2652584166 db file sequential read                                                1710   
 861319509 direct path read temp                                                   616   
3056446529 read by other session                                                   604   
 506183215 db file scattered read                                                  572   
 834992820 db file parallel read                                                    29   
 885859547 direct path write                                                        18   
SQL> SELECT sql_id, COUNT (*) cnt                                                        
  2    FROM dba_hist_active_sess_history                                                 
  3   WHERE snap_id  between 64368 and 64369                                             
  4         AND event_id IN(38438084)GROUP BY sql_id                                     
  5  HAVING COUNT (*) > 1                                                                
  6  ORDER BY 2 desc;                                                                    
                                                                                         
SQL_ID               CNT                                                                 
------------- ----------                                                                 
a9k9pzsnh5s9n        575                                                                 
gjv7gx73q07c6        258                                                                 
7zbvg66aw143a        227                                                                 
01spkazwnt63u        206                                                                 
dt4xb2ausdy1z        187                                                                 
768krsak8tg2d        185                                                                 
4mzyj9vbyypc8        148                                                                 
d1j0r7w8hwrp9        138                                                                 
du52p4c2735y2        123                                                                 
1wczkxv3znt0h        107                                                                 
b8sj2zpc6n4kh        103                                                                 
4q08ghwrssm09         99                                                                 
ggurqxjuqpca0         84                                                                 
67ram5177zdgf         75

SQL> SELECT sql_id, COUNT (*) cnt                                                        
  2    FROM dba_hist_active_sess_history                                                 
  3   WHERE snap_id  between 64368 and 64369                                             
  4         AND event_id IN(3926164927)GROUP BY sql_id                                   
  5  HAVING COUNT (*) > 1                                                                
  6  ORDER BY 2 desc;                                                                    
                                                                                         
SQL_ID               CNT                                                                 
------------- ----------                                                                 
b8sj2zpc6n4kh        638                                                                 
768krsak8tg2d        607                                                                 
67ram5177zdgf        315                                                                 
6varm0b41sfxd        164                                                                 
4q08ghwrssm09        109                                                                 
01spkazwnt63u         64                                                                 
g9x112cs2dr4h         64

   1,大量的磁盘排序操作,order by, group by, union, distinct,无法在PGA中完成排序,需要利用temp表空间进行排序,当从临时表空间中读取排序结果时,会产生direct path read.
   2. 大量的Hash Join操作,利用temp表空间保存hash区。
   3. SQL语句的并行处理
   4. 大表的全表扫描

上面列举的SQL全都是通过SQL*PLUS连接进来执行的语句也都是distinct\parallel 20

系统内当时也有很多的direct path read、db file sequential read等待事件;

在排序操作(order by/group by/union/distinct/rollup/合并连接)时,由于PGA中的SORT_AREA_SIZE空间不足,造成需要使用临时表空间来保存中间结果,当从临时表空间读入排序结果时,产生direct path read等待事件。

使用HASH连接的SQL语句,将不适合位于内存中的散列分区刷新到临时表空间中。为了查明匹配SQL谓词的行,临时表空间中的散列分区被读回到内存中(目的是为了查明匹配SQL谓词的行),ORALCE会话在direct path read等待事件上等待。

使用并行扫描的SQL语句也会影响系统范围的direct path read等待事件。在并行执行过程中,direct path read等待事件与从属查询有关,而与父查询无关,运行父查询的会话基本上会在PX Deq:Execute Reply上等待,从属查询会产生direct path read等待事件我们来验证一下这些语句的执行计划;

那么有朋友想问了,ogg重启和他有什么关系?这是我们这个系统专门写的脚本,进行判断,为了就是怕将归档日志给删除,倒是ogg进程异常停止,没想到那段时间的归档日志已经被脚本转移到其他目录之下,这样ogg的抽取进程就报了如下错误;

2016-11-08 19:10:15  ERROR   OGG-00446  Could not find archived log for sequence 144299 thread 1 under default destinations SQL <SELECT  name    FROM v$arch

ived_log   WHERE sequence# = :ora_seq_no AND         thread# = :ora_thread AND         resetlogs_id = :ora_resetlog_id AND         archived = ‘YES‘ AND

deleted = ‘NO‘ AND         name not like ‘+%‘         AND standby_dest = ‘NO‘ >, error retrieving redo file name for sequence 144299, archived = 1, use_

alternate = 0Not able to establish initial position for sequence 144299, rba 243031568.

OGG-00446也指明了Could not find archived log for sequence 144299 thread 1,我们将故障点以后的ogg进行移回归档目录操作。

因为脚本是用root跑的,权限也都有所改变。再将权限重新赋给oracle用户和它的属组

[email protected] # cd /oracle/oradata/archlog/sidarchlog/
[email protected] # ls -ltar
total 8177360
drwxr-xr-x   6 oracle   dba         8192 Jan  6  2015 ..
-r--r-----   1 oracle   dba      759247872 Jul  7  2015 2_152214_784815576.dbf
-r--r-----   1 oracle   dba         1024 Jul  7  2015 2_152215_784815576.dbf
-r--r-----   1 oracle   dba         1024 Jul  7  2015 2_152216_784815576.dbf
-r--r-----   1 oracle   dba         1024 Jul  7  2015 2_152217_784815576.dbf
-r--r-----   1 oracle   dba      311396864 Nov  8 17:20 1_144306_784815576.dbf
-r--r-----   1 oracle   dba      716086272 Nov  8 17:34 1_144307_784815576.dbf
-r--r-----   1 oracle   dba      187094016 Nov  8 17:49 1_144308_784815576.dbf
-r--r-----   1 oracle   dba      160567296 Nov  8 18:01 1_144309_784815576.dbf
-r--r-----   1 oracle   dba      283110912 Nov  8 18:21 1_144310_784815576.dbf
-r--r-----   1 oracle   dba      316848640 Nov  8 18:47 1_144311_784815576.dbf
-r--r-----   1 oracle   dba      194941952 Nov  8 19:03 1_144312_784815576.dbf
-r--r-----   1 root     root     279188480 Nov  8 19:31 1_144299_784815576.dbf
-r--r-----   1 root     root     242944512 Nov  8 19:31 1_144300_784815576.dbf
drwxr-xr-x   2 oracle   dba        16384 Nov  8 19:39 .
-rw-r--r--   1 oracle   dba          391 Nov  8 19:40 ggserr.log
-r--r-----   1 root     root     167494144 Nov  8 19:45 1_144301_784815576.dbf
-r--r-----   1 root     root     199065088 Nov  8 19:46 1_144302_784815576.dbf
-r--r-----   1 root     root     271761408 Nov  8 19:47 1_144303_784815576.dbf
-r--r-----   1 root     root     96945664 Nov  8 19:47 1_144304_784815576.dbf
-r--r-----   1 root     root        1024 Nov  8 19:47 1_144305_784815576.dbf
[email protected] # chown oracle:dba 1_144*
[email protected] # ls -ltar
total 8177360
drwxr-xr-x   6 oracle   dba         8192 Jan  6  2015 ..
-r--r-----   1 oracle   dba      759247872 Jul  7  2015 2_152214_784815576.dbf
-r--r-----   1 oracle   dba         1024 Jul  7  2015 2_152215_784815576.dbf
-r--r-----   1 oracle   dba         1024 Jul  7  2015 2_152216_784815576.dbf
-r--r-----   1 oracle   dba         1024 Jul  7  2015 2_152217_784815576.dbf
-r--r-----   1 oracle   dba      311396864 Nov  8 17:20 1_144306_784815576.dbf
-r--r-----   1 oracle   dba      716086272 Nov  8 17:34 1_144307_784815576.dbf
-r--r-----   1 oracle   dba      187094016 Nov  8 17:49 1_144308_784815576.dbf
-r--r-----   1 oracle   dba      160567296 Nov  8 18:01 1_144309_784815576.dbf
-r--r-----   1 oracle   dba      283110912 Nov  8 18:21 1_144310_784815576.dbf
-r--r-----   1 oracle   dba      316848640 Nov  8 18:47 1_144311_784815576.dbf
-r--r-----   1 oracle   dba      194941952 Nov  8 19:03 1_144312_784815576.dbf
-r--r-----   1 oracle   dba      279188480 Nov  8 19:31 1_144299_784815576.dbf
-r--r-----   1 oracle   dba      242944512 Nov  8 19:31 1_144300_784815576.dbf
drwxr-xr-x   2 oracle   dba        16384 Nov  8 19:39 .
-rw-r--r--   1 oracle   dba          391 Nov  8 19:40 ggserr.log
-r--r-----   1 oracle   dba      167494144 Nov  8 19:45 1_144301_784815576.dbf
-r--r-----   1 oracle   dba      199065088 Nov  8 19:46 1_144302_784815576.dbf
-r--r-----   1 oracle   dba      271761408 Nov  8 19:47 1_144303_784815576.dbf
-r--r-----   1 oracle   dba      96945664 Nov  8 19:47 1_144304_784815576.dbf
-r--r-----   1 oracle   dba         1024 Nov  8 19:47 1_144305_784815576.dbf

然后我们在重新启动extract进程

GGSCI (SIDDB01) 17> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
JAGENT      STOPPED                                           
EXTRACT     RUNNING     DP01        00:00:00      00:00:03    
EXTRACT     RUNNING     DP02        01:10:28      00:00:05    
EXTRACT     RUNNING     DP03        00:00:00      00:00:10    
EXTRACT     RUNNING     DP04        01:11:41      00:00:06    
EXTRACT     RUNNING     EXT1        00:00:01      00:00:08    
EXTRACT     RUNNING     EXT2        01:10:37      00:00:07    
EXTRACT     RUNNING     EXT3        00:00:01      00:00:09    
EXTRACT     RUNNING     EXT4        01:11:31      00:00:02

进程启动正常,view report ext4也没有报错。剩下的交给时间慢慢抽取,感谢大家。

时间: 2024-07-30 10:19:42

direct path write temp引起的数据库重启以及ogg进程停止的相关文章

遇到direct path sync等待事件

在使用hvr测试 大表同步的时候,遇到direct path sync等待事件,经过查询官方文档,http://docs.oracle.com/cd/E11882_01/server.112/e40402/waitevents003.htm#REFRN00538 有说明如下: direct path sync During Direct Path write operations the data is asynchronously written to the database files.

Oracle 11g新特性direct path read引发的系统停运故障诊断处理

黎俊杰 | 2016-07-28 14:37 声明:部分表名为了脱敏而用XX代替 1.故障现象 (1)一个业务系统输入用户名与密码后无法进入首页,表现为一直在运行等待,运行缓慢 (2)整个系统无法正常使用,接近停运状态 2.故障解决方法 调整数据库参数alter system setevent='10949 trace name context forever, level 1'来关闭"direct path read"(直接路径读)特性,使SQL语句可以从缓存中查询数据,达到降低I/

深入解析direct path read (转)

文章转自:http://www.itpub.net/thread-1815281-1-1.html 传统读取数据的方式是服务器进程通过读取磁盘,然后把数据加载到共享内存中,这样后面的进程就可以通过共享内存访问这些数据,不用再通过缓慢的磁盘读取来完成.direct path read读取数据块方式,是指服务器进程直接读取数据文件,不经过buffer cache,这种方式读取的数据块会加载到服务器进程的PGA内中当中,不会进入buffer cache中.11G之前的direct path read主

oracle 11G direct path read 很美也很伤人

direct path read 在11g中,全表扫描可能使用direct path read方式,绕过buffer cache,这样的全表扫描就是物理读了. 在10g中,都是通过gc buffer来读的,所以不存在direct path read的问题. direct path read较高的可能原因有: 1. 大量的磁盘排序操作,order by, group by, union, distinct, rollup, 无法在PGA中完成排序,需要利用temp表空间进行排序. 当从临时表空间中

direct path read

在11g中,全表扫描可能使用direct path read方式,绕过buffer cache,这样的全表扫描就是物理读了. 在10g中,都是通过gc buffer来读的,所以不存在direct path read的问题. direct path read较高的可能原因有: 1. 大量的磁盘排序操作,order by, group by, union, distinct, rollup, 无法在PGA中完成排序,需要利用temp表空间进行排序. 当从临时表空间中读取排序结果时,会产生direct

unity, Unhandled Exception: System.UnauthorizedAccessException: Access to the path &quot;xxx\Temp\Assembly-CSharp-Editor.dll.mdb&quot; is denied.

Unhandled Exception: System.UnauthorizedAccessException: Access to the path "xxx\Temp\Assembly-CSharp-Editor.dll.mdb" is denied. 在windows上用unity出现上面错误,解法是关掉unity和monodeveloper,然后以管理员身份重新打开unity.

oracle9i statspack 报告 分析 direct path read 等待事件

DB Name         DB Id    Instance     Inst Num Release     Cluster Host ------------ ----------- ------------ -------- ----------- ------- ------------ LIXORA          1409317108 LIXORA                1 9.2.0.1.0   NO      lixora-DATA Snap Id     Sna

C# 中使用Image.FromFile(string path)后,提示该文件正在被另一进程使用XXX的问题

C# 中使用Image.FromFile(string path)后,提示该文件正在被另一进程使用XXX的问题,是因为对应的文件在一直调用 ,其生成的Image对象被Disponse()前都不会被解除锁定,这就造成了此问题,就是在这个图形被解锁前无法对图像进行操作(比如删除,修改等操作). 此问题可以使用下面三个方法解决问题. 方法1:在要进行文件操作前将Image对象销毁. System.Drawing.Image image = System.Drawing.Image.FromFile(f

【翻译自mos文章】对于oracle 数据库来说,OGG的抽取进程什么时候到database中获取数据?

对于oracle 数据库来说,OGG的抽取进程什么时候到database中获取数据? 参考原文: When GoldenGate Fetches Data From The Database On Extraction For Oracle (Doc ID 1059583.1) 适用于: Oracle GoldenGate - Version 4.0.0 and later Information in this document applies to any platform. 解决方法: 问