A Brief Analysis of a Case of Extremely High log file sync Waits

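The monitoring tool DPA raised a DB Commit Time alert on an overseas Oracle database server. The metric had passed the red alert line and reached about 40 ms (the yellow threshold is 10 ms and the red threshold is 20 ms), as shown in the screenshot below. We generated an AWR report for the corresponding period. Among the Top 5 Timed Events, log file sync had an average wait time of 37 ms and log file parallel write had an average wait time of 40 ms.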

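Tanel Poder's article "Understanding LGWR, Log File Sync Waits and Commit Performance" explains how these two events relate (its classic diagram is shown below). If you are familiar with it, the two Avg Wait (ms) values are enough to draw a first conclusion. log file sync averages 37 ms and log file parallel write averages 40 ms. The time LGWR spends on its physical redo write therefore accounts for essentially all of the time a committing session waits. This means the problem is essentially the I/O performance of the disks holding the redo logs.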
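The inference above compares the two averages. It can be sketched in Python as a rough rule of thumb; this is not a formula taken from Tanel Poder's article. The sketch computes the fraction of the average log file sync wait that the average log file parallel write could account for. When that fraction is close to 1, LGWR's redo I/O explains almost all of the commit wait. The function names and the 0.8 cut-off are illustrative assumptions.

```python
def io_share(lfs_avg_ms: float, lfpw_avg_ms: float) -> float:
    """Fraction of the average 'log file sync' wait that the average
    'log file parallel write' (LGWR's own redo write) could account for.
    Capped at 1.0, because the write average can exceed the sync average."""
    if lfs_avg_ms <= 0:
        raise ValueError("log file sync average must be positive")
    return min(lfpw_avg_ms / lfs_avg_ms, 1.0)


def likely_bottleneck(lfs_avg_ms: float, lfpw_avg_ms: float,
                      io_cutoff: float = 0.8) -> str:
    """Illustrative classification; io_cutoff is an assumed value, not an Oracle constant."""
    if io_share(lfs_avg_ms, lfpw_avg_ms) >= io_cutoff:
        return "redo write I/O"
    return "outside the redo write (e.g. CPU, LGWR posting or scheduling)"


if __name__ == "__main__":
    # AWR averages from this case: log file sync 37 ms, log file parallel write 40 ms
    print(io_share(37, 40))           # 1.0 (capped)
    print(likely_bottleneck(37, 40))  # redo write I/O
```

With the AWR figures from this case, the write time covers the whole sync time. That matches the conclusion that the redo log disk is the problem. Suppose instead that log file sync averaged 40 ms while log file parallel write averaged only 5 ms. The share would then be 0.125, and most of the time would more likely be lost outside the I/O.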

We then used the lfsdiag.sql script to look at the detailed statistics, as shown below:

HISTOGRAM DATA FOR LFS AND OTHER RELATED WAITS:

APPROACH: Look at the wait distribution for log file sync waits
by looking at "wait_time_milli". Look at the high wait times then
see if you can correlate those with other related wait events.

   INST_ID EVENT                                    WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
         1 log file sync                                          1       4363
         1 log file sync                                          2        835
         1 log file sync                                          4       1650
         1 log file sync                                          8       4937
         1 log file sync                                         16     146252
         1 log file sync                                         32     606674
         1 log file sync                                         64     263377
         1 log file sync                                        128     253254
         1 log file sync                                        256          2
         1 log file switch completion                             1        124
         1 log file switch completion                             2          9
         1 log file switch completion                             4         19
         1 log file switch completion                             8         21
         1 log file switch completion                            16         35
         1 log file switch completion                            32         97
         1 log file switch completion                            64        133
         1 log file switch completion                           128        326
         1 log file switch completion                           256       1736
         1 log file switch completion                           512       3042
         1 log file switch completion                          1024       2020
         1 log file parallel write                                1          0
         1 log file parallel write                                2          0
         1 log file parallel write                                4         80
         1 log file parallel write                                8       2142
         1 log file parallel write                               16     170987
         1 log file parallel write                               32     779205
         1 log file parallel write                               64     311463
         1 log file parallel write                              128      79688
         1 log file parallel write                              256      42763
         1 log file parallel write                              512      13052
         1 log file parallel write                             1024      20468
         1 log file parallel write                             2048      14020
         1 log file parallel write                             4096        921
         1 log file parallel write                             8192         96
         1 log file parallel write                            16384         18
         1 log file parallel write                            32768         18
         1 log file parallel write                            65536          8
         1 log file parallel write                           131072          2
         1 LGWR wait for redo copy                                1       8516
         1 LGWR wait for redo copy                                2         20
         1 LGWR wait for redo copy                                4         19
         1 LGWR wait for redo copy                                8         20
         1 LGWR wait for redo copy                               16         11

ORDERED BY WAIT_TIME_MILLI

   INST_ID EVENT                                    WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
         1 log file sync                                          1       4363
         1 log file switch completion                             1        124
         1 log file parallel write                                1          0
         1 LGWR wait for redo copy                                1       8516
         1 log file sync                                          2        835
         1 log file switch completion                             2          9
         1 log file parallel write                                2          0
         1 LGWR wait for redo copy                                2         20
         1 log file sync                                          4       1650
         1 log file switch completion                             4         19
         1 log file parallel write                                4         80
         1 LGWR wait for redo copy                                4         19
         1 log file sync                                          8       4937
         1 log file switch completion                             8         21
         1 log file parallel write                                8       2142
         1 LGWR wait for redo copy                                8         20
         1 log file sync                                         16     146252
         1 log file switch completion                            16         35
         1 log file parallel write                               16     170987
         1 LGWR wait for redo copy                               16         11
         1 log file sync                                         32     606674
         1 log file switch completion                            32         97
         1 log file parallel write                               32     779205
         1 log file sync                                         64     263377
         1 log file switch completion                            64        133
         1 log file parallel write                               64     311463
         1 log file sync                                        128     253254
         1 log file switch completion                           128        326
         1 log file parallel write                              128      79688
         1 log file sync                                        256          2
         1 log file switch completion                           256       1736
         1 log file parallel write                              256      42763
         1 log file switch completion                           512       3042
         1 log file parallel write                              512      13052
         1 log file switch completion                          1024       2020
         1 log file parallel write                             1024      20468
         1 log file parallel write                             2048      14020
         1 log file parallel write                             4096        921
         1 log file parallel write                             8192         96
         1 log file parallel write                            16384         18
         1 log file parallel write                            32768         18
         1 log file parallel write                            65536          8
         1 log file parallel write                           131072          2

REDO WRITE STATS

"redo write time" in centiseconds (100 per second)
11.1: "redo write broadcast ack time" in centiseconds (100 per second)
11.2: "redo write broadcast ack time" in microseconds (1000 per millisecond)

VERSION              INST_ID NAME                                                     VALUE        MILLISECONDS
----------------- ---------- ---------------------------------------- --------------------- -------------------
10.2.0.5.0                 1 redo write time                                        9551524        95515240.000
10.2.0.5.0                 1 redo writer latching time                                   51
10.2.0.5.0                 1 redo writes                                            1434931

AWR WORST AVG LOG FILE SYNC SNAPS:

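The data above shows where most waits fall. The largest number of log file sync waits are in the 32 bucket, which holds waits of 16 ms up to 32 ms. The same bucket also holds the most log file parallel write waits. Both are far above the roughly 7 ms regarded as acceptable, which is highly abnormal.

log file parallel write is a wait event specific to the LGWR process. It occurs when LGWR writes the redo in the log buffer to the member files of the online redo log group, and LGWR waits on this event until the write completes. Long waits on this event mean that the disks holding the log files are slow or contended.

log file sync and log file parallel write are related. A committing session waits on log file sync until LGWR has written its redo. If log file parallel write takes a long time, log file sync waits are therefore necessarily stretched as well. Very high log file parallel write waits usually point to a physical disk I/O problem.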
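The histogram reading above can be checked with a short Python sketch. It assumes the usual V$EVENT_HISTOGRAM bucket semantics: bucket N counts waits shorter than N ms that fall in no smaller bucket. The sketch totals each histogram, finds the busiest bucket and computes the share of waits lasting 16 ms or more. It also turns the REDO WRITE STATS figures into an average redo write time (1 centisecond = 10 ms). The function names are illustrative. Treating the statistics as cumulative since instance startup is also an assumption, since the script output does not say so.

```python
# Histogram rows printed by lfsdiag.sql above: WAIT_TIME_MILLI -> WAIT_COUNT.
# Bucket N holds waits of at least N/2 ms and less than N ms (bucket 1: 0 to 1 ms).
LOG_FILE_SYNC = {1: 4363, 2: 835, 4: 1650, 8: 4937, 16: 146252,
                 32: 606674, 64: 263377, 128: 253254, 256: 2}
LOG_FILE_PARALLEL_WRITE = {
    1: 0, 2: 0, 4: 80, 8: 2142, 16: 170987, 32: 779205, 64: 311463,
    128: 79688, 256: 42763, 512: 13052, 1024: 20468, 2048: 14020,
    4096: 921, 8192: 96, 16384: 18, 32768: 18, 65536: 8, 131072: 2,
}
# REDO WRITE STATS section, as printed above
REDO_WRITE_TIME_CS = 9551524  # "redo write time", in centiseconds
REDO_WRITES = 1434931         # "redo writes"


def bucket_range_ms(upper: int) -> tuple:
    """Lower and upper bound (ms) of a power-of-two bucket; 1 // 2 == 0."""
    return (upper // 2, upper)


def summarize(hist: dict, threshold_ms: int = 16) -> dict:
    """Total waits, busiest bucket and share of waits lasting >= threshold_ms.
    threshold_ms should be a bucket boundary (a power of two)."""
    total = sum(hist.values())
    modal = max(hist, key=hist.get)
    slow = sum(n for upper, n in hist.items()
               if bucket_range_ms(upper)[0] >= threshold_ms)
    return {"total": total, "modal_range_ms": bucket_range_ms(modal),
            "share_slow": slow / total}


def avg_redo_write_ms(write_time_cs: int, writes: int) -> float:
    """Average time per redo write in ms (centiseconds * 10 / writes)."""
    return write_time_cs * 10 / writes


if __name__ == "__main__":
    for name, hist in [("log file sync", LOG_FILE_SYNC),
                       ("log file parallel write", LOG_FILE_PARALLEL_WRITE)]:
        s = summarize(hist)
        print(f"{name}: {s['total']} waits, busiest bucket {s['modal_range_ms']} ms, "
              f"{s['share_slow']:.1%} lasted >= 16 ms")
    avg = avg_redo_write_ms(REDO_WRITE_TIME_CS, REDO_WRITES)
    print(f"average redo write: {avg:.1f} ms")
```

With the data above, the sketch reports that 87.7% of log file sync waits and 87.9% of log file parallel write waits lasted 16 ms or longer. The busiest bucket for both events is 16 to 32 ms. The log file parallel write counts add up to 1434931, the same number as the `redo writes` statistic. The average redo write comes out at about 66.6 ms, higher than the 40 ms AWR average for the alert period.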

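We also checked the redo log switch frequency, as shown below. Redo log switches are infrequent and the volume of archived logs generated is small: in most hours there are zero or one log switches.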
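A switch-frequency check like this is commonly done by grouping V$LOG_HISTORY rows by hour. The grouping step can be sketched in Python. The sketch assumes the FIRST_TIME values have already been fetched as `datetime` objects; FIRST_TIME is when each log sequence started, i.e. the switch into it. The function name and the sample timestamps are made up for illustration and are not data from this server. Hours with no switch are filled in with 0, so quiet hours are counted too.

```python
from collections import Counter
from datetime import datetime, timedelta


def switches_per_hour(first_times):
    """Count log switches per clock hour, including hours with no switch.

    first_times: iterable of datetime values, assumed to be V$LOG_HISTORY.FIRST_TIME.
    """
    hours = [t.replace(minute=0, second=0, microsecond=0) for t in first_times]
    if not hours:
        return {}
    counts = Counter(hours)
    result = {}
    hour, last = min(hours), max(hours)
    while hour <= last:
        result[hour] = counts[hour]  # Counter returns 0 for hours with no switch
        hour += timedelta(hours=1)
    return result


if __name__ == "__main__":
    # Hypothetical sample, not data from the server in this article
    sample = [datetime(2024, 1, 1, 8, 12), datetime(2024, 1, 1, 10, 45),
              datetime(2024, 1, 1, 11, 5), datetime(2024, 1, 1, 11, 50)]
    per_hour = switches_per_hour(sample)
    for hour, n in per_hour.items():
        print(hour.strftime("%Y-%m-%d %H:00"), n)
    quiet = sum(1 for n in per_hour.values() if n <= 1)
    print(f"hours with at most one switch: {quiet}/{len(per_hour)}")
```

For the sample, the counts for 08:00 to 11:00 are 1, 0, 1 and 2, so 3 of the 4 hours have at most one switch.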

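We then ran a simple I/O speed comparison against another machine, a server whose metrics were normal. The method is extremely simple: measure how long it takes to create a large file. This is only a rough I/O test. It does not take caching and similar factors into account, and the sampling is not thorough. Even so, the comparison is enough to confirm that the disk I/O has a problem.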

Problem server:

# time dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct
2048+0 records in
2048+0 records out
1073741824 bytes (1.1 GB) copied, 88.271 seconds, 12.2 MB/s

real    1m28.273s
user    0m0.010s
sys     0m0.655s

Reference server (a normal server):

# time dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct
2048+0 records in
2048+0 records out
1073741824 bytes (1.1 GB) copied, 2.48344 seconds, 432 MB/s

real    0m2.485s
user    0m0.004s
sys     0m0.386s
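The dd summary lines can be checked by hand. The Python sketch below recomputes each run's throughput from the byte count and elapsed time; GNU dd reports MB as 10^6 bytes. It also computes the speed ratio between the two servers and a rough average time per 512 KiB write. Treating each block as one sequential direct write is a simplifying assumption, and the function names are illustrative.

```python
FILE_BYTES = 512 * 1024 * 2048  # bs=512k count=2048 -> 1073741824 bytes
BLOCKS = 2048


def throughput_mb_s(nbytes: int, seconds: float) -> float:
    """Throughput in MB/s with MB = 10**6 bytes, as GNU dd prints it."""
    return nbytes / seconds / 1e6


def avg_ms_per_block(seconds: float, blocks: int) -> float:
    """Rough average time per dd write, assuming one write per block."""
    return seconds / blocks * 1000


if __name__ == "__main__":
    runs = {"problem server": 88.271, "reference server": 2.48344}
    for name, secs in runs.items():
        print(f"{name}: {throughput_mb_s(FILE_BYTES, secs):.1f} MB/s, "
              f"{avg_ms_per_block(secs, BLOCKS):.2f} ms per 512 KiB write")
    ratio = runs["problem server"] / runs["reference server"]
    print(f"reference is {ratio:.1f}x faster")
```

The results, 12.2 MB/s against 432.4 MB/s, match dd's own figures and differ by a factor of about 35.5. The problem server takes about 43 ms per 512 KiB write, the same order of magnitude as the 40 ms log file parallel write average. Large sequential dd writes are not the same workload as LGWR's redo writes, however, so this comparison is only indicative.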

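As the comparison shows, writing a file of the same size took vastly different amounts of time on the two servers. This confirms that the storage behind the alerting server has an I/O performance problem. However, responsibilities in the company are strictly divided, and the DBA does not know what is wrong with the underlying storage. All we could do was report the issue and wait for a reply from the overseas colleagues who maintain the systems and storage.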

Original article: https://www.cnblogs.com/kerrycode/p/11484066.html

Time: 2024-10-28 15:52:25
