昨天正式环境上出现数据库CPU 100%的问题,数据库是128个CPU,128G内存,power系列,非常强劲,十几万的逻辑读只需要1s。出现问题之后,调整两条负载最高的两条SQL,问题解决,但有两个cluster类别的等待事件第 一次见,在metlink中找了一篇文章:
共享:RAC等待事件:gc buffer busy acquire
概述
---------------------
gc buffer busy是RAC数据库中常见的等待事件,11g开始gc buffer busy分为gc buffer busy acquire和gc buffer busy release:
gc buffer busy acquire是当session#1尝试请求访问远程实例(remote instance) buffer,但是在session#1之前已经有相同实例上另外一个session#2请求访问了相同的buffer,并且没有完成,那么session#1等待gc buffer busy acquire。
gc buffer busy release是在session#1之前已经有远程实例的session#2请求访问了相同的buffer,并且没有完成,那么session#1等待gc buffer busy release。
原因/解决方法
---------------------
- 热点块(hot block)
在AWR中Segments by Global Cache Buffer Busy 记录了访问频繁的gc buffer.
解决方法可以根据热点块的类型采取不同的解决方法,比如采取分区表,分区索引,反向index等等。这点与单机数据库中的buffer busy waits类似。
- 低效SQL语句
低效SQL语句会导致不必要的buffer被请求访问,增加了buffer busy的机会。在AWR中可以找到TOP SQL。解决方法可以优化SQL语句减少buffer访问。这点与单机数据库中的buffer busy waits类似。
- 数据交叉访问
RAC数据库,同一数据在不同数据库实例上被请求访问。
如果应用程序可以实现,那么我们建议不同的应用功能/模块数据分布在不同的数据库实例上被访问,避免同一数据被多个实例交叉访问,可以减少buffer的争用,避免gc等待。
- Oracle bug
建议安装Oracle推荐的最新Patch Set和PSU。
Patch set和PSU信息请参考:Oracle Recommended Patches -- Oracle Database (Doc ID 756671.1)
当时数据库的负载,已经有几千个session堵塞了。
Snap Id | Snap Time | Sessions | Cursors/Session | Instances | |
---|---|---|---|---|---|
Begin Snap: | 4531 | 01-6月 -15 09:00:06 | 1137 | 106.2 | 2 |
End Snap: | 4533 | 01-6月 -15 11:00:05 | 2728 | 42.4 | 2 |
Elapsed: | 119.98 (mins) | ||||
DB Time: | 72,409.59 (mins) |
Top 10 Foreground Events by Total Wait Time
Event | Waits | Total Wait Time (sec) | Wait Avg(ms) | % DB time | Wait Class |
---|---|---|---|---|---|
gc buffer busy acquire | 1,164,631 | 2324.3K | 1996 | 53.5 | Cluster |
DB CPU | 200.7K | 4.6 | |||
enq: TX - row lock contention | 21,390 | 179K | 8368 | 4.1 | Application |
rdbms ipc reply | 9,488,345 | 162.2K | 17 | 3.7 | Other |
buffer busy waits | 3,899 | 94.5K | 24243 | 2.2 | Concurrency |
gc buffer busy release | 2,288 | 62.3K | 27219 | 1.4 | Cluster |
log file sync | 96,502 | 50.9K | 528 | 1.2 | Commit |
latch: row cache objects | 185,628 | 47.8K | 258 | 1.1 | Concurrency |
gc current request | 20 | 40.3K | 2.0E+06 | .9 | Cluster |
db file sequential read | 4,074,919 | 30.7K | 8 | .7 | User I/O |
SQL ordered by Elapsed Time
- Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
- % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
- %Total - Elapsed Time as a percentage of Total DB time
- %CPU - CPU Time as a percentage of Elapsed Time
- %IO - User I/O Time as a percentage of Elapsed Time
- Captured SQL account for 84.5% of Total DB Time (s): 4,344,576
- Captured PL/SQL account for 0.2% of Total DB Time (s): 4,344,576
Elapsed Time (s) | Executions | Elapsed Time per Exec (s) | %Total | %CPU | %IO | SQL Id | SQL Module | SQL Text |
---|---|---|---|---|---|---|---|---|
2,295,852.93 | 1,559 | 1,472.64 | 52.84 | 0.05 | 0.04 | 9baga95gnc3qc | JDBC Thin Client | UPDATE BENCH_DONE T SET T.... |
546,544.31 | 3,694 | 147.95 | 12.58 | 16.40 | 0.01 | 8fcx0qg5xa5wy | JDBC Thin Client | SELECT ID, PLAN_STAT... |
234,360.72 | 1,715 | 136.65 | 5.39 | 0.00 | 0.00 | 8w406h1z76j1d | JDBC Thin Client | UPDATE BENCH_DONE T SET T.... |
216,796.09 | 895 | 242.23 | 4.99 | 16.60 | 0.02 | azj8hg39rjqyq | JDBC Thin Client | SELECT COUNT(1) ALL_COUNT, NVL... |
SQL ordered by Gets
- Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
- %Total - Buffer Gets as a percentage of Total Buffer Gets
- %CPU - CPU Time as a percentage of Elapsed Time
- %IO - User I/O Time as a percentage of Elapsed Time
- Total Buffer Gets: 26,785,040,521
- Captured SQL account for 47.2% of Total
Buffer Gets | Executions | Gets per Exec | %Total | Elapsed Time (s) | %CPU | %IO | SQL Id | SQL Module | SQL Text |
---|---|---|---|---|---|---|---|---|---|
7,426,092,420 | 3,694 | 2,010,311.97 | 27.72 | 546,544.31 | 16.4 | 0 | 8fcx0qg5xa5wy | JDBC Thin Client | SELECT ID, PLAN_STAT... |
2,468,792,930 | 895 | 2,758,427.85 | 9.22 | 216,796.09 | 16.6 | 0 | azj8hg39rjqyq | JDBC Thin Client | SELECT COUNT(1) ALL_COUNT, NVL... |
230,823,164 | 3,586 | 64,367.87 | 0.86 | 11,911.51 | 12.8 | .1 | 16mmtrx7rw0fb | JDBC Thin Client | select dutylogvos0_.M_DUTY_REC... |
228,243,875 | 1,559 | 146,404.03 | 0.85 | 2,295,852.93 | .1 | 0 | 9baga95gnc3qc | JDBC Thin Client | UPDATE BENCH_DONE T SET T.... |
总结:本次事件主要是低效SQL引起,逻辑读较高,执行频率又高。9baga95gnc3qc这条SQL在上周的表现是10万逻辑读,本周涨到14万,在此对此数据库打一个基线,一个节点,2小时执行1千次,逻辑读超过10万,则需要优化。