oracle11g cursor:mutex S导致的load过高的追踪过程详解

--> 问题背景

同事电话我,说oracle数据库很慢,很卡,cpu99%了。

1,查看问题现场,cpu高单核不高,load高

多核,看到每个核只有20%左右使用了1/5,所以数据库应该没事,没有到故障级别,只是load高已经接近90了,E:\u\azure_cloud\oracle\05.png:

2,看io读写比较低,磁盘io不是瓶颈

实时观察,才148,可以看出io性能没有问题,毕竟是ssd磁盘,E:\u\azure_cloud\oracle\04.png:

3,查看锁表情况,比较少

select t2.username,t2.sid,t2.serial#,t2.logon_time fromv$locked_object t1,v$session t2 where t1.session_id=t2.sid order byt2.logon_time;执行没有记录。


SQL> select t2.username,t2.sid,t2.serial#,t2.logon_time from v$locked_object t1,v$session t2 where t1.session_id=t2.sid order by t2.logon_time;

no rows selected

SQL>

原blog地址:http://blog.csdn.net/mchdba/article/details/51299062,谢绝转载。

4,查看session等待,没有等待


SQL>

select /*+ rule */ lpad(‘ ‘, decode(t1.xidusn,0,3,0)) || t1.oracle_username user_name,o.owner,o.object_name,o.object_type,s.sid,s.serial#,p.spid

from v$locked_object t1,dba_objects o, v$session s,v$process p

where t1.object_id=o.object_id

and t1.session_id=o.object_id

and t1.session_id=s.sid

6  and s.paddr=p.addr;

no rows selected

SQL>

5,查看到有很多inactive进程


# 查看inactive进程,kill掉这些进程

SELECT SID, SERIAL#,MODULE, STATUS

FROM V$SESSION S

WHERE S.USERNAME IS NOT NULL

AND UPPER(S.PROGRAM) IN (‘TOAD.EXE‘, ‘W3WP.EXE‘)

AND S.LAST_CALL_ET >= 60*60*2

AND S.STATUS = ‘INACTIVE‘

ORDER BY SID DESC;

# 通过如下sql获取kill命令:

SELECT ‘alter system kill session ‘‘‘||SID||‘,‘|| SERIAL#||‘‘‘;‘    FROM V$SESSION S   WHERE S.USERNAME IS NOT NULL   AND S.LAST_CALL_ET >= 10   AND S.STATUS = ‘INACTIVE‘   ORDER BY S.MACHINE,SID DESC;

Kill掉‘INACTIVE‘ 进程后,db的load还是没有降下来,证明资源没有被释放掉。这么看来,根源在正在运行的sql,系统load 高 io低,说明系统cpu 耗费高,有大量的逻辑运算而且可能会有wait等待。

6,去查看wait_class等待,有大量cursor:mutex S事件


# 查询等待sql语句

SELECT t1.sid,

t1.serial#,

t1.username,

t1.event,

t2.sql_text,

t1.sql_id,

t1.seconds_in_wait,

t1.program,

t1.blocking_session,

-- object_type || ‘: ‘ || object_name object,

t1.wait_class,

t1.state

FROM v$session t1

LEFT OUTER JOIN v$sqlarea t2

on (t1.sql_id = t2.sql_id)

--LEFT OUTER JOIN dba_objects t3

--ON (object_id = row_wait_obj#)

where t1.wait_class <> ‘Idle‘;

7,查看awr报告,也指向了cursor:mutex S

E:\u\azure_cloud\oracle\06.png

E:\u\azure_cloud\oracle\01.png

问题基本可以推测出来是cursor等待造成的

8,验证cursor_sharing参数为EXACT,正常


SQL> show parameter cursor_sharing;

NAME                                        TYPE       VALUE

------------------------------------ ----------- ------------------------------

cursor_sharing                                 string     EXACT

SQL>

9,查看4月28日今天的记录数是35W多,而且这个执行次数还在不停的增加中

select sql_id,count(*)

from dba_hist_active_sess_history

where event like‘cursor: mutex S‘and sample_time>to_date(‘28-04-201600:00:00‘, ‘dd-mm-yyyyhh24:mi:ss‘)

groupby sql_id;


SQL>

select sql_id,count(*)

from dba_hist_active_sess_history

where event like ‘cursor: mutex S‘ and sample_time >to_date(‘28-04-2016 00:00:00‘, ‘dd-mm-yyyy hh24:mi:ss‘)

4   group by sql_id;

SQL_ID              COUNT(*)

------------- ----------

94508

8n23w37kwdn3p            1

08nsmbsvfkzv6           353543

SQL>

10,查看昨天4月27日的记录数,才执行了3W多

select sql_id,count(*)

from dba_hist_active_sess_history

where event like‘cursor: mutex S‘and sample_time<to_date(‘28-04-201600:00:00‘, ‘dd-mm-yyyyhh24:mi:ss‘) and sample_time>to_date(‘27-04-201600:00:00‘, ‘dd-mm-yyyyhh24:mi:ss‘)

groupby sql_id;


SQL> select  snap_id,ELAPSED_TIME_DELTA/EXECUTIONS_DELTA,CPU_TIME_DELTA/EXECUTIONS_DELTA,EXECUTIONS_DELTA from dba_hist_sqlstat where sql_id=‘08nsmbsvfkzv6‘ order by snap_id;

no rows selected

select sql_id,count(*)

from dba_hist_active_sess_history

where event like ‘cursor: mutex S‘ and sample_time <to_date(‘28-04-2016 00:00:00‘, ‘dd-mm-yyyy hh24:mi:ss‘) and sample_time >to_date(‘27-04-2016 00:00:00‘, ‘dd-mm-yyyy hh24:mi:ss‘)

4   group by sql_id;

SQL_ID              COUNT(*)

------------- ----------

11653

08nsmbsvfkzv6            38468

SQL>

而昨天db数据库没有这么慢,这么卡,那么已经基本确定了是由于这今天爆发增长的几十万sql语句引起的cursor异常wait事件,接下来去找到这个具体的sql记录

11,通过sql_id(08nsmbsvfkzv6)查询到具体的sql记录

select sql_text from dba_hist_sqltext where sql_id=‘08nsmbsvfkzv6‘;

Sql太长了,截图截取不过来,贴出来如下:


select * from ( select * from (

(select (case when (instr(‘,‘||wm_concat(type)||‘,‘,‘,develop-plan,‘)>0) then ‘develop-plan‘ else ‘unRead‘ end) type,

max(id) id,name,node_level,project,end,max(created_date) created_date,

center,year_flg,sequence,

min(isRead) isRead,

MyReplace(wm_concat(msgCreateDate),‘,‘) msgCreateDate,wm_concat(group_name),max(planId),0 owe_days,0 owe_money

from (select

‘develop-plan‘ as type,

node.node_id as id,

template_node.node_name as name,

template_node.node_level,

project.project_name as project,

node.schedule_end_date as end,

warning.CREATED_DATE as created_date,

‘‘ as center ,         ‘‘ as year_flg,          ROW_NUMBER() OVER(PARTITION BY node.templet_id, project.project_project_id ORDER BY warning.created_date ASC) as sequence,

(select min(t.is_read) from plan6_message_user_read t where t.node_id=node.node_id and t.need_reader=‘baolm1‘ group by t.node_id) isRead,                 ‘‘ msgCreateDate,info.name group_name,info.biz_cd planId,0 owe_days,0 owe_money           from plan6_warning      warning,

plan6_node         node,

project_distribution         info,project_period pp,

project_project      project,

plan6_templet_node template_node

where warning.node_id = node.node_id

and node.plan_id = info.biz_cd

and info.project_period_id = pp.project_period_id and pp.is_enabled=1 and pp.project_project_id = project.project_project_id

and template_node.node_id = node.templet_id

and info.if_in_plan = 1

and (warning.charger_cd = ‘baolm1‘ OR node.charger_cd=‘baolm1‘ OR node.center_manager_cd=‘baolm1‘ OR project.project_charger_cd=‘baolm1‘ OR project.project_charger_cd2=‘baolm1‘ OR project.project_charger_cd_bis=‘baolm1‘ OR (‘wubc‘=‘baolm1‘ AND template_node.node_level in(1,2) )                 OR ((‘xuhf‘=‘baolm1‘ OR ‘zhanghf‘=‘baolm1‘ ) and template_node.node_level=1)                                  )            and warning.stat = 2

and node.status = 1 and node.is_enabled<>0 and warning.stat = 2

union

select      ‘unRead‘ tyep,node.node_id id,template_node.node_name as name,

template_node.node_level,project.project_name as project,

node.schedule_end_date as end,

to_date(‘‘,‘yyyymmdd hh24:mi:ss‘) as created_date,

‘‘ as center ,‘‘ as year_flg,

ROW_NUMBER() OVER(PARTITION BY node.templet_id,

project.project_project_id ORDER BY r.created_date ASC) as sequence,

r.is_read isRead,to_char(r.created_date,‘yyyymmdd hh24:mi:ss‘) msgCreateDate,info.name group_name,info.biz_cd planId,0 owe_days,0 owe_money

from plan6_message_user_read r,plan6_node node,

project_distribution         info,project_period pp,project_project project,

plan6_templet_node template_node

where r.node_id = node.node_id

and node.plan_id = info.biz_cd

and info.project_period_id = pp.project_period_id and pp.is_enabled=1 and pp.project_project_id = project.project_project_id

and template_node.node_id = node.templet_id

and info.if_in_plan = 1

and node.is_enabled<>0

and r.need_reader=‘baolm1‘

and r.is_read=‘0‘         ) warning

where warning.sequence = 1 group by name,node_level,project,end,center,year_flg,sequence )

union   (select ‘oa_meeting‘ as type,

metting.oa_meeting_id as id,

TO_CHAR (metting.business) as name,

10 as node_level,

‘‘ as project,

metting.target_date as end,

warning.created_date as created_date,

‘‘ as center,       ‘‘ as year_flg,       0 as sequence,‘‘ isRead,‘‘ msgCreateDate,‘‘ group_name,‘‘ planId,0 owe_days,0 owe_money

from oa_meeting metting, plan_warning warning

where metting.oa_meeting_id = warning.target

and metting.HIDDEN_FLG = 0

and metting.status in (1,2, 4)

and metting.target_date is not null    and instr(‘;‘||warning.responsible_person ,‘;baolm1;‘)>0 ) union

(select ‘work-center-plan‘ as type,          cen.plan_work_center_id as id,          cen.content as name,          10 as node_level,          ‘‘ as project,          cen.target_date as end,          warning.created_date as created_date,          ‘‘ center ,                ‘‘ year_flg,                0 as sequence,‘‘ isRead,‘‘ msgCreateDate,‘‘ group_name,‘‘ planId ,0 owe_days,0 owe_money    from plan_work_center cen,plan_warning warning    where cen.plan_work_center_id = warning.target      and cen.status_cd in (1)          and cen.open_type = ‘0‘     and cen.target_date is not null         and instr(‘;‘||cen.principal ,‘;baolm1;‘)>0) union (select ‘project_detail‘ as type,  pd.project_detail_id as id, pd.content as name,12 as node_level, ‘‘ as project, pd.target_date as end,  pd.created_date as created_date,‘‘ as center , ‘‘ as year_flag, 0 as sequence, ‘‘ isRead, ‘‘ msgCreateDate, ‘‘ group_name,‘‘ planId,0 owe_days,0 owe_money from project_detail pd where pd.status in (‘1‘,‘2‘,‘4‘) and pd.project_detail_id in (select pp.project_detail_id from project_principal pp where pp.principal_cd=‘baolm1‘))) warning order by   (case when type=‘work-zc-plan‘ then -1 else 1 end) asc, (case when type=‘unRead‘ then 0 else 1 end) asc,    (case when isRead=0 then 0 else 1 end) asc,       (case when msgCreateDate is not null then 0 else 1 end) asc,  warning.type asc,(case when warning.node_level=0 then 4 else warning.node_level end) asc, trunc(warning.end)-trunc(sysdate) asc,warning.created_date desc,owe_money desc ) where rownum <= :1

By the way,基本可以肯定是这个sql引起的,那么除了这个sql是否还有别的记录呢,这需要现场实时观察。

12,分析当前的时间段记录

最近时间段的mutex记录都是‘08nsmbsvfkzv6‘这条sql引起的:


select sql_id,to_char(sample_time,‘yyyy-mm-dd hh24:mi‘),count(*)

from dba_hist_active_sess_history

where event like‘cursor: mutex S‘

and sample_time > to_date(‘28-04-2016 22:30:00‘, ‘dd-mm-yyyy hh24:mi:ss‘)

AND sample_time < to_date(‘28-04-2016 23:10:00‘, ‘dd-mm-yyyy hh24:mi:ss‘)

groupby sql_id,to_char(sample_time,‘yyyy-mm-dd hh24:mi‘)

orderby sql_id,to_char(sample_time,‘yyyy-mm-dd hh24:mi‘) desc;

 

E:\u\azure_cloud\oracle\07.png

13,分析最近的ashrpt日志报告,最终验证

生成ashrpt_1_0428_2322.html,报告在附件里面,从等待事件可以看到,查看当前占用资源比较多的是cursor:mutex S,E:\u\azure_cloud\oracle\08.png

再从sql分析可以看到,是08nsmbsvfkzv6引起的,如下所示E:\u\azure_cloud\oracle\09.png

14,问题定位以及解决方案

检查‘08nsmbsvfkzv6‘的sql是在哪个业务模块,为啥今天突然暴增?去执行了‘08nsmbsvfkzv6‘的sql语句,里面有个绑定变量,我设置了100,试了下,执行速度很快,里面一条记录也没有,这估计和业务关系比较大吧。

3种解决方案:

(1)找到业务模块,是否可以停止。

(2)优化这条N多行的复杂的sql。

(3)升级oracle版本到11.2.0.4+

时间: 2025-01-01 07:54:11

oracle11g cursor:mutex S导致的load过高的追踪过程详解的相关文章

jQuery Ajax(load,post,get,ajax)用法与详解

今天看到群里面有网友们问到Jquery Ajax的(load,post,get,ajax)之间的区别,现在整理了一篇文章出来,希望可以帮到网友们,首先我们先来看一些简单的方法, 这些方法都是对jQuery.ajax()进行封装以方便我们使用的方法,当然,如果要处理复杂的逻辑,还是需要用到jQuery.ajax()的(这个后面会说到). 1. load( url, [data], [callback] ) :载入远程 HTML 文件代码并插入至 DOM 中. url (String) : 请求的H

Oracle11g 密码延迟认证导致library cache lock的情况分析

在 Oracle 11g 中,为了提升安全性,Oracle 引入了『密码延迟验证』的新特性.这个特性的作用是,如果用户输入了错误的密码尝试登录,那么随着登录错误次数的增加,每次登录前验证的时间也会增加,以此减缓可能对于数据库重复的口令尝试攻击. 但是对于正常的系统,由于口令的更改,可能存在某些被遗漏的客户端,不断重复尝试,从而引起数据库内部长时间的 Library Cache Lock的等待,这种情形非常常见. 如果遇到这一类问题,可以通过Event 28401关闭这个特性,从而消除此类影响,以

[MySQL CPU]线上飙升800%,load达到12的解决过程

接到报警通知,负载过高,达到800%,load也过高,有11了. MySQL版本号为5.6.12-log 1 top 之后,确实是mysqld进程占领了全部资源. 2 查看error日志,无不论什么异常 3 show eninge innodb status\G,没有死锁信息. 4 show full processlist; 没有耗时很大的慢sql再跑.看并发,当前的线程总数量也才30个左右. 5 查看iostat,读写正常. 究竟是什么问题呢?查看slow log,发现例如以下SQL,频繁运

Hession反序列化导致CPU占用飙高

背景 今天发布一个线上服务,暂且称之为O,发布完后,依赖O服务的2个服务C和W大量Time报警,并且这两个服务的CPU占用都飙到了40%左右,平时只有10%的样子. 这时去看O服务的监控,Time并没有升高,QPS反倒降了一半.同时C和W服务器日志中出现了大量的WARNING,信息如下: java.lang.ClassNotFoundException: com.我是不可描述的信息.PropertyAo Dec 02, 2016 6:24:33 PM com.alibaba.com.caucho

oracle 游标变量ref cursor详解

oracle 游标变量ref cursor详解 分类: PL/SQL开发 2013-12-04 15:15 685人阅读 评论(0) 收藏 举报 oracleref cursor 一 介绍      像游标cursor一样,游标变量ref cursor指向指定查询结果集当前行.游标变量显得更加灵活因为其声明并不绑定指定查询. 其主要运用于PLSQL函数或存储过程以及其他编程语言java等程序之间作为参数传递.     不像游标的一点,游标变量没有参数.     游标变量具有以下属性:     (

【SSH三大框架】Hibernate基础第三篇:实体对象的三种状态以及get、load、persist三个方法的详解

一.Hibernate中实体对象分为三种状态:瞬态.持久.脱管 瞬态(transient):这种状态下的实体对象,数据库中没有数据与之对应,超过作用域会被JVM垃圾回收器回收,一般是new出来的并且与Session没有任何关系的对象. 持久(persistent):数据库中有数据与之对应,当前与Session有关联,并且相关联的Session并没有关闭,事务没有提交.PS:持久对象发生改变的时候,在事务提交的时候会影响到数据库中. 脱管(detached):数据库中有数据与之对应,但当前没有Se

在Load average 高的情况下如何鉴别系统瓶颈

在Load average 高的情况下如何鉴别系统瓶颈.是CPU不足,还是io不够快造成? 或是内存不足? 一:查看系统负载vmstat procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st 0  0      0 496056

DIV css中cursor属性详解-鼠标移到图片变换鼠标形状 (转)

css中cursor属性详解-鼠标移到图片变换鼠标形状 语法: cursor : auto | all-scroll | col-resize| crosshair | default | hand | move | help | no-drop | not-allowed | pointer | progress | row-resize | text | vertical-text | wait | *-resize | url ( url )  取值: auto  :   默认值.浏览器根

Hibernate系列(三):实体对象的三种状态以及get、load、persist三个方法的详解

一.Hibernate中实体对象分为三种状态:瞬态.持久.脱管 瞬态(transient):这种状态下的实体对象,数据库中没有数据与之对应,超过作用域会被JVM垃圾回收器回收,一般是new出来的并且与Session没有任何关系的对象. 持久(persistent):数据库中有数据与之对应,当前与Session有关联,并且相关联的Session并没有关闭,事务没有提交.PS:持久对象发生改变的时候,在事务提交的时候会影响到数据库中. 脱管(detached):数据库中有数据与之对应,但当前没有Se