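Problem description:
An operation in the customer's front-end application hung. It raised no error and never reported that it had completed.
Checking the lock information showed the following: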
SQL> with lk as (select blocking_instance||'.'||blocking_session blocker, inst_id||'.'||sid waiter
  from gv$session
  where blocking_instance is not null
  and blocking_session is not null)
  select lpad(' ',2*(level-1))||waiter lock_tree from
  (select * from lk
  union all
  select distinct 'root', blocker from lk
  where blocker not in (select waiter from lk))
  connect by prior waiter=blocker start with blocker='root';
LOCK_TREE
-------------------------------------------------------------------------------
1.71
1.1349
2.136
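The lock tree only shows inst_id.sid for each session. The sketch below adds the wait event and SQL of every blocked session, plus the details of the root blockers (such as 1.71 here). The column choice is our own and not part of the original procedure. It uses only standard gv$session columns.

```sql
-- Every blocked session: who blocks it, what it waits on, and which SQL it runs.
select s.inst_id,
       s.sid,
       s.serial#,
       s.blocking_instance,
       s.blocking_session,
       s.event,
       s.seconds_in_wait,
       s.sql_id
  from gv$session s
 where s.blocking_session is not null
 order by s.blocking_instance, s.blocking_session;

-- The sessions that block others: status, current wait, current and previous SQL.
select b.inst_id, b.sid, b.serial#, b.status, b.event, b.sql_id, b.prev_sql_id
  from gv$session b
 where (b.inst_id, b.sid) in (select blocking_instance, blocking_session
                                from gv$session
                               where blocking_session is not null);
```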
After confirming with the customer's person in charge, we killed the blocking session:
SQL> alter system kill session '71,36519';
alter system kill session '71,36519'
*
ERROR at line 1:
ORA-00031: session marked for kill
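ORA-00031 means the session has been marked for termination but cannot be cleaned up yet. This typically happens because it is in the middle of a call or its transaction is still being rolled back.

The sketch below shows commonly used variants of the kill command. The IMMEDIATE clause only stops the command from waiting for the cleanup to finish; it does not make the rollback any faster. The RAC form with an @inst_id suffix (11g and later) is shown commented out, and its serial# is a placeholder.

```sql
-- Mark the session for kill and return at once instead of waiting for cleanup.
alter system kill session '71,36519' immediate;

-- RAC: target a session on another instance (for example 2.136) with the @inst_id suffix.
-- '<serial#>' is a placeholder; look up the real serial# in gv$session first.
-- alter system kill session '136,<serial#>,@2' immediate;

-- Sessions that are already marked for kill show STATUS = 'KILLED'.
select inst_id, sid, serial#, status, event
  from gv$session
 where status = 'KILLED';
```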
SQL> select spid, osuser, s.program from v$session s, v$process p where s.paddr = p.addr and s.sid =71;
SPID OSUSER PROGRAM
------------------------ ------------------------------
16783
$ kill -9 16783
About 3 seconds later, checked again:
$ kill -9 16783
-bash: kill: (16783) - No such process
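The spid lookup above worked because v$session.paddr still pointed at the original server process. On some versions, once a session is marked KILLED its PADDR is switched to a different address, and the same join returns nothing. In that case, joining on v$session.creator_addr (present in 11g and later) is a commonly used alternative. A sketch for SID 71 from this case:

```sql
-- Map a (possibly already killed) session to its OS server process via CREATOR_ADDR.
select s.sid, s.serial#, s.status, p.spid, p.program
  from v$session s, v$process p
 where s.creator_addr = p.addr
   and s.sid = 71;
```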
SQL> /
LOCK_TREE
-------------------------------------------------------------------------------
1.1349
2.136
SQL> select sid,serial# from v$session where sid=1349;
SID SERIAL#
---------- ----------
1349 3539
Killing this session also returned ORA-00031: session marked for kill.
SQL> select spid, osuser, s.program from v$session s, v$process p where s.paddr = p.addr and s.sid =1349;
SPID OSUSER PROGRAM
------------------------ ------------------------------
8581
$ kill -9 8581
$ ps -ef |grep 8581
oracle 22749 21897 0 10:34 pts/1 00:00:00 grep 8581
SQL> /
LOCK_TREE
-------------------------------------------------------------------------------
1.1349
2.136
SQL> select spid, osuser, s.program from v$session s, v$process p where s.paddr = p.addr and s.sid =1349;
SPID OSUSER PROGRAM
------------------------ ------------------------------
8581
About 5 seconds later, checked again:
$ kill -9 8581
-bash: kill: (8581) - No such process
SQL> /
LOCK_TREE
-------------------------------------------------------------------------------
1.1349
2.136
SQL> /
no rows selected.
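We concluded that after the session was killed, its rollback was still running in the background, which is why this SID took a while to disappear.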
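One way to check the rollback hypothesis is to watch the undo still held by the transaction. The sketch below assumes the session (here SID 1349) is still visible in v$session. While the rollback proceeds, USED_UBLK and USED_UREC in v$transaction should shrink between runs.

Once the server process is gone, a large dead transaction may be handed over to background transaction recovery. Its progress can then be followed in v$fast_start_transactions.

```sql
-- Undo blocks and records still held by the session's transaction;
-- re-run it: the numbers decrease while the rollback is in progress.
select s.sid, s.serial#, s.status, t.used_ublk, t.used_urec, t.start_time
  from v$session s, v$transaction t
 where s.taddr = t.addr
   and s.sid = 1349;

-- Dead transactions being recovered after their process was killed.
select usn, slt, seq, state, undoblocksdone, undoblockstotal
  from v$fast_start_transactions;
```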
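However, after the blocking processes were killed, re-running the operation from the application page produced new row lock waits.
Analysis:
Meanwhile, we checked the SQL statement being executed by 1.71 (the holder of the row lock every time):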
SQL> select sid,sql_text from v$session a,v$sql b where sid in(71) and (b.sql_id=a.sql_id or b.sql_id=a.prev_sql_id);
The result was an INSERT statement.
The customer confirmed that the SQL executed by 1.71 (the row lock holder) was exactly the statement the hung page was running.
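Because the blocker itself never finished, its own wait event is worth checking too. A session waiting for a reply from a remote database over a database link will often show a network wait such as 'SQL*Net message from dblink'. A sketch for session 1.71:

```sql
-- What is the blocking session itself waiting on?
select inst_id, sid, event, wait_class, state, seconds_in_wait, sql_id
  from gv$session
 where inst_id = 1
   and sid = 71;
```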
SQL> select owner,object_type from dba_objects where object_name='TEST_TABLE';
The result was a synonym.
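The query confirmed that TEST_TABLE is a synonym on the Oracle side. The customer confirmed that this insert is written by Oracle directly into a SQL Server database. On the SQL Server side, TEST_TABLE is a business table, and Oracle pushes data to SQL Server through a database link created via a gateway. (So the row lock problem on the Oracle side this time was caused by the INSERT hanging in the SQL Server database, which kept the INSERT on the Oracle side from completing.)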
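dba_synonyms shows the target owner, object name, and database link of a synonym. A non-null DB_LINK means the target lives in a remote database. The link itself can then be looked up in dba_db_links. Its HOST column holds the connect string, which for a gateway link usually names the gateway's listener entry.

A sketch for TEST_TABLE:

```sql
-- Resolve the synonym: DB_LINK tells whether the target is remote.
select owner, synonym_name, table_owner, table_name, db_link
  from dba_synonyms
 where synonym_name = 'TEST_TABLE';

-- Inspect the database links, including the connect string in HOST.
select owner, db_link, username, host, created
  from dba_db_links;
```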
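We contacted the customer's SQL Server owner for this system. An INSERT run locally on SQL Server hung as well, so the SQL Server owner took over the investigation.
After the INSERT problem on the SQL Server side was resolved (the SQL Server engineer rebuilt table TEST_TABLE), the row locks did not reappear on the Oracle side, and the customer's business returned to normal.
As for why the row locks arose on the Oracle side, the application developers will analyze that further based on the business logic.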
Example: an experiment in which an INSERT causes a row lock:
session 1:
SQL> create table t(id int primary key);
Table created.
SQL> insert into t values(1); -- inserted, not committed
1 row created.
session 2:
SQL> insert into t values(1); -- inserting the same primary key value makes this statement wait
session 3:
SQL> select sid,type,id1,id2,lmode,request,block from v$lock where sid in (834,854) order by 1,2;
       SID TY        ID1        ID2      LMODE    REQUEST      BLOCK
---------- -- ---------- ---------- ---------- ---------- ----------
       834 TM      91874          0          3          0          0
       834 TX     262174     192335          6          0          0
       834 TX     458776     193901          0          4          0
       854 TM      91874          0          3          0          0
       854 TX     458776     193901          6          0          1
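Judging from this output:

- SID 854 (session 1) holds TX lock 458776/193901 in exclusive mode (LMODE 6) and is blocking (BLOCK 1).
- SID 834 (session 2) requests the same TX lock in share mode (REQUEST 4). This is the usual pattern when a primary or unique key value is pending in another transaction.
- SID 834 also holds its own TX lock 262174/192335, because its pending insert has already started a transaction.

For a TX enqueue, ID1 = undo segment number * 65536 + slot, and ID2 is the sequence number. So 458776 decodes to undo segment 7, slot 24 (7 * 65536 + 24 = 458776). The sketch below pairs holders with waiters and decodes the transaction id:

```sql
-- Pair each blocking TX lock with the sessions waiting for it,
-- decoding ID1 into undo segment and slot.
select h.sid                  holder_sid,
       w.sid                  waiter_sid,
       trunc(h.id1 / 65536)   undo_seg,    -- undo segment number
       mod(h.id1, 65536)      undo_slot,   -- transaction table slot
       h.id2                  undo_seq,    -- sequence (wrap) number
       h.lmode,
       w.request
  from v$lock h, v$lock w
 where h.type = 'TX'
   and h.block = 1
   and w.request > 0
   and w.type = h.type
   and w.id1 = h.id1
   and w.id2 = h.id2;
```

With the v$lock data shown above, this would be expected to return one row: holder 854, waiter 834, undo segment 7, slot 24, sequence 193901.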
SQL> select sid,event from v$session_wait where sid in (834,854);
SID EVENT
---------- ----------------------------------------------------------------
834 enq: TX - row lock contention
854 SQL*Net message from client
Note: while the row lock wait is in progress, the table can still be queried normally.
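This follows from Oracle's read consistency: readers never wait for row locks, and uncommitted changes are invisible to other sessions. The sketch below continues the experiment with table t from above; the outcomes in the comments are standard Oracle behavior.

```sql
-- session 3: not blocked; session 1's uncommitted row is not visible yet.
select * from t;
-- expected: no rows selected

-- session 1: end the transaction one way or the other.
commit;       -- session 2's insert then fails with ORA-00001: unique constraint violated
-- rollback;  -- alternatively: session 2's insert then completes normally
```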