`
venceinfo
  • 浏览: 36686 次
  • 性别: Icon_minigender_1
  • 来自: 南京
社区版块
存档分类
最新评论

转载一篇

阅读更多

sure you are all familiar with this situation:

SQL> alter system kill session '152,33';
alter system kill session '152,33'
*
ERROR at line 1:
ORA-00031: session marked for kill

The session trying to issue the kill will hang for 60 seconds and then return this “session marked for kill” message. And the target session does not get killed at all.

So why is that?

The issue is in what this alter system kill command is doing. It’s not actually killing the target session (like kill -9 would do for OS processes). It just sets a bit in the target sessions state object, which marks that the target session should end. But its entirely up the target session to check this bit and act on it!

So, intead of ALTER SYSTEM KILL SESSION, the command should look something like ALTER SYSTEM ASK SESSION TO COMMIT SUICIDE.

All the kill session command is doing is ASK the target session to clean up and exit – via setting that bit.

Now, normally the target sessions are nice and check that bit often enough in their code, act on it and die.

But sometimes when the target session happens to be busy looping in some tight loop (due a bug perhaps) or is hung, then it never gets to check that “please die” bit and never exits.

 

This is why DBAs often need to kill the OS process or thread via OS tools to get rid of that session (and its locks, transactions) as when you kill the OS process, PMON will detect it (if not fast enough then it can be woken up via ORADEBUG WAKEUP call few times) and clean up after that session.

So, the “ORA-00031: session marked for kill” message you see after 60 seconds just means that:

1) Your session sets the “please die” bit in target sessions state object

2) Your session waits the target session to die for 60 seconds and times out after it doesn’t happen

3) Your session returns “session marked for kill” error – which means exactly what I wrote in step 1 above (“please die” bit is set)

By the way, if your session waits for the target session to die – what is the wait event you are waiting on then? Every wait, except few bugs, should be instrumented in Oracle.

This is easy to test, run this for example:

SQL> exec dbms_lock.sleep(100)

And immediately after try to kill that sleeping session from your session (in my case my session which issued the alter system kill command was 146). Lets see what it waits for:

SQL> @sw 146

 SID STATE   EVENT                                          SEQ# SEC_IN_WAIT                 P1
------- ------- ---------------------------------------- ---------- ----------- ------------------
 146 WAITING inactive session                                140           3                152

See, the killer waits for inactive session event. The parameter 1 (P1) is 152. What does that mean?

V$EVENT_NAME (or V$SESSION_WAIT itself) gives you the answer. I will run my show event descriptions script to display that events parameter meaning:

SQL> @sed "inactive session"

EVENT# EVENT_NAME                                              PARAMETER1           PARAMETER2
------ ------------------------------------------------------- -------------------- ---------------
296 inactive session                                        session# waited

SQL>

So, the killer waits for inactive session wait for 60 seconds and then times out. If the target session to be killed ever checks that “please die” bit, it will clean up and exit, otherwise it could be stuck forever!

This is the case when you need to log on to the server and kill the target process from OS level. If you can’t log on to target OS for whatever reason (but still have SYSDBA access) then you could try attaching to target process with oradebug and running ORADEBUG EVENT IMMEDIATE CRASH as I’ve explained here.

My preferred approach for killing sessions usually is:

1) of course – verify whether you’re killing the right session first

2) ALTER SYSTEM KILL SESSION

3) If that doesn’t work immediately then check whether the target session has acknowledged the kill and is rolling back its (large) transaction. You can do this by checking V$TRANSACTION.USED_UREC for that session’s transaction (if it has any) and see if its decreasing. If yes, the transaction is rolling back and we just need to wait it to finish. There’s also a bit in V$TRANSACTION which states whether the transaction is rolling back, but I don’t have my notes available right now (I’m on vacation actually :)

4) If there’s no rollback happening and session just seems to be stuck, then its time to kill that session’s process from OS level.

5) If couple of minutes after killing the process from OS level that sessions and its locks & resources are still not released (remember, we have verified that there was no long transaction rollback happening) then I would attach to my own process with oradebug and run “ORADEBUG WAKEUP 2″ couple of times (and checking if the session has gone + waiting few seconds between each invocation). The “2″ here means Oracle PID of PMON process which is usually 2, but you should check it from your V$PROCESS view.

If the session is still not gone, it’s time to open a support request.

Note that I didn’t post all the little details here – for example, before killing a problem process which is stuck it makes sense to gather some diagnostic data first (like run pstack on the problem process few times to get its current stack trace).

分享到:
评论

相关推荐

    如何连接局域网.docx

    如果楼主还不明白,就为您转载一篇文章,我就是根据它做的. 为了体现代表性,这里选择比较主流的网络设备来实现。接入部分使用ADSL,带宽为用户所独享。以共享的方法实现共享上网的准备工作如下: 服务器配置:使用一台...

    HTTP服务器状态码定义.doc

    HTTP服务器状态代码定义 在利用LoadRunner进行性能测试的时候,经常不知道某个HTTP代码的涵义,现在转载一 篇文章,详细介绍了每种类型的HTTP返回码解释,仅供参考。 HTTP服务器状态代码定义(Status Code ...

    Ibatis学习随笔

    转载一篇Ibatis学习随笔。给大家参考

    单片机应用编程技巧100问

    这是我在网上找到的一篇HOLTEK 公司技术人员写的一篇有关单片机的文章既适合初学者又对已经熟练的人员有很好的启示,具体的可以参考他们公司的网站,这里是转载一篇,谨对作者表示敬意!

    回归测试概念和策略

    转载一篇关于回归测试的概述,这篇文章从回归测试的基本概念定义、回归测试的测试策略包  回归测试做为测试中最重要,同时也是最复杂的一种测试类型。一直都是大家关注的焦点。转载一篇关于回归测试的概述,这篇...

    IOS 开发中画扇形图实例详解

    我们知道了这个方法,就可以在自定义UIView的子类的- (void)drawRect:(CGRect)rect里面绘图了,关于drawrect的调用周期,网上也是一找一大堆,等下我会整理一下,转载一篇供你们参考。 废话少说,下面直接开始代码...

    2M接口基本知识点简介

    目前业务设备的接口应用中,百分之九十以上的接口是2M的接口,在实际工作中,发现很多问题都是由于对2M接口不了解,导致问题的复杂化,为此,在这里转载一篇对2M接口的基本知识点进行汇总和归类。

    jquery validate.js表单验证的基本用法入门

    这里转载一篇前辈写的文章,在我自己的理解上修改了一下,仅作记录。 先贴一个国内某大公司的代码: 代码如下: [removed] function lang(key) { mylang = { ‘ls_input_myb’: ‘请输入您的账户’, ‘ls_myb_email’...

    .net,c#,c++,vc.net ,vc 诸多名词的区别与联系

    许多初学者对.net,c#,c++,vc.net ,vc 还不够了解,常常有人问我它们之间有什么区别,那个最好,那个最……等等,现在我转载一篇文章,希望对初学者有所帮助。pdf文件 .net,c#,c++,vc.net ,vc 区别与联系 QQ223857666...

    双系统资料(linux和windows)

    这里拥有你要双系统资料,建议您先安装WINDOWS...转载一篇文章供您参考(因条件有限,不能进行认证,请您自行探索修改LINUX选项的方法): boot.ini 文件是个启动引导程序文件,装多系统或者重装系统的时候会用到它

    javascript定时保存表单数据的代码

    (忘记是不是两家邮箱都有这...,我将在文章最后转载一篇介绍它的文章。 现在,我直接上例子,所谓无代码,无真相嘛: 代码如下: <!DOCTYPE html PUBLIC “-//W3C//DTD XHTML 1.0 Transitional//EN” “http://www.

    博客积分获取(转载)

    2、每发布一篇转载文章:可获得2分; 3、博主的文章每被评论一次:可获得1分; 4、每发表一次评论:可获得1分(自己给自己评论、博主回复评论不获得积分); 5、博文阅读次数每超过100次:可获得1分,阅读...

    RAR旧一篇: WINCE 注册表编辑器

    WINCE打印完美解决方案 以Photosmart 385 照片打印机为例收藏 ... | 旧一篇: WINCE 注册表编辑器 本文来自CSDN博客,转载请标明出处:http://blog.csdn.net/tayosoft/archive/2007/07/12/1686160.aspx

    关于C#优点的一篇文献翻译

    全面,而且能更好的体现了c#的历史。我也是转载的,让大家分享

    转载软件测试试题

    这是我在博客中下载的一篇文档,

    手把手教你在CSND中怎么样去转载别人的文章(2020最新版)

    有时候在CSDN中看到一篇深度好文,想要去转载吸吸欧气,但是复制粘贴又会出现各种各样的格式问题。研究学习了前辈大神的方法后,就在这里写出来分享给大家,也可以加深自己对这种方法的印象和理解。 方法:(我用的...

    Java〖NIO上篇〗看这一篇就够了 缓冲区 通道

    Java NIO一. NIO与IO区别二. 缓冲区2.1 直接缓冲区与非直接缓冲区三. 通道3.1 java针对支持通道的类提供了getChannel()方法FileInputStream/FileOutputStreamRandomAccessFile3.2 在JDK 1.7 中的NIO.2 针对各个通道...

Global site tag (gtag.js) - Google Analytics