Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes (Doc ID 459694.1)

Applies to:

Oracle Database - Enterprise Edition - Version 10.2.0.2 to 12.1.0.1 [Release 10.2 to 12.1]
Oracle Server Enterprise Edition - Version: 10.1 to 11.2
Linux x86
Linux x86-64
HP-UX PA-RISC (64-bit)
HP-UX Itanium
IBM AIX on POWER Systems (64-bit)
Oracle Solaris on SPARC (64-bit)

Purpose

Procwatcher is a tool that examines and monitors Oracle database and/or clusterware processes at a regular interval. It collects stack traces of these processes using Oracle tools like oradebug short_stack and/or OS debuggers like pstack, gdb, dbx, or ladebug, and collects SQL data if specified.

If there are any problems with the prw.sh script or if you have suggestions, please post a comment on this document with details and e-mail [email protected] with the word "Procwatcher" in the subject line.

Scope

This tool is for Oracle representatives and DBAs looking to troubleshoot a problem further by monitoring processes. This tool can be used in conjunction with other tools or troubleshooting methods depending on the situation.

Details

# This script will find clusterware and/or Oracle Background processes and collect
# stack traces for debugging. It will write a file called procname_pid_date_hour.out
# for each process. If you are debugging clusterware then run this script as root.
# If you are only debugging Oracle background processes then you can run as
# root or oracle.

To install the script, download it, put it in its own directory, unzip it, and give it execute permissions. Use the following link to download it:

DOWNLOAD PROCWATCHER

Alternatively, you can download Procwatcher and other recommended Support tools from the following article:

RAC and DB Support Tools Bundle Note:1594347.1

Note: If you had a previous version installed, stop it prior to putting the new version in place.

If you are in a clustered environment, you can "deploy" Procwatcher with "prw.sh deploy" to register with the clusterware, propagate to all nodes, and start on all nodes. There is also a deinstall option to deregister from the clusterware and remove the procwatcher directory. In a clustered environment, Procwatcher files will be written to GRID_HOME/log/procwatcher unless the PRWDIR parameter is set.
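The output-directory rule described above can be sketched as a small shell function. This is only an illustration of the documented defaults, not code taken from prw.sh: the function name `resolve_prw_dir` and its four arguments are hypothetical, and how Procwatcher itself detects a running clusterware is not shown in this note.

```shell
# Sketch only: mirrors the documented defaults, not the actual prw.sh logic.
# resolve_prw_dir PRWDIR GRID_HOME CLUSTERED SCRIPT_DIR
#   PRWDIR     - value of the PRWDIR parameter (may be empty)
#   GRID_HOME  - Grid Infrastructure home (may be empty)
#   CLUSTERED  - "true" if clusterware is running (detection is left to the caller)
#   SCRIPT_DIR - directory that prw.sh is run from
resolve_prw_dir() {
    if [ -n "$1" ]; then
        # An explicit PRWDIR always wins.
        printf '%s\n' "$1"
    elif [ "$3" = "true" ] && [ -n "$2" ]; then
        # Clustered environment: default to GRID_HOME/log/procwatcher.
        printf '%s\n' "$2/log/procwatcher"
    else
        # No clusterware: fall back to the directory prw.sh is run from.
        printf '%s\n' "$4"
    fi
}
```

For example, `resolve_prw_dir "" /u01/grid true /tmp/prw` prints `/u01/grid/log/procwatcher`, where `/u01/grid` is just a placeholder path.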

Requirements

  • Must have /bin and /usr/bin in your $PATH
  • Have your instance_name or db_name set in the oratab and/or set the $ORACLE_HOME env variable. (PRW searches the oratab for the SID it finds, and if it can't find the SID in the oratab it defaults to $ORACLE_HOME.) Procwatcher cannot function properly if it cannot find an $ORACLE_HOME to use.
  • Run Procwatcher as the oracle software owner if you are only troubleshooting homes/instances for that user. If you are troubleshooting clusterware processes (EXAMINE_CLUSTER=true) or troubleshooting for multiple oracle users, run as root.
  • If you are monitoring the clusterware you must have the relevant OS debugger installed on your platform; PRW looks for:

Linux - /usr/bin/gdb
HP-UX and HP Itanium - /opt/langtools/bin/gdb64 or /usr/ccs/bin/gdb64
Sun - /usr/bin/pstack
IBM AIX - /bin/procstack or /bin/dbx
HP Tru64 - /bin/ladebug

It will use pstack on any platform where it is available besides Linux (since pstack is a wrapper script for gdb anyway).
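A minimal sketch of the two lookups described in the requirements above: finding a home for a SID in the oratab with a fallback to $ORACLE_HOME, and locating the OS debugger for a platform. The function names `find_oracle_home` and `pick_debugger` are hypothetical, the default oratab location of /etc/oratab is an assumption (some platforms keep it elsewhere), and the mapping from `uname -s` values to the platforms listed above is also an assumption. The sketch does not model the "prefer pstack where available" behavior, because this note does not give the pstack path on those platforms.

```shell
# find_oracle_home SID [ORATAB_FILE]
# Prints the home listed for SID in the oratab (SID:HOME:Y|N lines),
# otherwise falls back to $ORACLE_HOME. Returns 1 if neither is available.
find_oracle_home() {
    fh_sid=$1
    fh_oratab=${2:-/etc/oratab}      # assumed default location
    fh_home=""
    if [ -r "$fh_oratab" ]; then
        fh_home=$(awk -F: -v sid="$fh_sid" \
            '!/^#/ && $1 == sid { print $2; exit }' "$fh_oratab")
    fi
    if [ -n "$fh_home" ]; then
        printf '%s\n' "$fh_home"
    elif [ -n "${ORACLE_HOME:-}" ]; then
        printf '%s\n' "$ORACLE_HOME"
    else
        return 1
    fi
}

# pick_debugger OS_NAME [ROOT_PREFIX]
# OS_NAME is the output of `uname -s`; ROOT_PREFIX exists only so the
# function can be tried against a scratch directory.
pick_debugger() {
    pd_root=${2:-}
    case "$1" in
        Linux) pd_list="/usr/bin/gdb" ;;
        HP-UX) pd_list="/opt/langtools/bin/gdb64 /usr/ccs/bin/gdb64" ;;
        SunOS) pd_list="/usr/bin/pstack" ;;
        AIX)   pd_list="/bin/procstack /bin/dbx" ;;
        OSF1)  pd_list="/bin/ladebug" ;;
        *)     return 1 ;;
    esac
    # Print the first candidate that exists and is executable.
    for pd_path in $pd_list; do
        if [ -x "$pd_root$pd_path" ]; then
            printf '%s\n' "$pd_path"
            return 0
        fi
    done
    return 1
}
```

A quick pre-flight check before starting Procwatcher could then be `pick_debugger "$(uname -s)" || echo "no supported debugger found"`.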

Procwatcher Features

  • Procwatcher collects stack traces for all processes defined using either oradebug short_stack or an OS debugger at a predefined interval if contention is found.
  • PRW will generate wait chain, session wait, lock, and latch reports if problems are detected (look for pw_* reports in the PRW_DB_subdirectory).
  • PRW will look for wait chains, wait events, lock, and latch contention and also dump stack traces of processes that are either waiting for non-idle wait events or waiting for or holding a lock or latch.
  • PRW will dump wait chain, session wait, lock, latch, current SQL, process memory, and session history information into specific process files (look for prw_* files in the PRW_DB_subdirectory) for any processes or background processes when problems are detected.
  • You can define how aggressive PRW is about getting information by setting parameters like THROTTLE, IDLECPU, and INTERVAL. You can tune these parameters to either get the most information possible or to reduce PRW's CPU impact. See below for more information about what each of these parameters does.
  • If CPU usage gets too high on the machine (as defined by IDLECPU), PRW will sleep and wait for CPU utilization to go down.
  • Procwatcher gets stack traces of ALL threads of a process (this is important for clusterware processes).
  • The housekeeper process runs on a 5 minute loop and cleans up files older than the specified number of days (default is 7).
  • If any SQL takes longer than the timeout (90 seconds by default), it is disabled. The SQL can be re-tested at a later time. If the SQL times out 3 times, it is disabled for the life of Procwatcher. Any GV$ view that times out automatically reverts to the corresponding V$ view. Note that the GV$ view timeout is much lower; the logic is that GV$ views are not worth using if they aren't fast. If oradebug short_stack is enabled and it times out or fails, the housekeeper process will re-enable short_stack if the test passes.
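To make the housekeeper behavior in the list above more concrete, here is a rough sketch of a retention-based cleanup using standard `find` options. It is not the housekeeper code from prw.sh; the function name `purge_old_files` is hypothetical, and `-mtime +N` only approximates "older than N days" because `find` counts whole 24-hour periods.

```shell
# purge_old_files DIR [RETENTION_DAYS]
# Removes regular files under DIR whose modification time is more than
# RETENTION_DAYS days old (default 7, matching the documented default).
# Point this only at a directory that holds nothing but Procwatcher output.
purge_old_files() {
    po_dir=$1
    po_days=${2:-7}
    [ -d "$po_dir" ] || return 1
    find "$po_dir" -type f -mtime +"$po_days" -exec rm -f {} +
}
```

A housekeeper-style loop could call this every 5 minutes, for example `while :; do purge_old_files "$dir" 7; sleep 300; done`, where `$dir` stands for the Procwatcher output directory.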

Disclaimer, especially if you are monitoring clusterware with EXAMINE_CLUSTER=true (default is false) or if FALL_BACK_TO_OSDEBUGGER=true (default is false): Most OS debuggers will temporarily suspend a process when attaching and dumping a stack trace. Procwatcher keeps that time as short as possible. Some debuggers can also be CPU intensive. The THROTTLE, IDLECPU, and INTERVAL parameters (see below) may need to be adjusted to suit your needs depending on how loaded the machine is and how fast it is. Note that some debuggers are faster and can get in and out of a process quicker than others. For example, pstack and oradebug short_stack are fast, ladebug is slower.
If you are on HP Itanium or HP-UX: Apply the fix for bug: 10158006 (or bug: 10287978 on 11.2.0.2) before monitoring the database with Procwatcher to fix a known short stack issue on HP. See Note: 1271173.1 for more information.
If you are on Solaris 10: Apply the fix for Solaris bt 6994922 (see bug: 15677306) before monitoring the database with Procwatcher.
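The IDLECPU check mentioned above can be pictured with the following sketch, which reads the idle percentage from vmstat output and compares it with a threshold. This is an assumption-laden illustration, not the prw.sh implementation: the function names are hypothetical, and the parser assumes the Linux vmstat layout in which a header line names an `id` column and each sample line has the same number of fields.

```shell
# idle_cpu_from_vmstat: reads vmstat output on stdin and prints the idle
# CPU percentage ("id" column) from the last sample line.
idle_cpu_from_vmstat() {
    awk '
        # Remember which column the header calls "id".
        { for (i = 1; i <= NF; i++) if ($i == "id") col = i }
        # Every numeric value in that column is a sample; keep the latest.
        col && $col ~ /^[0-9]+$/ { idle = $col }
        END { if (idle == "") exit 1; print idle }
    '
}

# cpu_too_busy IDLE IDLECPU
# Succeeds when the idle percentage has dropped below the IDLECPU threshold
# (for IDLECPU=3 that means the machine is more than 97% busy).
cpu_too_busy() {
    [ "$1" -lt "$2" ]
}
```

A collection loop could use it as `idle=$(vmstat 5 2 | idle_cpu_from_vmstat) && cpu_too_busy "$idle" 3 && sleep 5`, taking the second vmstat sample because the first reports averages since boot.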

Procwatcher is Ideal for:

  • Session level hangs or severe contention in the database/instance. See Note: 1352623.1
  • Severe performance issues. See Note: 1352623.1
  • Instance evictions and/or DRM timeouts.
  • Clusterware or DB processes stuck or consuming high CPU (must set EXAMINE_CLUSTER=true and run as root for clusterware processes)
  • ORA-4031 and SGA memory management issues. (Set sgamemwatch=diag or sgamemwatch=avoid4031; neither is the default.) See Note: 1355030.1
  • ORA-4030 and DB process memory issues. (Set USE_SQL=true and process_memory=y).
  • RMAN slowness/contention during a backup. (Set USE_SQL=true and rmanclient=y).

Procwatcher is Not Ideal for...

  • Node evictions/reboots. In order to troubleshoot these you would have to enable Procwatcher for processes that are capable of rebooting the machine. If the OS debugger suspends the process for too long *that* could cause a reboot of the machine. I would only use Procwatcher for a node eviction/reboot if the problem was reproducing on a test system and I didn't care if the node got rebooted. Even in that case the INTERVAL would need to be set low (30) and many options would have to be turned off to get the cycle time low enough (EXAMINE_BG=false, USE_SQL=false, probably removing additional processes from the CLUSTERPROCS list).
  • Non-severe database performance issues. AWR/ADDM/statspack are better options for this...
  • Most installation or upgrade issues. We aren't getting data for this unless we are at a stage of the installation/upgrade where key processes are already started.

Procwatcher User Commands

To start Procwatcher:

./prw.sh start

Or if you want to start on all nodes in a clustered environment:

./prw.sh start all

To stop Procwatcher:

./prw.sh stop

Or if you want to stop on all nodes in a clustered environment:

./prw.sh stop all

To check the status of Procwatcher:

./prw.sh stat

To package up Procwatcher files to upload to support:

./prw.sh pack

All user syntax available:

./prw.sh help
Usage: prw.sh
Verbs are:
deploy - Register Procwatcher in Clusterware and propagate to all nodes
start [all] - Start Procwatcher on local node, if 'all' is specified, start on all nodes
stop [all] - Stop Procwatcher on local node, if 'all' is specified, stop on all nodes
stat - Check the current status of Procwatcher
pack - Package up Procwatcher files (on all nodes) to upload to support
param - Check current Procwatcher parameters
deinstall - Deregister Procwatcher from Clusterware and remove
log [number] - See the last [number] lines of the procwatcher log file
log [runtime] - See contiuous procwatcher log file info - use Cntrl-C to break
help - What you are looking at...

Procwatcher Parameters

######################### CONFIG SETTINGS #############################
# Set EXAMINE_CLUSTER variable if you want to examine clusterware processes (default is false - or set to true):
# Note that if this is set to true you must deploy/run procwatcher as root unless using oracle restart
EXAMINE_CLUSTER=false

# Set EXAMINE_BG variable if you want to examine all BG processes (default is true - or set to false):
EXAMINE_BG=true

# Set permissions on Procwatcher files and directories (default: 777):
PRWPERM=777

# Set RETENTION variable to the number of days you want to keep historical procwatcher data (default: 7)
RETENTION=7

# Warning e-mails are sent to which e-mail addresses?
# "mail" must work on the unix server
# Example: [email protected],[email protected]
WARNINGEMAIL=

######################## PERFORMANCE SETTINGS #########################
# Set INTERVAL to the number of seconds between runs (default 60):
# Probably should not set below 60 if EXAMINE_CLUSTER=true
INTERVAL=60

# Set THROTTLE to the max # of stack trace sessions or SQLs to run at once (default 5 - minimum 2):
THROTTLE=5

# Set IDLECPU to the percentage of idle cpu remaining before PRW sleeps (default 3 - which means PRW will sleep if the machine is more than 97% busy - check vmstat every 5 seconds)
IDLECPU=3

# Set SIDLIST to the list of SIDs you want to examine (default is derived - format example: "RAC1|ASM1|SID3")
# If setting for multiple instances for the same DB, specify each SID - example: "ASM1|ASM2|ASM3"
# Default: If root is starting prw, get all sids found running at the time prw was started.
#          If another user is starting prw, get all sids found running owned by that user.
SIDLIST=
#######################################################################
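As a hedged illustration of changing one of the settings above without hand-editing, the sketch below rewrites a single NAME=value line in a file. It assumes the parameters live as plain NAME=value lines at the start of a line in a file such as prw.sh, which is how they are listed here; the function name `set_prw_param` is hypothetical and is not a Procwatcher command. Per the note earlier in this document, stop Procwatcher before changing the script.

```shell
# set_prw_param FILE NAME VALUE
# Replaces the line that starts with NAME= by NAME=VALUE.
# Fails (returns 1) if FILE has no such line, so typos are not silently ignored.
set_prw_param() {
    sp_file=$1
    sp_name=$2
    sp_value=$3
    grep -q "^${sp_name}=" "$sp_file" || return 1
    sp_tmp=$(mktemp) || return 1
    # awk avoids delimiter clashes with values such as "RAC1|ASM1".
    awk -v n="$sp_name" -v v="$sp_value" \
        'index($0, n "=") == 1 { print n "=" v; next } { print }' \
        "$sp_file" > "$sp_tmp" &&
        cat "$sp_tmp" > "$sp_file"   # cat keeps the file's permissions
    sp_rc=$?
    rm -f "$sp_tmp"
    return "$sp_rc"
}
```

For instance, `set_prw_param prw.sh INTERVAL 120` would lengthen the cycle time, and `set_prw_param prw.sh SIDLIST '"RAC1|ASM1"'` would restrict monitoring to two SIDs. Running `./prw.sh param` afterwards shows the values Procwatcher actually picked up.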

Advanced Parameters

# Procwatcher log directory
# Default is $GRID_HOME/log/procwatcher if clusterware is running and this is not set
# Default is the directory where prw.sh is run if no clusterware and this is not set
# Example: PRWDIR=/home/oracle/procwatcher
PRWDIR=

# SQL Control
# Set USE_SQL variable if you want to use SQL to troubleshoot (default is true - or set to false):
USE_SQL=true
# Set to 'y' to enable SQL, 'n' to disable
sessionwait=y
lock=y
latchholder=y
gesenqueue=y
waitchains=y
rmanclient=n
process_memory=n
sqltext=y
ash=y

# SGA Memory watch (default: off).  Valid values are:
# off = no SGA memory diagnostics
# diag = collect SGA memory diagnostics
# avoid4031 = collect SGA memory diagnostics and flush the shared pool to avoid ORA-4031
#             if memory fragmentation occurs
# Note that setting sgamemwatch to 'diag' or 'avoid4031' will query x$ksmsp
# which may increase shared pool latch contention in some environments.
# Please keep this in mind and test in a test environment
# with load before using this setting in production.
sgamemwatch=off

# Levels for debugging before a flush if sgamemwatch=avoid4031 (default: 0 for both)
heapdump_level=0
lib_cache_dump_level=0

# Suspect Process Threshold (if # of suspect procs > <value> then collect BG process stacks)
# 1 = Get query and stack output if there is at least 1 suspect proc (default)
# 0 = Get all diags each cycle
suspectprocthreshold=1

# Warning Process Threshold (if # of suspect procs > <value> then issue a WARNING) default=10
warningprocthreshold=10

# Levels for debugging if warningprocthreshold is reached (default: 0 for both)
# If using this feature recommended values are (hanganalyze_level=3, systemstate_level=258)
# Flood control limits the dumps to a maximum of 3 per hour
hanganalyze_level=0
systemstate_level=0

# Cluster Process list for examination (separated by "|"):
# Default: "crsd.bin|evmd.bin|evmlogge|racgimon|racge|racgmain|racgons.b|ohasd.b|oraagent|oraroota|gipcd.b|mdnsd.b|gpnpd.b|gnsd.bi|diskmon|octssd.b|ons -d|tnslsnr"
# - The processes oprocd, cssdagent, and cssdmonitor are intentionally left off the list because of high reboot danger.
# - The ocssd.bin process is off the list due to moderate reboot danger.  Only add this if your css misscount is the
# - default or higher, your machine is not highly loaded, and you are aware of the tradeoff.
CLUSTERPROCS="crsd.bin|evmd.bin|evmlogge|racgimon|racge|racgmain|racgons.b|ohasd.b|oraagent|oraroota|gipcd.b|mdnsd.b|gpnpd.b|gnsd.bi|diskmon|octssd.b|ons -d|tnslsnr"

# DB Process list for examination (separated by "|"):
# Default: "_dbw|_smon|_pmon|_lgwr|_lmd|_lms|_lck|_lmon|_ckpt|_arc|_rvwr|_gmon|_lmhb|_rms0"
# - To examine ALL oracle DB and ASM processes on the machine, set BGPROCS="ora|asm" (not typically recommended)
BGPROCS="_dbw|_smon|_pmon|_lgwr|_lmd|_lms|_lck|_lmon|_ckpt|_arc|_rvwr|_gmon|_lmhb|_rms0"

# Set to 'y' to enable gv$ views, set to 'n' to disable gv$ views
# (makes queries a little faster in RAC but can't see other instances in reports)
# Default is derived based on if waitchains is used
use_gv=

# Set to 'y' to get pmap data for clusterware processes.
# Only available on Linux and Solaris
use_pmap=n

# DB Versions enabled, set to 'y' or 'n' (this will override the SIDLIST setting)
VERSION_10_1=y
VERSION_10_2=y
VERSION_11_1=y
VERSION_11_2=y

# Should we fall back to an OS debugger if oradebug short_stack fails?
# OS debuggers are less safe per bug 6859515 so default is false (or set to true)
FALL_BACK_TO_OSDEBUGGER=false

# Number of oradebug shortstacks to get on each pass
# Will automatically lower if stacks are taking too long
STACKCOUNT=3

# Point this to a custom .sql file for Procwatcher to capture every cycle.
# Don't use big or long running SQL.  The .sql file must be executable.
# Only 1 SQL per file.
# Example: CUSTOMSQL1=/home/oracle/test.sql
CUSTOMSQL1=
CUSTOMSQL2=
CUSTOMSQL3=
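The "|"-separated process lists above have the shape of an extended regular expression alternation. Assuming they are applied that way (this note does not show how prw.sh matches them), a candidate list can be previewed against the processes currently running before it is put into BGPROCS or CLUSTERPROCS. The function name `match_procs` is hypothetical.

```shell
# match_procs PATTERN
# Reads process listings on stdin and prints the lines that match the
# "|"-separated PATTERN, treated as an extended regular expression.
match_procs() {
    grep -E -- "$1"
}

# Default DB process list quoted from the parameter section above.
BGPROCS="_dbw|_smon|_pmon|_lgwr|_lmd|_lms|_lck|_lmon|_ckpt|_arc|_rvwr|_gmon|_lmhb|_rms0"
```

For example, `ps -eo pid,args | match_procs "$BGPROCS"` lists the background processes the default pattern would select on the local node. A pattern as broad as "ora|asm" will match far more processes, which is presumably why it is not typically recommended.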

References

NOTE:783456.1 - CRS Diagnostic Data Gathering: A Summary of Common tools and their Usage
NOTE:1352623.1 - How To Troubleshoot Database Contention With Procwatcher
NOTE:1355030.1 - How To Troubleshoot ORA-4031's and Shared Pool Issues With Procwatcher
NOTE:1271173.1 - Process Hangs After Issuing Oradebug Short_Stack on HP Platforms
NOTE:1353073.1 - Exadata Diagnostic Collection Guide
NOTE:559339.1 - Diagnostic Tools Catalog
NOTE:1389167.1 - Get Proactive with Oracle Database
NOTE:1428210.1 - Troubleshooting Database Contention With V$Wait_Chains
NOTE:396940.1 - Troubleshooting and Diagnosing ORA-4031 Error [Video]
NOTE:1477599.1 - Best Practices: Proactive Data Collection for Performance Issues
NOTE:430473.1 - ORA-4031 Common Analysis/Diagnostic Scripts [Video]
NOTE:1096952.1 - Master Note for Real Application Clusters (RAC) Oracle Clusterware and Oracle Grid Infrastructure
NOTE:452358.1 - How to Collect Diagnostics for Database Hanging Issues
NOTE:1594347.1 - RAC and DB Support Tools Bundle
