The purpose of this blog post is to show how to troubleshoot contention on a specific latch using a SystemTap script. This post is heavily inspired by the “latchprof” script developed by Tanel Poder and by his systematic approach to latch contention troubleshooting (for more info, please check latch-contention-troubleshooting).
This is what we are going to achieve :
Tested on : Oracle 220.127.116.11/OEL6/UEK4
stap -v monitor_latch.stp "latch_address" "latch#" "refresh_time"
The script shows a breakdown of latch holders by PID, session ID and SQL hash, here for the “cache buffers chains” latch at address 0x000000009F69FF60.
UPDATE 27/04/2017 : The latch address specified is valid only in the context of the target instance (same shared memory mapped). I modified the script so that the memory watchpoint fires only when the address content is modified by a process belonging to the target instance. Since we are dealing with virtual address spaces, the same address can be used by other processes for other purposes. For example, here a hardware breakpoint set on an address fired twice, for two different programs.
Part 1 : Monitoring latch acquisition/release
To monitor latch activity I used a hardware breakpoint that fires whenever the latch address is modified. The number of hardware breakpoints is limited because they rely on dedicated registers (usually limited to 4 on x86, for more info), so we cannot monitor many latch addresses this way (I limited myself to one).
But how do we know whether a given modification is an acquisition or a release?
Whenever the latch is acquired or released, the first word at the latch address is modified, as stated by Andrey Nikolaev, to reflect the PID of the holding process or the number of holding processes, depending on the latch type and acquisition mode. Also, as demonstrated in my previous post, the number of gets is incremented at release time.
Assuming that the latch address is modified only when the latch is acquired or released, we can state that:
- If the address is modified by a process X and the number of gets does not change => latch acquired
- If the address is modified by a process X and the number of gets changes => latch released
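The rule above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the SystemTap script itself; the function name and the sample sequence of "gets" values are hypothetical.

```python
def classify_modifications(initial_gets, gets_after_each_write):
    """Label each write to the latch word as an acquisition or a release.

    Assumes, as the post does, that the latch word only changes on
    acquire/release and that "gets" is incremented at release time.
    """
    labels = []
    previous_gets = initial_gets
    for current_gets in gets_after_each_write:
        if current_gets == previous_gets:
            labels.append("acquired")   # gets unchanged => latch was taken
        else:
            labels.append("released")   # gets changed => latch was freed
        previous_gets = current_gets
    return labels


# Hypothetical trace: gets starts at 10, then two acquire/release cycles.
print(classify_modifications(10, [10, 11, 11, 12]))
# → ['acquired', 'released', 'acquired', 'released']
```

Comparing against the previously observed value means the classifier has to keep a little state between breakpoint hits.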
We can read the latch “gets” value at a specific offset from the latch address. This offset differs between shared and exclusive latches.
Exclusive latch memory layout :
oradebug peek 200222A0 24
[200222A0, 200222B8) = 00000016 00000001 000001D0 00000007
                       pid^     gets     latch#   level#
Shared latch memory layout :
oradebug peek 0x6000AEA8 24
[6000AEA8, 6000AEC0) = 00000002 00000000 00000001 00000007
                       ^Nproc   ^X flag  gets     latch#
Reference : Andrey Nikolaev
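To make the two layouts easier to compare, here is a small Python sketch that maps the words of an "oradebug peek" output line to the field names shown above. The helper name and the dictionary keys are hypothetical; the field order is taken directly from the two dumps.

```python
# Field order for each latch type, as annotated in the dumps above.
EXCLUSIVE_FIELDS = ("pid", "gets", "latch#", "level#")
SHARED_FIELDS = ("nproc", "x_flag", "gets", "latch#")


def parse_peek_line(line, field_names):
    """Map the hex words of one 'oradebug peek' output line to field names."""
    # Everything after '=' is a list of 32-bit words printed in hex.
    _, _, dump = line.partition("=")
    words = [int(word, 16) for word in dump.split()]
    return dict(zip(field_names, words))


exclusive = parse_peek_line(
    "[200222A0, 200222B8) = 00000016 00000001 000001D0 00000007",
    EXCLUSIVE_FIELDS,
)
shared = parse_peek_line(
    "[6000AEA8, 6000AEC0) = 00000002 00000000 00000001 00000007",
    SHARED_FIELDS,
)
print(exclusive)  # → {'pid': 22, 'gets': 1, 'latch#': 464, 'level#': 7}
print(shared)     # → {'nproc': 2, 'x_flag': 0, 'gets': 1, 'latch#': 7}
```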
I used the latch# to determine the offset of gets, which is why it is passed as a parameter to the script.
NOTE 1: This program only monitors latches acquired in willing-to-wait mode, as the number of gets is not incremented for immediate-mode gets.
NOTE 2: This program is far from perfect and is just an example of using hardware breakpoints for monitoring. It would have to be enhanced to work reliably in a multi-process environment.
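One plausible way to use latch# for this, inferred from the two layouts above and not necessarily what monitor_latch.stp does, is to check which word of the latch structure holds the expected latch#. The Python sketch below assumes 32-bit little-endian words (x86); the function name and constants are hypothetical.

```python
import struct

EXCLUSIVE_GETS_OFFSET = 4  # layout: pid, gets, latch#, level#
SHARED_GETS_OFFSET = 8     # layout: nproc, x flag, gets, latch#


def find_gets_offset(latch_bytes, latch_number):
    """Return the byte offset of 'gets' by locating latch# in the first 16 bytes."""
    words = struct.unpack("<4I", latch_bytes[:16])
    # Caveat: if a shared latch's gets counter happens to equal latch#,
    # this check would wrongly report the exclusive layout.
    if words[2] == latch_number:
        return EXCLUSIVE_GETS_OFFSET
    if words[3] == latch_number:
        return SHARED_GETS_OFFSET
    raise ValueError("latch# not found at either expected position")


# Rebuild the two dumps shown above as raw little-endian memory.
exclusive_dump = struct.pack("<4I", 0x16, 0x1, 0x1D0, 0x7)
shared_dump = struct.pack("<4I", 0x2, 0x0, 0x1, 0x7)

print(find_gets_offset(exclusive_dump, 0x1D0))  # → 4
print(find_gets_offset(shared_dump, 0x7))       # → 8
```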
Part 2 : Getting session addr/SID/sql hash
To get the session address I used the technique described in my previous post: I extracted it from the global symbol “ksupga_” at some offset, and then used x$kqfco and x$kqfta to find the offsets of the other fields (SQL_HASH/SID).
DOWNLOAD : monitor_latch.stp
That’s it 😀