Troubleshooting Latch Contention using systemtap

This blog post shows how to troubleshoot contention on a specific latch using a systemtap script. It is heavily inspired by the “latchprof” script developed by Tanel Poder and by his systematic approach to latch contention troubleshooting (for more info, please check latch-contention-troubleshooting).

This is what we are going to achieve :

Tested in : oracle

stap -v monitor_latch.stp "latch_address" "latch#" "refresh_time"


This script shows a breakdown of latch holders by pid/session id/sql_hash for the “cache buffers chains” latch at address “0x000000009F69FF60”.
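
The grouping behind such a breakdown can be sketched in a few lines of Python. This is not the systemtap script from the post: the function name `latch_holder_breakdown` and the sample tuples are hypothetical, and the sketch only shows the aggregation step, assuming each sample records which (pid, sid, sql_hash) was seen holding the latch.

```python
from collections import Counter

def latch_holder_breakdown(samples):
    """Count latch-holder samples per (pid, sid, sql_hash).

    Returns (key, count, percent) rows, most frequent holder first.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    return [(key, n, 100.0 * n / total) for key, n in counts.most_common()]

# Hypothetical samples: one tuple per observation of the latch being held.
samples = [
    (4021, 17, 1234567890),
    (4021, 17, 1234567890),
    (4021, 17, 1234567890),
    (5177, 42, 987654321),
]

for (pid, sid, sql_hash), n, pct in latch_holder_breakdown(samples):
    print(f"pid={pid} sid={sid} sql_hash={sql_hash} holds={n} ({pct:.1f}%)")
```

Sorting by count puts the heaviest latch holder at the top, which is usually the first thing to look at.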

Continue reading

Dynamic tracing tools : Easier access to session/process address [ksupga_]

When troubleshooting a performance problem or investigating Oracle internals with dynamic tracing tools like systemtap, it is often useful to have the session address at hand. With the session address we can access a lot of useful information, such as wait_event, the p1 and p2 values, sql_id, and many other fields stored in X$KSUSE (the underlying table of V$SESSION). Luca Canali has already done great work here: he identified that when the function “kskthewt” is called at the end of a wait event, the register R13 (tested with Oracle on RHEL6.5 and with Oracle on OEL7 respectively) points to the session address with some offset, and he also managed to determine the offsets of the different columns of X$KSUSE using X$KQFCO and X$KQFTA, as shown here.
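
The idea of "session address plus column offset" can be sketched in Python. The offsets, column labels, and session address below are hypothetical placeholders; real offsets are version-specific and would have to be looked up in X$KQFCO and X$KQFTA.

```python
# Hypothetical column offsets (in bytes) relative to the session address.
# Real values depend on the Oracle version and come from X$KQFCO / X$KQFTA.
KSUSE_OFFSETS = {
    "wait_event": 0x100,
    "p1": 0x108,
    "p2": 0x110,
}

def field_address(session_addr, column, offsets=KSUSE_OFFSETS):
    """Return the memory address of a column given the session base address."""
    return session_addr + offsets[column]

session_addr = 0x9F000000  # placeholder, not a real session address
for column in KSUSE_OFFSETS:
    print(f"{column}: {field_address(session_addr, column):#x}")
```

A tracing script would then read memory at the computed address; that read is left out here because it depends on the tracing tool.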

The question is : Can we determine the session address without probing any function call ?

One way to answer this question is to determine how the value stored in the register R13 was set in the function “kskthewt”. Time to disassemble !

NOTE : This post contains no disassembly code of the oracle executable, just the findings !

For basic info on reverse engineering please take a look at my previous post.

Continue reading

The missing argument of ksl_get_shared_latch : the power of disassembly in action

In one of my previous posts, entitled Latch acquisition/release call-graph : Dynamic tracing tools in action, I assumed that the function “ksl_get_shared_latch” (in the version examined there) took only 5 arguments :

  • ksl_get_shared_latch(laddr, wait, why, where, mode)

As a follow-up exercise to my previous post, Reverse engineering : What we need to know as a DBA ?, I decided to take a deeper look

Continue reading

Reverse engineering : What we need to know as a DBA ?

“Reverse engineering, also called back engineering, is the processes of extracting knowledge or design information from anything man-made and re-producing it or re-producing anything based on the extracted information” (Ref: Wikipedia)

Reverse engineering serves many purposes, such as discovering software bugs, security auditing, removing copy protection, compensating for documentation shortcomings, learning, etc. But why is this important to us ?

Continue reading

perf_events : Off/On/Mixed CPU flamegraph extended with oracle wait events

Using FlameGraphs to investigate performance problems can be a valuable asset for quickly identifying and resolving the root cause. This type of analysis may be needed when the traditional Oracle instrumentation is not enough.

This post is based on and inspired by the awesome work of Brendan Gregg, Luca Canali and Frits Hoogland in this area. Please check the references at the end of the post for more info (worth reading !)

What I will cover here is a tiny script I have written for generating 3 types of extended flame graphs using the built-in perf tool. I say extended because they actually include the Oracle wait events.

  • Off CPU
  • On CPU
  • Hot/Cold (mixed) flame graph
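
One way to picture "extended with wait events" is at the folded-stack stage, where each line is a semicolon-separated stack followed by a sample count (the input format used by Brendan Gregg's flamegraph.pl). The Python sketch below is not the script from this post. The function `fold_stacks`, the frame names, and the `wait:` prefix are hypothetical, and it assumes each sample already carries its call stack plus the wait event active at that moment, if any.

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse (frames, wait_event) samples into folded-stack lines.

    frames is a list of function names ordered root first; wait_event is
    a wait event name or None. When present, the wait event is appended
    as the topmost frame so it shows up in the flame graph.
    """
    counts = Counter()
    for frames, wait_event in samples:
        stack = list(frames)
        if wait_event:
            # Replace spaces so each frame stays a single token.
            stack.append("wait:" + wait_event.replace(" ", "_"))
        counts[";".join(stack)] += 1
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

# Hypothetical samples: two on-CPU stacks and one off-CPU stack in a wait.
samples = [
    (["main", "fetch_rows", "get_buffer"], None),
    (["main", "fetch_rows", "get_buffer"], None),
    (["main", "fetch_rows", "read_block"], "db file sequential read"),
]

for line in fold_stacks(samples):
    print(line)
```

Keeping the wait event as an ordinary frame means no special handling is needed downstream: it is rendered like any other function at the top of its stack.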

Continue reading