Profiling ORACLE CPU performance using SYSTEMTAP (Queuing time,cr,Top function call,Context switch,etc)

UPDATE 26/05/2015 :  For the new version of StapOra (previously Oracle CPU monitor) including bug fixes and enhancement please click here

In this post i would like to provide one more option to investigate ORACLE CPU performance problem using SYSTEMTAP.

Here is a quick overview of the systemtap script (Oracle CPU monitor version 0.1):

  • Time spent on the run queue
  • IO wait time
  • Top kernel function
  • Top user function
  • Consistent Read by object
  • Consistent Read elapsed time and cpu time
  • Number of context switches

Description of the different part of the tool :

PART 1 :

Capture1

The first part of the script is a high level overview of CPU usage :

  • pid : Pid of the running process
  • run :  cpu time consumed
  • sleep : time spent sleeping (or in iowait)
  • iowait : time spent waiting for I/O (uninterruptible state)
  • queued : time spent on the run queue
  • total : total elaped time
  • user_time : percent of run time in user space mode
  • sys_time : percent of run time in kernel space mode

This part of the script is based on the work done by Jason Baron and Josh Stone https://sourceware.org/systemtap/examples/process/schedtimes.stp

A quick check on this part can for example show if our process is spending a lot of time on the run queue or IO wait.

PART 2 :

Capture2

This part of the script show the top 5 kernel function.So if the big percent of cpu consumption is in kernel mode a quick check in this section can help. The sampling of function call is done with timer.profile probe.

To show the symbol name we most add  “–all-modules” to the systemtap script

PART 3 :

Capture3

This part of the script show the top 5 user function.So if the big percent of cpu consumption is in user space mode a quick check in this section can help.The sampling of function call is done with timer.profile probe.

To show the symbol name for user space function i have hit some problem due to reaching the max size of the table MAX_UNWIND_TABLE_SIZE (16M) in  translate.cxx. So i increased this value and then recompiled systemtap but when rerunning the script my session hanged .

skipping module /app/oracle/12.1.0/dbhome_1/bin/oracle eh_frame table (too big: 25283660 > 16777216)

To workaround this i used “addr2line” to translate the hex addressee in a symbol name.

| awk ‘{if(match($0, “#”)) {cmd=”/usr/bin/addr2line -f -e /app/oracle/product/12.1.0/dbhome_1/bin/oracle ” $2; system(cmd) } else {printf(“%s\n”, $0)}}’

PART 4 :

Capture

This part show the number of context switch and the different cpu in witch the process run.

PART 5 :

Capture60

As you may know Consistent Reads have a serious impact on CPU usage (http://savvinov.com/2015/02/17/logical-io/)

This part help measuring this impact and also determining the performance benefit of reducing the number of LIO or the time spent on a single LIO.(the probe point used is  process(“oracle”).function(“kcbgtcr”))

The development of this section is based on the extensive work done by Alexander Anokhin (https://alexanderanokhin.wordpress.com/2011/11/13/dynamic-tracing-of-oracle-logical-io/) and also Luca Canalihttp://externaltable.blogspot.com/2014/09/systemtap-into-oracle-for-fun-and-profit.html)

NOTE: Measuring of the elapsed and run time work only on kernel >= 3.10 due to restriction on the use of the “.return” suffix on user space probe point.

Finally running the systemtap script :

The systemtap script take two argument : refresh time in second  and the pid of the process to monitor.

To run the script :

stap -v oracle_cpu_monitor.stp  –all-modules  -x  “spid”  “refresh_time”  | awk ‘{if(match($0, “#”)) {cmd=”/usr/bin/addr2line -f -e /app/oracle/product/12.1.0/dbhome_1/bin/oracle ” $2; system(cmd) } else {printf(“%s\n”, $0)}}’

DOWNLOAD the script : oracle_cpu_monitor.stp

This is the first version of the script “Oracle CPU monitor” if hope i will improve it sooner :D.

One thought on “Profiling ORACLE CPU performance using SYSTEMTAP (Queuing time,cr,Top function call,Context switch,etc)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s