UPDATE 26/05/2015 : For the new version of StapOra (previously Oracle CPU monitor) including bug fixes and enhancement please click here
In this post i would like to provide one more option to investigate ORACLE CPU performance problem using SYSTEMTAP.
Here is a quick overview of the systemtap script (Oracle CPU monitor version 0.1):
- Time spent on the run queue
- IO wait time
- Top kernel function
- Top user function
- Consistent Read by object
- Consistent Read elapsed time and cpu time
- Number of context switches
Description of the different part of the tool :
PART 1 :
The first part of the script is a high level overview of CPU usage :
- pid : Pid of the running process
- run : cpu time consumed
- sleep : time spent sleeping (or in iowait)
- iowait : time spent waiting for I/O (uninterruptible state)
- queued : time spent on the run queue
- total : total elaped time
- user_time : percent of run time in user space mode
- sys_time : percent of run time in kernel space mode
This part of the script is based on the work done by Jason Baron and Josh Stone https://sourceware.org/systemtap/examples/process/schedtimes.stp
A quick check on this part can for example show if our process is spending a lot of time on the run queue or IO wait.
PART 2 :
This part of the script show the top 5 kernel function.So if the big percent of cpu consumption is in kernel mode a quick check in this section can help. The sampling of function call is done with timer.profile probe.
To show the symbol name we most add “–all-modules” to the systemtap script
PART 3 :
This part of the script show the top 5 user function.So if the big percent of cpu consumption is in user space mode a quick check in this section can help.The sampling of function call is done with timer.profile probe.
To show the symbol name for user space function i have hit some problem due to reaching the max size of the table MAX_UNWIND_TABLE_SIZE (16M) in translate.cxx. So i increased this value and then recompiled systemtap but when rerunning the script my session hanged .
skipping module /app/oracle/12.1.0/dbhome_1/bin/oracle eh_frame table (too big: 25283660 > 16777216)
To workaround this i used “addr2line” to translate the hex addressee in a symbol name.
| awk ‘{if(match($0, “#”)) {cmd=”/usr/bin/addr2line -f -e /app/oracle/product/12.1.0/dbhome_1/bin/oracle ” $2; system(cmd) } else {printf(“%s\n”, $0)}}’
PART 4 :
This part show the number of context switch and the different cpu in witch the process run.
PART 5 :
As you may know Consistent Reads have a serious impact on CPU usage (http://savvinov.com/2015/02/17/logical-io/)
This part help measuring this impact and also determining the performance benefit of reducing the number of LIO or the time spent on a single LIO.(the probe point used is process(“oracle”).function(“kcbgtcr”))
The development of this section is based on the extensive work done by Alexander Anokhin (https://alexanderanokhin.wordpress.com/2011/11/13/dynamic-tracing-of-oracle-logical-io/) and also Luca Canali (http://externaltable.blogspot.com/2014/09/systemtap-into-oracle-for-fun-and-profit.html)
NOTE: Measuring of the elapsed and run time work only on kernel >= 3.10 due to restriction on the use of the “.return” suffix on user space probe point.
Finally running the systemtap script :
The systemtap script take two argument : refresh time in second and the pid of the process to monitor.
To run the script :
stap -v oracle_cpu_monitor.stp –all-modules -x “spid” “refresh_time” | awk ‘{if(match($0, “#”)) {cmd=”/usr/bin/addr2line -f -e /app/oracle/product/12.1.0/dbhome_1/bin/oracle ” $2; system(cmd) } else {printf(“%s\n”, $0)}}’
DOWNLOAD the script : oracle_cpu_monitor.stp
This is the first version of the script “Oracle CPU monitor” if hope i will improve it sooner :D.
[…] the previous post i have developed a systemtap script to monitor CPU usage (Oracle CPU monitor version 0.1). So here […]