perf_events : Off/On/Mixed CPU flame graphs extended with Oracle wait events

Using flame graphs to investigate performance problems can be a valuable asset for quickly identifying and resolving the root cause. This type of analysis may be needed when the traditional Oracle instrumentation is not enough.

This post is based on and inspired by the awesome work of Brendan Gregg, Luca Canali and Frits Hoogland in this area. Please check the references at the end of the post for more info (worth reading!).

What I will cover here is a tiny script I have written for generating 3 types of extended flame graph using the built-in perf tool. I say extended because they actually include the Oracle wait events.

Off CPU
On CPU
Hot/cold flame graph

PART 1 : Probing wait event beginning and ending

Based on Luca Canali's work in "Linux Perf Probes for Oracle Tracing", I was able to place probes on the functions kskthbwt and kskthewt to capture the beginning and end of every wait event, along with the actual wait event number. It was then easy to format the output in a usable way: wait_begin_timestamp wait_end_timestamp event#.
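To make the pairing step concrete, here is a minimal Python sketch, not the code used in cpu_profiler.sh. It assumes the probe hits have already been reduced to (timestamp, probe, event#) tuples sorted by time, where "begin" stands for a kskthbwt hit and "end" for a kskthewt hit. The tuple layout, the function name and the sample values are all made up for illustration.

```python
def pair_wait_events(records):
    """Turn begin/end probe hits into (wait_begin, wait_end, event_no) tuples.

    records: iterable of (timestamp, probe, event_no), sorted by timestamp.
    probe is "begin" (assumed to be a kskthbwt hit) or "end" (kskthewt hit).
    Hits that cannot be paired are dropped.
    """
    intervals = []
    begin_ts = None  # timestamp of the wait that is currently open, if any
    for ts, probe, event_no in records:
        if probe == "begin":
            begin_ts = ts
        elif probe == "end" and begin_ts is not None:
            # Assumption: the event number reported at the end of the wait
            # identifies the whole interval.
            intervals.append((begin_ts, ts, event_no))
            begin_ts = None
    return intervals


# Hypothetical probe hits: two complete waits, one stray end, one open begin.
example_records = [
    (1.0, "begin", 146),
    (1.5, "end", 146),
    (2.0, "end", 146),    # end without a begin: ignored
    (3.0, "begin", 200),
    (3.2, "end", 200),
    (4.0, "begin", 10),   # profiling stopped before the wait ended: ignored
]
example_intervals = pair_wait_events(example_records)
```

Dropping unmatched hits keeps the sketch simple. A wait that was already in progress when profiling started, or still open when it stopped, is simply not reported.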

Note : to translate event# into the actual event name, I used a slightly modified version of the “eventsname.sql” script developed by Luca Canali.

PART 2 : Generating the extended on CPU flame graph

I used perf to sample stack traces at 999 Hz. Then, based on the work in part 1, I added the Oracle state (wait event or on CPU) to each sample before generating the extended on CPU flame graph.
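The matching of samples to wait intervals can be sketched as follows. This is only an illustration under stated assumptions: samples are (timestamp, folded_stack) pairs, wait intervals are (begin, end, event_name) tuples that do not overlap, and the Oracle state is added as the root frame of the stack. Where the state frame actually goes in cpu_profiler.sh is not shown in this post, and the function names and sample data below are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def state_at(ts, intervals, begins):
    """Return the Oracle state at time ts: a wait event name or 'on cpu'."""
    i = bisect_right(begins, ts) - 1  # last interval starting at or before ts
    if i >= 0:
        begin, end, name = intervals[i]
        if begin <= ts <= end:
            return "wait: " + name
    return "on cpu"


def annotate_samples(samples, intervals):
    """Prefix every sampled stack with the Oracle state and count duplicates."""
    intervals = sorted(intervals)
    begins = [begin for begin, _, _ in intervals]
    folded = Counter()
    for ts, stack in samples:
        folded[state_at(ts, intervals, begins) + ";" + stack] += 1
    return folded


# Hypothetical data: two wait intervals and four stack samples.
example_intervals = [
    (1.0, 1.5, "direct path read"),
    (3.0, 3.2, "db file sequential read"),
]
example_samples = [
    (0.5, "main;func_a"),
    (1.2, "main;func_b"),
    (1.3, "main;func_b"),
    (2.0, "main;func_a"),
]
example_folded = annotate_samples(example_samples, example_intervals)
```

The binary search keeps the lookup cheap even when a profile contains many thousands of waits, which matters at a 999 Hz sampling rate.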

PART 3 : Generating the extended off CPU flame graph

Generating the extended off CPU flame graph is possible using the perf tools, as explained here. The overhead is kept low because we are matching on a single PID.

PART 4 : Generating the extended hot/cold flame graph

In the extended hot/cold flame graph, I merge the off CPU and on CPU graphs to obtain a global view.
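One thing to watch when merging is units: on CPU data is a count of samples, while off CPU data is a duration. The sketch below shows one possible way to bring both to microseconds before summing. It assumes on CPU stacks were sampled at 999 Hz, as in part 2, and that off CPU values are already in microseconds. That unit, the function names and the sample stacks are assumptions made for illustration, not taken from cpu_profiler.sh.

```python
from collections import Counter

SAMPLE_HZ = 999  # sampling frequency used for the on CPU profile


def merge_hot_cold(on_cpu_samples, off_cpu_us, sample_hz=SAMPLE_HZ):
    """Merge on CPU sample counts and off CPU durations into microseconds.

    on_cpu_samples: mapping of folded stack -> number of samples.
    off_cpu_us:     mapping of folded stack -> off CPU time in microseconds
                    (assumed unit).
    """
    merged = Counter()
    for stack, samples in on_cpu_samples.items():
        # Each sample stands for roughly 1/sample_hz seconds of CPU time.
        merged[stack] += round(samples * 1_000_000 / sample_hz)
    for stack, micros in off_cpu_us.items():
        merged[stack] += micros
    return merged


def to_folded_lines(merged):
    """Render 'stack value' lines, the folded text format read by flamegraph.pl."""
    return [f"{stack} {value}" for stack, value in sorted(merged.items())]


# Hypothetical input: about one second on CPU and half a second waiting.
example_on = {"on cpu;main;func_a": 999}
example_off = {"wait: direct path read;main;func_b": 500_000}
example_lines = to_folded_lines(merge_hot_cold(example_on, example_off))
```

With both sides expressed as time, the widths in the merged graph are comparable, so a wait that lasted half as long as the CPU work appears half as wide.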

Time for demo

I used OEL6 with the debug version of UEK4 (.debug) for two reasons. First, older kernels lack the command ‘perf script -F period’, which is needed for generating the off CPU flame graph. Second, CONFIG_SCHEDSTATS, which ensures that all the tracepoints needed for off CPU generation are present, is set to yes only in the debug version.

For the database version, I hit some problems when trying to place probe points on Oracle 12.1.0.2.6, so I switched to an older version, in my case 11.2.0.3. It also works with 11.2.0.4, as tested in Luca Canali's post.

Installation :

Download flamegraph
Download cpu_profiler.sh and place it in a new directory
Run eventsname.sql and place the output file eventsname.sed in the same directory as cpu_profiler.sh
Update the cpu_profiler.sh variables (oracle_path, flamegraph_base)

Test case :

Fetch from a big table created in a tablespace built on a ramfs file system, to simulate very fast storage, using direct path reads.


mkdir /ramfs
mount -t ramfs -o size=500m ramfs /ramfs
chown oracle:oinstall /ramfs


create tablespace ramfs datafile '/ramfs/data01' size 400M;
alter system set "_serial_direct_read"="ALWAYS";
create table speed_io tablespace ramfs as select a.*  from dba_objects a;

Run a few test queries.

Run the following script to profile the session :

./cpu_profiler.sh PID SAMPLE_TIME_SEC