There is many reason for unaccounted for time in an Extended SQL trace file one of them is CPU consumption double-counting and this is the subject of this post.For a good case showing when CPU double counting can be significant see Luca Canali Post
So here is an excerpt of what we are going to achieve :
Normal trace file :
New trace file showing cpu consumption inside wait event :
Test database :
Oracle 12.1.0.2.6 with THREADED_EXECUTION set to FALSE
Step 1 :
The first things to do is writing a systemtap script to measure the CPU usage inside a wait event.So here it’s based on the two function “kslwtbctx” and “kslwtectx” marking the begin and end of a wait event.The variable “HZ” represent the interrupt frequency of the system.
trace_oracle_wait_events_cpu.stp function __cputime:long() %{ /* pure */ STAP_RETVALUE = ((current->utime + current->stime)* (1000000 / HZ)); %} global cputime_begin probe process("oracle").function("kslwtbctx") { if (pid() == target()) { cputime_begin = __cputime() } } probe process("oracle").function("kslwtectx") { if (pid() == target()) { cputime_end = __cputime() if(cputime_begin != 0) { cpu_consumed = cputime_end - cputime_begin printf ("c=%d\n",cpu_consumed) } else { printf ("c=0\n"); } } }
Executing the script :
Step 2 :
The second and final thing to do is to combine the two trace files generated by systemtap and SQL trace.
#!/bin/sh trace_filename="$1" stap_filename="$2" i=1 while read -r line do if [[ $line == "WAIT"* ]] then echo "$line" | sed 's/ela=/'`sed -n $i'p' $stap_filename`' ela=/g' i=$((i+1)) else echo "$line" fi done < "$trace_filename"
Executing the script :
We can do this in a more elegant way but this the general idea.We have now a trace file with more precise timing information.
That’s it 😀
Be careful; if the oracle process has multiple threads, or there is more than one oracle process in the system, the scalar shared global cputime_begin could be corrupted in the sense that multiple threads hitting kslwtbctx would each write their own start times into that scalar. Consider making it an array instead by indexing cputime_begin[tid()] or [pid()] or whatever. (The cputime_end / cputime_consumed variables shouldn’t be globals – and you don’t need declarations for local variables.)
Hi Frank,
Thanks for your comment i really appreciate:) In fact like you stated the variable global cputime_begin can be corrupted if using the multithreaded Oracle model by setting the parameter THREADED_EXECUTION to TRUE.In my test case i think it’s safe because i was using THREADED_EXECUTION=false and targeting only one process using the condition “if (pid() == target())” in the script so there is no concurrent use of this variable.Thanks for the clarification about the local variable it was a lapse in concentration i will correct it.