Assessing impact of Major Page Fault on ORACLE database [Systemtap in action]

“A more severe memory latency is a major page fault. These can occur when the system has to synchronize memory buffers with the disk, swap memory pages belonging to other processes, or undertake any other Input/Output activity to free memory. This occurs when the processor references a virtual memory address that has not had a physical page allocated to it. The reference to an empty page causes the processor to execute a fault, and instructs the kernel code to allocate a page and return, all of which increases latency dramatically.” Chapter 2. Memory allocation

As the definition above states, a high number of major page faults can seriously degrade server performance, because each fault adds disk latency to the interrupted program's execution. This can happen under high memory utilization or when the vm.swappiness parameter is set to a high value.

https://en.wikipedia.org/wiki/Swappiness

vm.swappiness = 0 The kernel will swap only to avoid an out-of-memory condition, when free memory falls below the vm.min_free_kbytes limit. See the “VM Sysctl documentation”.
vm.swappiness = 1 Kernel version 3.5 and over, as well as kernel version 2.6.32-303 and over: Minimum amount of swapping without disabling it entirely.
vm.swappiness = 10 This value is sometimes recommended to improve performance when sufficient memory exists in a system.
vm.swappiness = 60 The default value.
vm.swappiness = 100 The kernel will swap aggressively.
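
To check which of these settings a server is actually running with, the current value can be read from /proc/sys/vm/swappiness. The following is a minimal Python sketch, assuming a Linux host; the function names and the wording of the descriptions are illustrative and simply paraphrase the list above.

```python
from pathlib import Path


def describe_swappiness(value: int) -> str:
    """Map a vm.swappiness value to a rough description (paraphrasing the list above)."""
    if value == 0:
        return "swap only to avoid an out-of-memory condition"
    if value == 1:
        return "minimum amount of swapping without disabling it"
    if value <= 10:
        return "low value, sometimes recommended when memory is sufficient"
    if value == 60:
        return "default value"
    if value >= 100:
        return "aggressive swapping"
    return "intermediate value"


def read_swappiness(path: str = "/proc/sys/vm/swappiness") -> int:
    """Read the current setting; the path is the standard Linux sysctl location."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    try:
        current = read_swappiness()
        print(f"vm.swappiness = {current}: {describe_swappiness(current)}")
    except (OSError, ValueError) as exc:
        # Not a Linux host, or /proc is not readable.
        print(f"could not read vm.swappiness: {exc}")
```

The same value is also available through `sysctl vm.swappiness`; reading the file directly just avoids spawning a process.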

So how can we assess the impact of major page faults on an Oracle session?
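
Before reaching for SystemTap, a coarse first look is possible from /proc: field 12 of /proc/<pid>/stat is the cumulative number of major faults for a process (field 10 is minor faults). The sketch below is not the method of the full post, only a rough illustration assuming a Linux host; the function names are hypothetical, and it gives counts only, not the latency each fault added.

```python
import os
import sys
from pathlib import Path
from typing import Tuple


def parse_fault_counters(stat_line: str) -> Tuple[int, int]:
    """Return (minflt, majflt) from one line of /proc/<pid>/stat.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or parentheses, so split after the LAST ')'.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    minflt = int(rest[10 - 3])
    majflt = int(rest[12 - 3])
    return minflt, majflt


def read_fault_counters(pid: int) -> Tuple[int, int]:
    """Read the fault counters for a given PID (for example an Oracle server process)."""
    return parse_fault_counters(Path(f"/proc/{pid}/stat").read_text())


if __name__ == "__main__":
    # Default to this script's own PID when none is given on the command line.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    try:
        minor, major = read_fault_counters(pid)
        print(f"pid {pid}: minor faults={minor}, major faults={major}")
    except (OSError, ValueError, IndexError) as exc:
        print(f"could not read fault counters for pid {pid}: {exc}")
```

Sampling the counter twice around a workload and taking the difference shows how many major faults the session incurred in that interval.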


Combining SQL TRACE & SYSTEMTAP Part 2: No more Unaccounted-for Time due to time spent on CPU run queue

In my previous post I showed how we can eliminate one of the causes of unaccounted-for time, CPU double-counting, from a SQL trace file using SystemTap. But we can do more. Another important cause of missing data in an extended SQL trace file is “Time Spent Not Executing” (Cary Millsap), which is time spent on the CPU run queue. So how do we measure it?
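
As a rough cross-check that is independent of the SystemTap approach, Linux exposes per-process scheduler counters in /proc/<pid>/schedstat on kernels built with scheduler info support: three numbers giving time spent on CPU (ns), time spent waiting on a run queue (ns), and the number of timeslices run. The sketch below assumes that file is available; the function names are hypothetical, and the totals are cumulative for the process, not broken down per wait event as in the trace file.

```python
import os
import sys
from pathlib import Path
from typing import Tuple


def parse_schedstat(text: str) -> Tuple[int, int, int]:
    """Return (on_cpu_ns, runq_wait_ns, timeslices) from /proc/<pid>/schedstat content."""
    on_cpu_ns, runq_wait_ns, timeslices = (int(x) for x in text.split()[:3])
    return on_cpu_ns, runq_wait_ns, timeslices


def read_schedstat(pid: int) -> Tuple[int, int, int]:
    """Read the scheduler counters for a PID; fails if the kernel does not expose them."""
    return parse_schedstat(Path(f"/proc/{pid}/schedstat").read_text())


if __name__ == "__main__":
    # Default to this script's own PID when none is given on the command line.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    try:
        on_cpu, waited, slices = read_schedstat(pid)
        print(f"pid {pid}: on CPU {on_cpu / 1e9:.3f}s, "
              f"on run queue {waited / 1e9:.3f}s, timeslices {slices}")
    except (OSError, ValueError) as exc:
        print(f"could not read schedstat for pid {pid}: {exc}")
```

Taking two samples around a traced call and subtracting them gives the run-queue time accumulated during that call.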

Here is an excerpt of what we are going to achieve :

Old trace file :

Capture 12

New trace file showing CPU consumption inside wait events and time spent on the CPU run queue :

Capture 11


Combining SQL TRACE & SYSTEMTAP Part1: No more CPU double-counting (Unaccounted-for Time)

There are many reasons for unaccounted-for time in an extended SQL trace file. One of them is CPU consumption double-counting, which is the subject of this post. For a good case showing when CPU double-counting can be significant, see Luca Canali's post.

So here is an excerpt of what we are going to achieve :

Normal trace file :
Capture 1

New trace file showing cpu consumption inside wait event :
Capture 2