Assessing impact of Major Page Fault on ORACLE database [Systemtap in action]

“A more severe memory latency is a major page fault. These can occur when the system has to synchronize memory buffers with the disk, swap memory pages belonging to other processes, or undertake any other Input/Output activity to free memory. This occurs when the processor references a virtual memory address that has not had a physical page allocated to it. The reference to an empty page causes the processor to execute a fault, and instructs the kernel code to allocate a page and return, all of which increases latency dramatically.” Chapter 2. Memory allocation

As the definition above states, a high number of major page faults can seriously degrade server performance, because each fault adds disk latency to the interrupted program. This can happen when memory utilization is high or when the swappiness parameter is set to a high value.
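Before reaching for SystemTap, a quick way to see whether a process is taking major page faults at all is to read its counters from `/proc`. The sketch below is only an illustration and not the method used in this post: it assumes a Linux system and relies on the `/proc/<pid>/stat` layout, where field 10 is `minflt` and field 12 is `majflt`. The function names are made up for the example.

```python
from pathlib import Path
from typing import Tuple


def parse_fault_counts(stat_line: str) -> Tuple[int, int]:
    """Return (minor_faults, major_faults) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so only the text after the last ')' is split on whitespace.
    fields = stat_line[stat_line.rindex(")") + 1:].split()
    # fields[0] is field 3 (state), so field 10 (minflt) is fields[7]
    # and field 12 (majflt) is fields[9].
    return int(fields[7]), int(fields[9])


def read_fault_counts(pid: int) -> Tuple[int, int]:
    """Read the cumulative fault counters of a live process (Linux only)."""
    return parse_fault_counts(Path(f"/proc/{pid}/stat").read_text())
```

Calling `read_fault_counts(pid)` twice for an Oracle server process and subtracting the results gives the number of major faults taken in between. It does not tell you how long each fault took, which is where tracing comes in.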

https://en.wikipedia.org/wiki/Swappiness

`vm.swappiness = 0`	The kernel will swap only to avoid an out of memory condition, when free memory will be below `vm.min_free_kbytes` limit. See the “VM Sysctl documentation”.
`vm.swappiness = 1`	Kernel version 3.5 and over, as well as kernel version 2.6.32-303 and over: Minimum amount of swapping without disabling it entirely.
`vm.swappiness = 10`	This value is sometimes recommended to improve performance when sufficient memory exists in a system.
`vm.swappiness = 60`	The default value.
`vm.swappiness = 100`	The kernel will swap aggressively.
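To check which of these settings a server is actually running with, the current value can be read from `/proc/sys/vm/swappiness`. Here is a minimal sketch, assuming a Linux host; the helper names and the wording of the descriptions are mine and simply follow the table above.

```python
from pathlib import Path

SWAPPINESS_PATH = "/proc/sys/vm/swappiness"  # standard Linux sysctl file


def parse_swappiness(text: str) -> int:
    """Convert the content of the sysctl file into an integer."""
    return int(text.strip())


def read_swappiness(path: str = SWAPPINESS_PATH) -> int:
    """Read the current vm.swappiness value (Linux only)."""
    return parse_swappiness(Path(path).read_text())


def describe_swappiness(value: int) -> str:
    """Rough description of a value, based on the table above."""
    if value == 0:
        return "swap only to avoid an out-of-memory condition"
    if value == 1:
        return "minimum swapping without disabling it"
    if value < 60:
        return "less swapping than the default"
    if value == 60:
        return "default"
    return "more swapping than the default"
```

For example, `describe_swappiness(read_swappiness())` gives a one-line summary of the current setting.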

So how can we assess the impact of major page faults on an Oracle session?


New training available : Troubleshooting Oracle performance using trace data (Event 10046)

January 21, 2016 Mahmoud Hatem Tunning Tunning

I am pleased to announce that classroom (only in Tunisia) and online training are now available 😀
The training schedule and registration details are available on the Training Page.

Combining SQL TRACE & SYSTEMTAP Part 2: No more Unaccounted-for Time due to time spent on CPU run queue

January 19, 2016 Mahmoud Hatem SYSTEMTAP DATABASE, SYSTEMTAP

In my previous post I showed how we can eliminate one of the causes of Unaccounted-for Time, CPU double-counting, from a SQL trace file using SystemTap. But we can do more. The other important cause of missing data in an Extended SQL trace file is “Time Spent Not Executing” (Cary Millsap), which is time spent on the CPU run queue. So how do we measure it?
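As a rough cross-check outside of SystemTap, Linux already keeps a per-task run queue wait counter in `/proc/<pid>/schedstat`. The sketch below is not the approach used in this post. It assumes a kernel built with scheduler statistics, where the file holds three numbers: time spent on the CPU (ns), time spent waiting on a run queue (ns), and the number of timeslices run. The function names are hypothetical.

```python
from pathlib import Path
from typing import Tuple


def parse_schedstat(text: str) -> Tuple[int, int, int]:
    """Return (on_cpu_ns, run_queue_wait_ns, timeslices) from schedstat text."""
    on_cpu_ns, run_queue_wait_ns, timeslices = (int(x) for x in text.split()[:3])
    return on_cpu_ns, run_queue_wait_ns, timeslices


def read_schedstat(pid: int) -> Tuple[int, int, int]:
    """Read the cumulative scheduler counters of a live process (Linux only)."""
    return parse_schedstat(Path(f"/proc/{pid}/schedstat").read_text())


def run_queue_wait_ms(before: Tuple[int, int, int],
                      after: Tuple[int, int, int]) -> float:
    """Milliseconds spent waiting on the run queue between two samples."""
    return (after[1] - before[1]) / 1_000_000
```

Sampling `read_schedstat(pid)` before and after a workload and passing both samples to `run_queue_wait_ms` gives the total run queue wait for that interval. Unlike the trace file approach, it cannot attribute the wait to individual database calls or wait events.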

Here is an excerpt of what we are going to achieve :

Old trace file :

New trace file showing CPU consumption inside wait events and time spent on the CPU run queue :


Combining SQL TRACE & SYSTEMTAP Part1: No more CPU double-counting (Unaccounted-for Time)

January 11, 2016 January 12, 2016 Mahmoud Hatem SYSTEMTAP SYSTEMTAP, troubleshoting

There are many reasons for unaccounted-for time in an Extended SQL trace file. One of them is CPU consumption double-counting, which is the subject of this post. For a good case showing when CPU double-counting can be significant, see Luca Canali's post.

So here is an excerpt of what we are going to achieve :

Normal trace file :