Enhancing DBMS_OUTPUT using systemtap

This is a short and quick note to show how we can enhance DBMS_OUTPUT capabilities using a small systemtap script without modifying the source code.Basically it will allow us to display the DBMS_OUTPUT message incrementally (the program don’t need to finish it’s execution) by attaching to an already running session (no need to enable DBMS_OUTPUT). The output can also be easily redirected to a file.

The idea is to try to access the function parameters.This can become complex in case of different arguments types and number but in our case there is only one argument of type varchar2.

Continue reading

Playing with SLOB and hardware prefetchers ! Are they effective ?

Hardware prefetching can reduce the effective memory latency for data and instruction accesses improving performance (reduces cache-miss exposure) but it can also cause  performance degradation in some cases.  (For more information see here )
My current processor intel skylake i5-6500 support 4 types of h/w prefetchers for prefetching data. There are 2 prefetchers associated with L1-data cache (also known as DCU) and 2 prefetchers associated with L2 cache.This hardware prefetcher can be enable/disabled using Model Specific Register (MSR)
Capture
Let’s test how effective they are using SLOB !

Continue reading

Memory bandwidth vs latency response curve

Memory bound applications are sensitive to memory latency and bandwidth that’s why it’s important to measure and monitor them.Even if this two concepts are often described  independently they are inherently interrelated.

According to Bruce Jacob in ” The memory system: you can’t avoid it, you can’t ignore it, you can’t fake it” the bandwidth vs latency response curve for a system has three regions.

  • Constant region: The latency response is fairly constant for the first 40% of the sustained bandwidth.
  • Linear region:  In between 40% to 80% of the sustained bandwidth, the latency response increases almost linearly with the bandwidth demand of the system due to contention overhead by numerous memory requests.
  • Exponential region:  Between 80% to 100% of the sustained bandwidth,  the memory latency is dominated by the contention latency which can be as much as twice the idle latency or more.
  • Maximum sustained bandwidth :  Is 65% to 75% of the theoretical maximum bandwidth.
 Armed with Intel Memory Latency Checker (MLC) let’s check our current system !

Continue reading

Deeper look at CPU utilization : SLOB example

This is a followup to my previous posts on Deeper look at CPU utilization :

Following a comment from Kevin Closson here is the hierarchical execution cycles breakdown based on the TMAM method before and after enabling HUGEPAGES when running SLOB for testing Logical I/O.

This will let’s us identify our micro-architectural bottlenecks and correctly characterize the SLOB workloads ! Continue reading