Playing with SLOB and hardware prefetchers ! Are they effective ?

Hardware prefetching can reduce the effective memory latency for data and instruction accesses improving performance (reduces cache-miss exposure) but it can also cause  performance degradation in some cases.  (For more information see here )
My current processor intel skylake i5-6500 support 4 types of h/w prefetchers for prefetching data. There are 2 prefetchers associated with L1-data cache (also known as DCU) and 2 prefetchers associated with L2 cache.This hardware prefetcher can be enable/disabled using Model Specific Register (MSR)
Capture
Let’s test how effective they are using SLOB !

Continue reading

Memory bandwidth vs latency response curve

Memory bound applications are sensitive to memory latency and bandwidth that’s why it’s important to measure and monitor them.Even if this two concepts are often described  independently they are inherently interrelated.

According to Bruce Jacob in ” The memory system: you can’t avoid it, you can’t ignore it, you can’t fake it” the bandwidth vs latency response curve for a system has three regions.

  • Constant region: The latency response is fairly constant for the first 40% of the sustained bandwidth.
  • Linear region:  In between 40% to 80% of the sustained bandwidth, the latency response increases almost linearly with the bandwidth demand of the system due to contention overhead by numerous memory requests.
  • Exponential region:  Between 80% to 100% of the sustained bandwidth,  the memory latency is dominated by the contention latency which can be as much as twice the idle latency or more.
  • Maximum sustained bandwidth :  Is 65% to 75% of the theoretical maximum bandwidth.
 Armed with Intel Memory Latency Checker (MLC) let’s check our current system !

Continue reading

Deeper look at CPU utilization : SLOB example

This is a followup to my previous posts on Deeper look at CPU utilization :

Following a comment from Kevin Closson here is the hierarchical execution cycles breakdown based on the TMAM method before and after enabling HUGEPAGES when running SLOB for testing Logical I/O.

This will let’s us identify our micro-architectural bottlenecks and correctly characterize the SLOB workloads ! Continue reading

Geeky PL/SQL tracer/profiler : Another step

This is my second post under the theme of how to extend our capabilities to trace and profile PL/SQL code.This time motivated by a comment from Luca Canali on my previous post  :

So based on my previous work on geeky PL/SQL tracer let’s see how we can obtain a geeky PL/SQL on-CPU Flame Graph !

Continue reading

Geeky PL/SQL tracer/profiler : First step

This blog post is about how to extend our capabilities to trace and profile PL/SQL code.It’s primarily motivated by few tweets from Franck Pachot and of course because it’s FUN !

Capture 02

Capture 01

So in the first part of this series we are going to answer to this questions : Can we map those underling function to the source PL/SQL object and line number ? Can we obtain a full trace ? Of course yes otherwise there will be no blog post :p

Continue reading