Playing with SLOB and hardware prefetchers: are they effective?

Hardware prefetching can reduce the effective memory latency of data and instruction accesses, which improves performance by reducing exposure to cache misses, but it can also degrade performance in some cases. (For more information, see here.)
My current processor, an Intel Skylake i5-6500, supports four hardware prefetchers for data. Two are associated with the L1 data cache (also known as the DCU) and two with the L2 cache. These hardware prefetchers can be enabled or disabled through a Model Specific Register (MSR).
[Image: Capture]
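As a rough illustration of the MSR-based control described above, here is a minimal Python sketch that decodes a prefetcher-control register value. The MSR address (0x1A4) and the bit layout are assumptions taken from Intel's public prefetcher-control disclosure, not from this post, and `decode_prefetchers` and `read_msr` are hypothetical helper names. Reading the register assumes Linux with the `msr` kernel module loaded and root privileges.

```python
import os
import struct

# Assumption: address and bit layout come from Intel's public disclosure of
# hardware prefetcher control; verify them for your own CPU model.
MSR_PREFETCH_CONTROL = 0x1A4
PREFETCHER_BITS = {
    0: "L2 hardware prefetcher",
    1: "L2 adjacent cache line prefetcher",
    2: "DCU (L1-data) next-line prefetcher",
    3: "DCU (L1-data) IP prefetcher",
}


def decode_prefetchers(msr_value):
    """Map each prefetcher name to True (enabled) or False (disabled)."""
    # In this register a SET bit means the prefetcher is DISABLED.
    return {
        name: ((msr_value >> bit) & 1) == 0
        for bit, name in PREFETCHER_BITS.items()
    }


def read_msr(cpu=0, address=MSR_PREFETCH_CONTROL):
    """Read a 64-bit MSR via /dev/cpu/<cpu>/msr (Linux, root, `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the register address.
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)
```

On a machine that meets those assumptions, `decode_prefetchers(read_msr())` would report the state of the four prefetchers on core 0. A value of 0 means all four are enabled, and 0xF means all four are disabled.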
Let’s test how effective they are using SLOB!

To control the hardware prefetchers we can use the likwid tool:

[Image: Capture 20]

[Image: Capture 21]
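Since the likwid screenshots are not reproduced here, the following is a hedged Python sketch of how the prefetchers could be toggled by scripting the `likwid-features` command. It assumes the `-c` (CPU list), `-e` (enable) and `-d` (disable) options and the four feature names shown by `likwid-features -l` on Intel CPUs. These may differ between LIKWID versions, so check your installation first. `likwid_features_cmd` and `set_prefetchers` are hypothetical helper names.

```python
import subprocess

# Assumption: feature names as listed by `likwid-features -l` on Intel CPUs.
PREFETCHER_FEATURES = (
    "HW_PREFETCHER",   # L2 hardware prefetcher
    "CL_PREFETCHER",   # L2 adjacent cache line prefetcher
    "DCU_PREFETCHER",  # L1-data next-line prefetcher
    "IP_PREFETCHER",   # L1-data IP prefetcher
)


def likwid_features_cmd(cpus, enable, features=PREFETCHER_FEATURES):
    """Build the likwid-features argument list without running it."""
    flag = "-e" if enable else "-d"
    return ["likwid-features", "-c", cpus, flag, ",".join(features)]


def set_prefetchers(cpus, enable, features=PREFETCHER_FEATURES):
    """Run likwid-features; raises CalledProcessError if the tool fails."""
    subprocess.run(likwid_features_cmd(cpus, enable, features), check=True)
```

For example, `set_prefetchers("0-3", enable=False)` would try to disable all four prefetchers on cores 0 to 3, and calling it again with `enable=True` would restore them after the benchmark.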

Here is an extract from the SLOB config used for testing LIO (logical I/O):

[Image: Capture du 2017-11-07 15_55_16]

And here is the result (there is no apparent benefit from disabling them):

[Image: Capture 30]

For an example of a workload that does benefit from disabling some hardware prefetchers, see this document on Apache Spark:

“Disabling next-line L1-D and Adjacent Cache line L2 prefetchers can improve the performance by up-to 14% and 4% respectively.”

 

That’s it 😀
