Memory bandwidth vs latency response curve

Memory-bound applications are sensitive to both memory latency and memory bandwidth, which is why it’s important to measure and monitor them. Even though these two concepts are often described independently, they are inherently interrelated.

According to Bruce Jacob in “The memory system: you can’t avoid it, you can’t ignore it, you can’t fake it”, the bandwidth vs latency response curve for a system has three regions, all defined relative to the maximum sustained bandwidth.

  • Constant region: The latency response is fairly constant for the first 40% of the sustained bandwidth.
  • Linear region: Between 40% and 80% of the sustained bandwidth, the latency response increases almost linearly with the bandwidth demand of the system, due to the contention overhead caused by numerous memory requests.
  • Exponential region: Between 80% and 100% of the sustained bandwidth, the memory latency is dominated by the contention latency, which can be as much as twice the idle latency or more.
  • Maximum sustained bandwidth: This is 65% to 75% of the theoretical maximum bandwidth.
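
The three regions can be sketched as a small Python helper. This is only an illustration of the thresholds listed above: the function name `classify_region` is made up for this post, and putting the boundary values (exactly 40% and 80%) into the higher region is an arbitrary choice of mine, not something stated in the source.

```python
def classify_region(bandwidth, sustained_bandwidth):
    """Return the region of the latency response curve for a bandwidth demand.

    Both arguments must use the same unit (for example MB/sec).
    Thresholds follow the 40% / 80% split described in the list above.
    """
    if sustained_bandwidth <= 0:
        raise ValueError("sustained_bandwidth must be positive")

    utilization = bandwidth / sustained_bandwidth
    if utilization < 0.40:
        return "constant"      # latency stays close to the idle latency
    if utilization < 0.80:
        return "linear"        # latency grows roughly linearly with demand
    return "exponential"       # contention latency dominates


if __name__ == "__main__":
    # Hypothetical demands against a hypothetical 10 GB/s sustained bandwidth.
    for demand in (2, 6, 9):
        print(demand, "GB/s ->", classify_region(demand, 10))
```
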
Armed with the Intel Memory Latency Checker (MLC), let’s check our current system!

I’m using the same system configuration as in my previous post, “Deeper look at CPU utilization : The power of PMU events”.

Test environment: OEL 7.0 / kernel-3.10 / Intel i5-6500 / 2 × DDR3-1600 (2 × 4 GB)
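
As a rough sanity check, the theoretical maximum bandwidth of this configuration can be estimated from the DIMM speed, and the 65% to 75% rule then gives an expected range for the sustained bandwidth. The sketch below assumes the two DDR3-1600 DIMMs run in dual-channel mode with a 64-bit data bus per channel; that is an assumption about the setup, not something measured here, and the function name is hypothetical.

```python
def theoretical_peak_gb_s(transfers_mt_s, channels, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_mt_s : transfer rate in mega-transfers per second (1600 for DDR3-1600)
    channels       : number of populated memory channels (assumed, not measured)
    bus_width_bits : data bus width per channel, 64 bits for standard DDR DIMMs
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    peak = theoretical_peak_gb_s(1600, channels=2)  # assumed dual channel
    low, high = 0.65 * peak, 0.75 * peak            # 65% to 75% rule from the list above
    print(f"theoretical peak:   {peak:.1f} GB/s")
    print(f"expected sustained: {low:.1f} to {high:.1f} GB/s")
```

Under these assumptions the theoretical peak is about 25.6 GB/s, so the sustained bandwidth would be expected to land somewhere around 16.6 to 19.2 GB/s.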

Matrix of idle memory latencies for requests originating from each socket and addressed to each available socket (I have only one socket in my system):

Screenshot from 2017-11-06 16_32_04

Latencies at different b/w points :

Screenshot from 2017-11-06 16_32_46

Capture 01

This graph clearly shows how memory latency is affected as memory bandwidth consumption increases.
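
To build a graph like this from the raw numbers, the latency vs. bandwidth table has to be turned into data points first. Here is a minimal sketch of how that could be done. It assumes the table has three numeric columns (inject delay, latency in ns, bandwidth in MB/sec); the sample values are made up for illustration and are NOT measurements from this system, and both function names are hypothetical.

```python
# Made-up sample in the assumed three-column layout (not real measurements).
SAMPLE = """\
Inject  Latency Bandwidth
Delay   (ns)    MB/sec
==========================
 00000  150.00   20000.0
 00100   90.00   15000.0
 01000   70.00    5000.0
"""


def parse_loaded_latency(text):
    """Return a list of (inject_delay, latency_ns, bandwidth_mb_s) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # separators and other non-data lines
        try:
            rows.append((int(fields[0]), float(fields[1]), float(fields[2])))
        except ValueError:
            continue  # header lines such as "Inject Latency Bandwidth"
    return rows


def latency_vs_lightest_load(rows):
    """Express each latency as a multiple of the latency at the lowest bandwidth.

    Returns (bandwidth_mb_s, latency_ratio) pairs sorted by increasing bandwidth.
    """
    baseline = min(rows, key=lambda r: r[2])[1]
    return [(bw, lat / baseline) for _, lat, bw in sorted(rows, key=lambda r: r[2])]


if __name__ == "__main__":
    for bw, ratio in latency_vs_lightest_load(parse_loaded_latency(SAMPLE)):
        print(f"{bw:>8.0f} MB/sec -> {ratio:.2f}x the lightest-load latency")
```

The ratio column makes it easy to spot where latency starts to climb well above the lightly loaded value, which is where the curve leaves the constant region.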

These were the results collected on a single-socket system. Sadly, I don’t have a multi-socket system with Non-Uniform Memory Access (NUMA) for testing, so I’m going to use the results obtained by Luca Canali here using MLC on a dual-socket system with Intel Xeon CPU E5-2630 v4 (16 DDR4 DIMMs of 32GB each).

Capture 06

Capture 08

Capture 02

It’s clear that if your peak bandwidth utilization goes into the “exponential region”, your application’s performance may degrade significantly.

Note 1 : To understand how the latency vs. b/w data is collected, please take a look at the MLC readme.

Note 2 : Hardware prefetchers are an effective way to hide access latencies and additional latency overhead. To get accurate measurements, Intel MLC automatically disables these prefetchers while measuring the latencies.

That’s it 😀
