This is a follow-up to my previous posts on Deeper look at CPU utilization :
- Deeper look at CPU utilization : The power of PMU events
- Deeper look at CPU utilization : TMAM Example
Following a comment from Kevin Closson, here is the hierarchical execution cycles breakdown based on the TMAM method, before and after enabling HUGEPAGES, when running SLOB to test Logical I/O.
This will let us identify our micro-architectural bottlenecks and correctly characterize the SLOB workloads ! I will be using the same system configuration as in my previous posts.
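For readers who want to see what sits behind the top level of such a breakdown, here is a minimal sketch of the level-1 TMAM formulas. It is not the tooling used for this post. It assumes a core that issues 4 uops per cycle and the Intel event names given in the comments; the helper names (`Counters`, `tmam_level1`) and the sample values are made up for illustration only.

```python
from dataclasses import dataclass

# Assumption: the core can issue 4 uops per cycle (true for many Intel
# cores, but check your own micro-architecture).
PIPELINE_WIDTH = 4


@dataclass
class Counters:
    """Raw PMU counts for one measurement interval (hypothetical container)."""
    cycles: int               # CPU_CLK_UNHALTED.THREAD
    uops_not_delivered: int   # IDQ_UOPS_NOT_DELIVERED.CORE
    uops_issued: int          # UOPS_ISSUED.ANY
    uops_retired_slots: int   # UOPS_RETIRED.RETIRE_SLOTS
    recovery_cycles: int      # INT_MISC.RECOVERY_CYCLES


def tmam_level1(c: Counters) -> dict:
    """Split all pipeline slots into the four top-level TMAM categories."""
    slots = PIPELINE_WIDTH * c.cycles
    frontend = c.uops_not_delivered / slots
    bad_spec = (c.uops_issued - c.uops_retired_slots
                + PIPELINE_WIDTH * c.recovery_cycles) / slots
    retiring = c.uops_retired_slots / slots
    # Whatever is left is attributed to the back end (memory + core bound).
    backend = 1.0 - (frontend + bad_spec + retiring)
    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_spec,
        "retiring": retiring,
        "backend_bound": backend,
    }


# Made-up counter values, NOT measurements from the SLOB runs in this post.
sample = Counters(cycles=1000, uops_not_delivered=400, uops_issued=1300,
                  uops_retired_slots=1200, recovery_cycles=25)
for name, fraction in tmam_level1(sample).items():
    print(f"{name:16s} {fraction:6.1%}")
```

The DTLB pressure discussed below sits further down the hierarchy, under the memory-bound part of the back-end category.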
Here is an extract from the SLOB conf :
- WORK_UNIT : 256
- SCALE : 500M
- RUN_TIME : 180
- UPDATE_PCT : 0
- SCAN_PCT : 0
Without HUGEPAGES :
Hierarchical execution cycles breakdown :
With HUGEPAGES :
Hierarchical execution cycles breakdown :
As in the previous example, enabling HUGEPAGES has significantly reduced the pressure on the DTLB ! As a result, we can see a slight improvement in our CPI and LIO rate.
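As a reminder of how the CPI comparison works, here is a small sketch. CPI is simply unhalted cycles divided by retired instructions, so a lower value after enabling HUGEPAGES means less time wasted per instruction. The function names are hypothetical and no values from the runs above are used.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction: unhalted cycles / retired instructions."""
    return cycles / instructions


def relative_change(before: float, after: float) -> float:
    """Signed relative change; negative means the metric went down."""
    return (after - before) / before


# Usage (plug in your own counter values for the two runs):
#   cpi_before = cpi(cycles_before, instructions_before)
#   cpi_after = cpi(cycles_after, instructions_after)
#   print(f"CPI change: {relative_change(cpi_before, cpi_after):+.1%}")
```

The same `relative_change` helper can be applied to the LIO rate, where a positive value is the improvement.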
Workload Characterization :
EXTRA :
Here is the memory bandwidth consumed in the second case :
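As a rough sketch of how a memory bandwidth figure can be derived from uncore counters, assuming the memory controller CAS events (`UNC_M_CAS_COUNT.RD` and `UNC_M_CAS_COUNT.WR` on Intel server parts) and 64 bytes transferred per CAS command. This is one common approach and not necessarily how the number in this post was collected; the function name is hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumption: each CAS command moves one 64-byte line


def memory_bandwidth_gb_per_s(cas_reads: int, cas_writes: int,
                              seconds: float) -> float:
    """DRAM bandwidth in GB/s from CAS counts summed over all channels."""
    total_bytes = (cas_reads + cas_writes) * CACHE_LINE_BYTES
    return total_bytes / seconds / 1e9


# Usage (with your own counts, summed across all memory channels):
#   print(memory_bandwidth_gb_per_s(cas_rd, cas_wr, elapsed_seconds))
```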
That’s it 😀