Heterogeneous Memory Pool Tuning
Filip Vaverka, Ondrej Vysocky, Lubomir Riha
TL;DR
This work addresses memory bandwidth bottlenecks on heterogeneous-memory platforms with a lightweight tool that analyzes and controls allocation-level data placement between DDR and on-package HBM. The tool combines allocation instrumentation with performance counters to build a placement planning model. On benchmarks such as the NAS Parallel Benchmarks (NPB) and k-Wave, it shows that substantial speedups are achievable when a meaningful fraction of the data resides in HBM: for several benchmarks, about 90% of the potential performance is attainable with roughly 60–75% of the data in HBM and the remaining 25–40% in DDR. This gives developers and tuning tools a practical way to optimize data layout on high-bandwidth platforms.
Abstract
We present a lightweight tool for analyzing and tuning application data placement in systems with heterogeneous memory pools. The tool non-intrusively identifies, analyzes, and controls the placement of an application's individual allocations. We use it to analyze a set of benchmarks running on the Intel Sapphire Rapids platform with both HBM and DDR memory. The paper also analyzes the performance of both memory subsystems in terms of read/write bandwidth and latency, focusing on the case where both subsystems are used together. We show that only about 60% to 75% of the data must be placed in HBM to achieve 90% of the platform's potential performance on those benchmarks.
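To make the idea of allocation-level placement concrete, the following is a minimal sketch of one possible planning heuristic. It is not the tool's actual algorithm. It assumes that each allocation has a known size and an estimated amount of memory traffic (for example, derived from performance counters), and it greedily assigns the allocations with the highest traffic per byte to HBM until a capacity budget is exhausted. The names `Allocation`, `plan_placement`, and `hbm_fraction`, as well as all numbers in the example, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    name: str           # hypothetical label for an allocation site
    size_bytes: int     # bytes requested by the allocation
    traffic_bytes: int  # assumed estimate of memory traffic to this allocation


def plan_placement(allocations, hbm_budget_bytes):
    """Greedy sketch: prefer allocations with the most traffic per byte for HBM.

    Returns a dict mapping allocation name to "HBM" or "DDR", plus the
    number of HBM bytes used. This is an illustrative heuristic only.
    """
    ranked = sorted(
        allocations,
        key=lambda a: a.traffic_bytes / max(a.size_bytes, 1),
        reverse=True,
    )
    placement = {}
    used = 0
    for alloc in ranked:
        if used + alloc.size_bytes <= hbm_budget_bytes:
            placement[alloc.name] = "HBM"
            used += alloc.size_bytes
        else:
            placement[alloc.name] = "DDR"
    return placement, used


def hbm_fraction(allocations, placement):
    """Fraction of the total allocated bytes that the plan places in HBM."""
    total = sum(a.size_bytes for a in allocations)
    in_hbm = sum(a.size_bytes for a in allocations if placement[a.name] == "HBM")
    return in_hbm / total if total else 0.0


# Made-up example data, not measurements from the paper.
example_allocations = [
    Allocation("grid", 400, 8000),
    Allocation("halo", 100, 500),
    Allocation("lookup", 300, 3000),
    Allocation("log", 200, 100),
]
example_plan, example_used = plan_placement(example_allocations, hbm_budget_bytes=700)
```

With this made-up input, the two most traffic-dense allocations fill the HBM budget and the rest fall back to DDR. A real planner would also have to account for latency sensitivity and read/write mix, which this sketch ignores.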
