Trimma: Trimming Metadata Storage and Latency for Hybrid Memory Systems
Yiwei Li, Boyu Tian, Mingyu Gao
TL;DR
This work tackles metadata storage and lookup bottlenecks in hybrid main memory by introducing Trimma, a hardware-centric approach that combines an indirection-based multi-level remap table (iRT) with an identity-mapping-aware remap cache (iRC). By storing only the remap entries that are truly necessary and repurposing the unused metadata space as extra DRAM cache, Trimma significantly reduces metadata overhead while raising cache hit rates, particularly for high-associativity, fine-grained memory systems. Empirical results show performance gains on both HBM3+DDR5 and DDR5+NVM configurations, with average speedups of about $1.33\times$ to $1.34\times$ and maximum speedups of up to $1.80\times$, which suggests that the design scales to future large-scale hybrid memory architectures. The contributions are a practical, hardware-friendly iRT design that scales with fast memory capacity and an iRC design that markedly increases remap-cache coverage; together they enable efficient metadata management without software changes.
Abstract
Hybrid main memory systems combine the performance and capacity advantages of heterogeneous memory technologies. With larger capacities, higher associativities, and finer granularities, hybrid memory systems currently exhibit significant metadata storage and lookup overheads for flexibly remapping data blocks between the two memory tiers. To alleviate the inefficiencies of existing designs, we propose Trimma, which combines a multi-level metadata structure with an efficient metadata cache design. Trimma uses a multi-level metadata table to track only the address remap entries that are truly necessary. The saved memory space is used as extra DRAM cache capacity to improve performance. Trimma also uses separate formats to store entries with non-identity and identity address mappings. This improves the overall remap cache hit rate, further boosting performance. Trimma is transparent to software and compatible with various types of hybrid memory systems. When evaluated on a representative hybrid memory system with HBM3 and DDR5, Trimma achieves speedups of up to $1.68\times$ and $1.33\times$ on average over state-of-the-art hybrid memory designs. These results show that Trimma effectively addresses metadata management overheads, especially for future large-scale hybrid memory architectures.
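To make the idea of tracking only necessary remap entries concrete, the following is a minimal software sketch, not the paper's actual iRT layout. It assumes a two-level table in which a leaf is allocated only while its region holds at least one non-identity mapping, and any missing leaf or empty slot is read as an identity mapping. The class name `SparseRemapTable`, the `leaf_bits` parameter, and the use of a Python dictionary for the root level are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple


class SparseRemapTable:
    """Toy two-level remap table (hypothetical; not the paper's iRT layout).

    A leaf is allocated only while its region holds at least one
    non-identity mapping. A missing leaf or an empty slot means identity.
    """

    def __init__(self, leaf_bits: int = 4) -> None:
        # Assumption: each leaf covers 2**leaf_bits consecutive blocks.
        self.leaf_bits = leaf_bits
        self.leaf_size = 1 << leaf_bits
        # Root level: region index -> leaf (list of optional targets).
        self.root: Dict[int, List[Optional[int]]] = {}

    def _split(self, block: int) -> Tuple[int, int]:
        # Upper bits select the leaf, lower bits select the slot within it.
        return block >> self.leaf_bits, block & (self.leaf_size - 1)

    def lookup(self, block: int) -> int:
        hi, lo = self._split(block)
        leaf = self.root.get(hi)
        if leaf is None or leaf[lo] is None:
            return block  # identity mapping: no metadata stored
        return leaf[lo]

    def remap(self, block: int, target: int) -> None:
        hi, lo = self._split(block)
        if target == block:
            # Restoring identity: clear the slot and free an empty leaf.
            leaf = self.root.get(hi)
            if leaf is None:
                return
            leaf[lo] = None
            if all(slot is None for slot in leaf):
                del self.root[hi]
            return
        leaf = self.root.setdefault(hi, [None] * self.leaf_size)
        leaf[lo] = target

    def allocated_entries(self) -> int:
        # Metadata footprint, counted in leaf slots actually allocated.
        return len(self.root) * self.leaf_size
```

With the default `leaf_bits=4`, calling `remap(100, 7)` on an empty table allocates a single 16-entry leaf, so `lookup(100)` returns 7 while `lookup(101)` still returns 101. Remapping block 100 back to itself frees that leaf again. In this sketch the freed space is simply dropped; the abstract describes putting saved space to use as extra DRAM cache capacity, which the sketch does not model.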
