Trimma: Trimming Metadata Storage and Latency for Hybrid Memory Systems

Yiwei Li; Boyu Tian; Mingyu Gao

TL;DR

This work tackles metadata storage and lookup bottlenecks in hybrid main memory by introducing Trimma, a hardware-centric approach that combines an indirection-based multi-level remap table (iRT) with an identity-mapping-aware remap cache (iRC). By storing only truly necessary remap entries and repurposing unused metadata space as extra DRAM cache, Trimma significantly reduces metadata overhead while boosting cache hit rates, particularly for high-associativity, fine-grained memory systems. Empirical results show substantial performance gains across HBM3+DDR5 and DDR5+NVM configurations, with average speedups around $1.33\times$–$1.34\times$ and maximum gains up to $1.80\times$, indicating strong scalability for future large-scale hybrid memory architectures. The contributions include a practical hardware-friendly iRT design that scales with fast memory capacity and an iRC design that markedly increases remap-cache coverage, together enabling efficient metadata management without software changes.

Abstract

Hybrid main memory systems combine the performance and capacity advantages of heterogeneous memory technologies. With larger capacities, higher associativities, and finer granularities, hybrid memory systems currently exhibit significant metadata storage and lookup overheads for flexibly remapping data blocks between the two memory tiers. To alleviate the inefficiencies of existing designs, we propose Trimma, the combination of a multi-level metadata structure and an efficient metadata cache design. Trimma uses a multi-level metadata table to track only the address remap entries that are truly necessary. The saved memory space is then used as extra DRAM cache capacity to improve performance. Trimma also uses separate formats to store entries with non-identity and identity address mappings. This improves the overall remap cache hit rate, further boosting performance. Trimma is transparent to software and compatible with various types of hybrid memory systems. When evaluated on a representative hybrid memory system with HBM3 and DDR5, Trimma achieves up to $1.68\times$ and on average $1.33\times$ speedup compared to state-of-the-art hybrid memory designs. These results show that Trimma effectively addresses metadata management overheads, especially for future large-scale hybrid memory architectures.
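To make the idea of tracking only necessary remap entries more concrete, the following is a minimal software sketch, not the hardware design from the paper. It assumes a two-level table in which a leaf block is allocated only when some block in its range has a non-identity mapping, and a missing entry is read as "the block maps to itself". The class name `TwoLevelRemapTable`, the parameter `entries_per_leaf`, and all method names are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Optional


class TwoLevelRemapTable:
    """Toy two-level remap table (illustrative assumption, not Trimma's iRT).

    Leaf blocks are allocated lazily; an absent entry means identity mapping.
    """

    def __init__(self, entries_per_leaf: int = 4) -> None:
        self.entries_per_leaf = entries_per_leaf
        # Root level: leaf index -> leaf block, present only when needed.
        self.leaves: Dict[int, List[Optional[int]]] = {}

    def lookup(self, block: int) -> int:
        idx, off = divmod(block, self.entries_per_leaf)
        leaf = self.leaves.get(idx)
        if leaf is None:
            return block  # no leaf allocated: identity mapping
        target = leaf[off]
        return block if target is None else target

    def remap(self, block: int, target: int) -> None:
        if target == block:
            self.clear(block)  # identity mappings are never stored
            return
        idx, off = divmod(block, self.entries_per_leaf)
        leaf = self.leaves.setdefault(idx, [None] * self.entries_per_leaf)
        leaf[off] = target

    def clear(self, block: int) -> None:
        idx, off = divmod(block, self.entries_per_leaf)
        leaf = self.leaves.get(idx)
        if leaf is None:
            return
        leaf[off] = None
        if all(entry is None for entry in leaf):
            del self.leaves[idx]  # leaf is empty again; its space is reusable

    def allocated_entries(self) -> int:
        # Number of entry slots currently backed by storage.
        return len(self.leaves) * self.entries_per_leaf


table = TwoLevelRemapTable(entries_per_leaf=4)
table.remap(9, 2)                       # one non-identity mapping
remapped = table.lookup(9)              # stored mapping
untouched = table.lookup(10)            # same leaf, no entry: identity
far_away = table.lookup(1000)           # no leaf at all: identity
slots_in_use = table.allocated_entries()
table.clear(9)
slots_after_clear = table.allocated_entries()
```

In this sketch, storage grows with the number of leaf ranges that hold at least one non-identity mapping rather than with the total number of blocks, which is the intuition behind reclaiming unused metadata space for other purposes such as caching.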

Paper Structure (17 sections, 13 figures, 1 table)

Introduction
Background and Motivations
Trends of Hybrid Memory Systems
Challenges of Metadata
Design
Design Overview
Indirection-Based Remap Table
Using Saved Spaces for Caching
Identity-Mapping-Aware Remap Cache
Discussion
Experimental Setup
Evaluation
Overall Performance Comparison
Effectiveness Analysis of iRC and iRT
Sensitivity Studies
...and 2 more sections

Figures (13)

Figure 1: Performance comparison among various metadata management schemes for the PageRank workload. Simulation details are in the Experimental Setup section. "Ideal" represents the theoretical scenario without metadata storage and lookup overheads. Results are normalized to the ideal case at an associativity of 1.
Figure 2: The overview architecture of Trimma. Designs different from the baseline hybrid memory system are highlighted. The saved capacity from iRT can be flexibly used as extra cache space.
Figure 3: The overall access flow of Trimma. Changes beyond the baseline hybrid memory system are highlighted.
Figure 4: The set-associative memory layout in Trimma, on which the access flow actions in Figure 3 are performed. iRT enables some unused metadata blocks to be used as extra cache space (shown in blue in the metadata area).
Figure 5: The indirection-based remap table structure and its lookup flow.
...and 8 more figures
