TDRAM: Tag-enhanced DRAM for Efficient Caching
Maryam Babaie, Ayaz Akram, Wendy Elsasser, Brent Haukness, Michael Miller, Taeksang Song, Thomas Vogelsang, Steven Woo, Jason Lowe-Power
TL;DR
TDRAM addresses the scaling limits of SRAM caches with a tag-enhanced DRAM cache that stores tags on the same die as the data, enabling on-die tag checks and conditional data transfers. It adds an HM bus, ActRd/ActWr commands, and a flush buffer to decouple tag processing from data movement, and it supports early tag probing to reduce miss penalties. Compared with state-of-the-art designs, TDRAM provides at least 2.6x faster tag checks, a 1.2x system speedup, and 21% lower energy consumption on HPC workloads, and it also behaves favorably in disaggregated memory scenarios. Overall, TDRAM delivers scalable, energy-efficient DRAM caching that narrows the gap between last-level caches and remote memory in heterogeneous memory architectures.
Abstract
As SRAM-based caches hit a scaling wall, manufacturers are integrating DRAM-based caches into system designs to continue increasing cache sizes. While DRAM caches can improve the performance of memory systems, existing DRAM cache designs suffer from high miss penalties, wasted data movement, and interference between misses and demand requests. In this paper, we propose TDRAM, a novel DRAM microarchitecture tailored for caching. TDRAM enhances HBM3 by adding a set of small, low-latency mats that store tags and metadata on the same die as the data mats. These mats enable fast parallel tag and data access, on-DRAM-die tag comparison, and a conditional data response based on the comparison result (reducing wasted data transfers), akin to the mechanism of SRAM caches. TDRAM further optimizes hit and miss latencies by performing opportunistic early tag probing. Moreover, TDRAM introduces a flush buffer to store conflicting dirty data on write misses, eliminating turnaround delays on the data bus. We evaluate TDRAM using a full-system simulator and a set of HPC workloads with large memory footprints, showing that TDRAM provides at least 2.6$\times$ faster tag check, 1.2$\times$ speedup, and 21% less energy consumption compared to state-of-the-art commercial and research designs.
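The two mechanisms named above, a conditional data response and a flush buffer for conflicting dirty data, can be illustrated with a small functional model. This is a minimal sketch and not the TDRAM design itself: the direct-mapped organization, the class name `ToyTaggedCache`, the address split, and the `data_transfers` counter are all assumptions made for illustration, and timing, the HM bus, and early tag probing are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Line:
    tag: int
    data: bytes
    dirty: bool = False


@dataclass
class ToyTaggedCache:
    """Hypothetical direct-mapped cache; tags live next to data (assumption)."""
    num_sets: int
    lines: Dict[int, Line] = field(default_factory=dict)
    # Dirty victims evicted by write misses wait here instead of using the data bus.
    flush_buffer: List[Tuple[int, bytes]] = field(default_factory=list)
    # Number of read-data transfers sent back to the controller.
    data_transfers: int = 0

    def _split(self, addr: int) -> Tuple[int, int]:
        # Illustrative address split: low part selects the set, high part is the tag.
        return addr % self.num_sets, addr // self.num_sets

    def read(self, addr: int) -> Tuple[bool, Optional[bytes]]:
        idx, tag = self._split(addr)
        line = self.lines.get(idx)
        if line is not None and line.tag == tag:
            # Tag matched "on die": only now is data transferred.
            self.data_transfers += 1
            return True, line.data
        # Miss: report the outcome without moving any data.
        return False, None

    def write(self, addr: int, data: bytes) -> bool:
        idx, tag = self._split(addr)
        line = self.lines.get(idx)
        hit = line is not None and line.tag == tag
        if not hit and line is not None and line.dirty:
            # Conflicting dirty line: park it in the flush buffer.
            victim_addr = line.tag * self.num_sets + idx
            self.flush_buffer.append((victim_addr, line.data))
        self.lines[idx] = Line(tag, data, dirty=True)
        return hit
```

In this model, a read miss returns only the hit/miss outcome, so `data_transfers` increases on hits alone, and a write that displaces a dirty line leaves the victim in `flush_buffer` for later write-back.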
