ARC: DVFS-Aware Asymmetric-Retention STT-RAM Caches for Energy-Efficient Multicore Processors
Dhruv Gajaria, Tosiron Adegbija
TL;DR
This work investigates how dynamic voltage and frequency scaling (DVFS) interacts with relaxed-retention STT-RAM caches in multicore processors. It shows that clock frequency and retention time jointly influence cache performance, including the number of expiration misses, and that retention times chosen without regard to DVFS can be suboptimal. To exploit this, the authors design ARC, a DVFS-aware asymmetric-retention core architecture in which cores are tuned to different retention times and frequency ranges, together with a runtime decision-tree predictor that maps each application to the best core. Empirical results show energy reductions at both the cache and processor levels (up to ~39% cache energy and ~13% processor energy savings compared to SRAM), competitive gains over homogeneous STT-RAM designs, and manageable overheads. The work highlights the value of combining retention-time specialization with DVFS for energy-efficient multicore STT-RAM caches and suggests avenues for extending ARC to larger, more complex systems.
Abstract
Relaxed retention (or volatile) spin-transfer torque RAM (STT-RAM) has been widely studied as a way to reduce STT-RAM's write energy and latency overheads. Given a relaxed-retention STT-RAM level one (L1) cache, we analyze the impact of dynamic voltage and frequency scaling (DVFS) -- a common optimization in modern processors -- on STT-RAM L1 cache design. Our analysis reveals that, beyond the fact that different applications may require different retention times, the clock frequency, which most STT-RAM studies ignore, may also significantly affect an application's retention time needs. Based on these findings, we propose an asymmetric-retention core (ARC) design for multicore architectures. ARC features retention time heterogeneity to specialize STT-RAM retention times to applications' needs. We also propose a runtime prediction model that determines the best core on which to run an application, based on the application's characteristics, its retention time requirements, and the available DVFS settings. Results reveal that the proposed approach can reduce the average cache energy by 20.19% and overall processor energy by 7.66%, compared to a homogeneous STT-RAM cache design.
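One way to see why clock frequency matters for retention time is to convert a block's reuse distance from cycles to wall-clock time. The sketch below is illustrative only and is not the authors' model: the function names, the reuse distances, and the 100 µs retention time are hypothetical, and it assumes the number of cycles between reuses does not change with frequency (ignoring, for example, memory-latency effects). Under that assumption, lowering the frequency stretches the time between reuses, so a block that survives at a high frequency may expire at a low one.

```python
def reuse_interval_seconds(reuse_cycles: int, freq_hz: float) -> float:
    """Wall-clock time between two accesses to the same block."""
    return reuse_cycles / freq_hz


def expires_before_reuse(reuse_cycles: int, freq_hz: float, retention_s: float) -> bool:
    """True if the block's retention time elapses before it is reused."""
    return reuse_interval_seconds(reuse_cycles, freq_hz) > retention_s


def expiration_miss_count(reuse_cycles_list, freq_hz: float, retention_s: float) -> int:
    """Count reuses that would find the block already expired."""
    return sum(
        expires_before_reuse(cycles, freq_hz, retention_s)
        for cycles in reuse_cycles_list
    )


# Hypothetical reuse distances (in cycles) and retention time (100 microseconds).
reuse_distances = [10_000, 50_000, 150_000]
retention_s = 100e-6

for freq_hz in (2e9, 1e9):
    misses = expiration_miss_count(reuse_distances, freq_hz, retention_s)
    print(f"{freq_hz / 1e9:.0f} GHz: {misses} expiration miss(es)")
```

With these made-up numbers, the longest reuse interval is 75 µs at 2 GHz (no expiration) but 150 µs at 1 GHz (one expiration), which is the kind of frequency dependence the abstract describes.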
