ARC: DVFS-Aware Asymmetric-Retention STT-RAM Caches for Energy-Efficient Multicore Processors

Dhruv Gajaria, Tosiron Adegbija

TL;DR

This work investigates how dynamic voltage and frequency scaling (DVFS) interacts with relaxed-retention STT-RAM caches in multicore processors. It shows that clock frequency and retention time jointly determine cache behavior, in particular the number of expiration misses, so a retention time chosen without regard to DVFS can be suboptimal. Building on this, the authors design ARC, a DVFS-aware asymmetric-retention core architecture whose cores are tuned to different retention times and frequency ranges, together with a runtime decision-tree predictor that maps each application to the best core. Compared to SRAM, ARC reduces cache energy by up to ~39% and processor energy by up to ~13%; compared to a homogeneous STT-RAM cache design, it reduces average cache energy by 20.19% and overall processor energy by 7.66%, with manageable overheads. The work highlights the potential of combining retention-time specialization with DVFS for energy-efficient multicore STT-RAM caches and suggests avenues for extending ARC to larger, more complex systems.

Abstract

Relaxed retention (or volatile) spin-transfer torque RAM (STT-RAM) has been widely studied as a way to reduce STT-RAM's write energy and latency overheads. Given a relaxed retention time STT-RAM level one (L1) cache, we analyze the impacts of dynamic voltage and frequency scaling (DVFS) -- a common optimization in modern processors -- on STT-RAM L1 cache design. Our analysis reveals that, apart from the fact that different applications may require different retention times, the clock frequency, which is typically ignored in most STT-RAM studies, may also significantly impact applications' retention time needs. Based on our findings, we propose an asymmetric-retention core (ARC) design for multicore architectures. ARC features retention time heterogeneity to specialize STT-RAM retention times to applications' needs. We also propose a runtime prediction model to determine the best core on which to run an application, based on the applications' characteristics, their retention time requirements, and available DVFS settings. Results reveal that the proposed approach can reduce the average cache energy by 20.19% and overall processor energy by 7.66%, compared to a homogeneous STT-RAM cache design.
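
To make the link between clock frequency and retention time concrete, the following is a minimal sketch, not the authors' model. It assumes a cache block stays untouched for a fixed number of cycles regardless of frequency (a simplification), and the function name, the 100,000-cycle lifetime, and the 100 µs retention time are hypothetical values chosen for illustration. Only the 0.8 GHz and 2.0 GHz endpoints come from the paper's figure captions.

```python
def block_expires(lifetime_cycles: int, freq_ghz: float, retention_us: float) -> bool:
    """Return True if a block left untouched for `lifetime_cycles` would
    outlive the cache's retention time at the given clock frequency.

    Assumption (illustrative only): the block's lifetime in cycles does
    not change with frequency.
    """
    cycles_per_us = freq_ghz * 1e3           # 1 GHz = 1000 cycles per microsecond
    lifetime_us = lifetime_cycles / cycles_per_us
    return lifetime_us > retention_us


# Hypothetical block lifetime and retention time.
LIFETIME_CYCLES = 100_000
RETENTION_US = 100.0

for freq in (0.8, 2.0):
    expired = block_expires(LIFETIME_CYCLES, freq, RETENTION_US)
    print(f"{freq} GHz: block expires before reuse = {expired}")
```

Under these assumptions the same block lives for 125 µs at 0.8 GHz but only 50 µs at 2.0 GHz, so it expires at the lower frequency and survives at the higher one. This is one way to read the observation that expiration misses decrease as frequency increases.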

Paper Structure (22 sections, 1 equation, 13 figures, 3 tables)

Figures (13)

  • Figure 1: STT-RAM cell structure. The high-resistance state corresponds to the anti-parallel state and the low-resistance state to the parallel state
  • Figure 2: Impact of frequency scaling on performance and processor energy compared to SRAM. SRAM caches are faster than STT-RAM caches but consume more energy
  • Figure 3: Illustration of expiration misses. Assume that blocks A and B map to the same memory location, i.e., a write of one block would evict the currently resident block
  • Figure 4: Change in miss rate with respect to frequency. Miss rates change because expiration misses decrease as frequency increases
  • Figure 5: Decrease in cache miss rate as frequency increases from 0.8 GHz to 2.0 GHz for various retention times. For specific retention times, the miss rate changes sharply with frequency because cache block lifetimes vary across benchmarks
  • ...and 8 more figures