$R^4$: A Racetrack Register File with Runtime Software Reconfiguration

Christian Hakert, Shuo-Han Chen, Kay Heider, Roland Kühn, Yun-Chih Chen, Jens Teubner, Jian-Jia Chen

TL;DR

The paper tackles the high shift overhead that racetrack memory incurs when it is used as a CPU register file. It introduces $R^4$, a reconfigurable register file that switches between horizontal and vertical register allocation. The switches follow offline recommendations derived from a static analysis of the control flow graph (CFG) and are carried out at runtime through interrupt-based data migration. The authors develop shift, energy, and latency models for both allocation modes and show that dynamic reconfiguration can reduce energy consumption by up to $\approx 6\times$ and substantially improve latency compared to SRAM in realistic workloads. The work shows that runtime reconfiguration makes racetrack-based registers competitive for CPU use, offering a practical path to energy-efficient, high-endurance memory in future systems.

Abstract

Emerging disruptive memory technologies continuously make their way into the memory hierarchy at various levels. Racetrack memory is a promising candidate for future memory because of its low overall energy consumption, low access latency, and high endurance. However, its access-dependent shift property can easily make it a poor candidate when the number of shifts is not properly reduced. Therefore, we explore how a register file can be constructed from non-volatile racetrack memories with a properly reduced number of shifts. Our proposed architecture supports a horizontal and a vertical allocation mode, in which registers are either scattered across nanotracks or allocated along tracks. In this paper, we propose a dynamic approach in which the allocation can be switched between horizontal and vertical at any access. Static program analysis based on the control flow graph, combined with simulation-based branch probabilities, supplies the recommendations for the dynamic allocation, which are applied at runtime. Experimental evaluation, including a custom gem5 simulation setup, reveals the need for this type of runtime reconfiguration: without it, the energy consumption, for instance, can be as high as that of SRAM, whereas the dynamic approach reduces it by up to $\approx 6\times$.
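
To make the trade-off between the two allocation modes concrete, the following is a minimal toy sketch and not the paper's shift model. It assumes one access port per nanotrack, a cost of one "track-shift" for moving one track by one domain, a horizontal layout in which bit j of register r sits in track j at offset r (so all tracks shift in lockstep), and a vertical layout in which each register occupies its own track and every access sweeps across its bits. The function names and cost rules are hypothetical and chosen only for illustration.

```python
def horizontal_shifts(trace, width):
    """Toy cost: bit j of register r is stored in track j at offset r."""
    position = 0  # offset currently aligned with the access ports (assumed start)
    total = 0
    for reg in trace:
        # All `width` tracks move in lockstep by the register distance.
        total += abs(reg - position) * width
        position = reg
    return total


def vertical_shifts(trace, width):
    """Toy cost: register r occupies its own track, bits stored along it."""
    # Assumption: each access sweeps over all bits (width - 1 shifts) and the
    # sweep direction alternates, so no rewind is needed between accesses.
    return len(trace) * (width - 1)


def best_mode(trace, width):
    """Return the cheaper mode and its toy shift count for a register trace."""
    h = horizontal_shifts(trace, width)
    v = vertical_shifts(trace, width)
    return ("horizontal", h) if h <= v else ("vertical", v)


if __name__ == "__main__":
    width = 32  # hypothetical register width in bits
    print(best_mode([0, 0, 0, 0], width))  # repeated access to one register
    print(best_mode([0, 7, 0, 7], width))  # alternating distant registers
```

Under these assumptions, a trace that keeps hitting the same register favors the horizontal layout, while a trace that alternates between distant registers favors the vertical one. This dependence on the access pattern is the intuition behind switching modes at runtime.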

Paper Structure

This paper contains 32 sections, 13 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Overview of the Register File Architecture
  • Figure 2: Shift, Energy and Latency of an Intuitive Configuration
  • Figure 3: Varying Number of Access Ports
  • Figure 4: Varying Window Size
  • Figure 5: Varying Number of Nanotracks