$R^4$: A Racetrack Register File with Runtime Software Reconfiguration
Christian Hakert, Shuo-Han Chen, Kay Heider, Roland Kühn, Yun-Chih Chen, Jens Teubner, Jian-Jia Chen
TL;DR
The paper tackles the high shift overhead of racetrack memory when it is used as a CPU register file. It introduces $\mathbb{R}^4$, a reconfigurable register file that switches between horizontal and vertical allocation modes at runtime, guided by offline recommendations from a CFG-based static analysis and carried out through interrupt-based data migration. The authors develop shift, energy, and latency models for both allocation modes and show that dynamic reconfiguration can reduce energy consumption by up to $\approx 6\times$ relative to operation without runtime reconfiguration, where energy consumption can be as high as that of SRAM; latency improves as well. The results suggest that runtime reconfiguration makes racetrack-based register files competitive for CPU use, offering a practical path to energy-efficient, high-endurance memory in future systems.
Abstract
Emerging disruptive memory technologies continuously make their way into the memory hierarchy at various levels. Racetrack memory is one promising candidate for future memory due to its low energy consumption, low access latency, and high endurance. However, the access-dependent shift property of racetrack memory can easily make it a poor candidate when the number of shifts is not properly reduced. Therefore, we explore how a register file can be constructed from non-volatile racetrack memories with a properly reduced number of shifts. Our proposed architecture allows allocating registers in a horizontal or a vertical allocation mode, where registers are either scattered across nanotracks or allocated along tracks, respectively. In this paper, we propose a dynamic approach in which the allocation can be switched between horizontal and vertical at any access. Control-flow-graph-based static program analysis with simulation-based branch probabilities supplies the recommendations for the dynamic allocation, which are applied at runtime. Experimental evaluation, including a custom gem5 simulation setup, reveals the need for this type of runtime reconfiguration. While the energy consumption, for instance, can be as high as that of SRAM when no runtime reconfiguration is performed, the dynamic approach reduces it by up to $\approx 6\times$.
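
To illustrate why neither allocation mode is best for every access pattern, the following is a minimal toy sketch, not the shift model developed in the paper. It assumes a single access port per track, a horizontal mode in which register `r` sits at domain `r` on every track (so all tracks shift in lock-step by the distance between consecutive registers), and a vertical mode in which each register occupies its own track and every access walks across the full word. The function names, the 32-bit word width, and the register indices are hypothetical.

```python
from typing import Sequence


def horizontal_shifts(trace: Sequence[int], start: int = 0) -> int:
    """Assumed horizontal mode: bit i of each register is on track i,
    and register r is at domain r. Moving the ports from register p
    to register r costs |r - p| lock-step shifts."""
    shifts, pos = 0, start
    for reg in trace:
        shifts += abs(reg - pos)
        pos = reg
    return shifts


def vertical_shifts(trace: Sequence[int], word_bits: int = 32) -> int:
    """Assumed vertical mode: each register lies along its own track,
    so reading a word costs word_bits - 1 shifts per access,
    independent of which register was accessed before."""
    return len(trace) * (word_bits - 1)


def best_mode(trace: Sequence[int], word_bits: int = 32) -> str:
    """Pick the mode with fewer shifts for one program phase (toy criterion)."""
    h = horizontal_shifts(trace)
    v = vertical_shifts(trace, word_bits)
    return "horizontal" if h <= v else "vertical"


# A phase touching neighboring registers favors the horizontal mode,
# while a phase jumping between distant registers favors the vertical mode.
local_phase = [3, 4, 3, 4, 5]
scattered_phase = [0, 63, 0, 63]
print(best_mode(local_phase), best_mode(scattered_phase))
```

Under these assumptions, the preferred mode changes from one program phase to the next, which is the kind of situation in which switching the allocation at runtime can pay off.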
