Hamun: An Approximate Computation Method to Prolong the Lifespan of ReRAM-Based Accelerators
Mohammad Sabri, Marc Riera, Antonio Gonzalez
TL;DR
ReRAM-based accelerators promise energy-efficient DNN inference but are constrained by limited write endurance. The paper introduces Hamun, an approximate computing framework that extends accelerator lifespan through fault-aware scheduling, wear leveling across PE rows and crossbars, and batch execution that reuses written weights across multiple inferences. These are complemented by offline fault-tolerance estimation and runtime reconfiguration that is triggered only when needed. On ViT, BERT, and GPT-2 benchmarks, Hamun achieves an average lifespan increase of 13.2x over an ARAS baseline, with fault handling contributing 4.6x and batching 2.6x, plus additional gains from wear leveling and approximation. The approach also reduces reconfiguration overhead by increasing the number of inferences between reconfigurations, which improves the practical viability of ReRAM accelerators for long-term DNN inference. Together, these contributions address a key barrier to deploying ReRAM-based accelerators, though the ultimate gains remain bounded by device endurance and hardware constraints.
Abstract
ReRAM-based accelerators exhibit enormous potential to increase computational efficiency for DNN inference tasks, delivering significant performance and energy savings over traditional platforms. By incorporating adaptive scheduling, these accelerators dynamically adjust to DNN requirements, optimizing the allocation of constrained hardware resources. However, ReRAM cells have limited endurance: each inference execution requires multiple updates, and the resulting wear-out shortens the lifespan of ReRAM-based accelerators. This presents a practical challenge in positioning them as alternatives to conventional platforms such as TPUs, so addressing these endurance limitations is essential for making ReRAM-based solutions viable for long-term, high-performance DNN inference. To this end, we introduce Hamun, an approximate computing method designed to extend the lifespan of ReRAM-based accelerators through a range of optimizations. Hamun incorporates a novel mechanism that detects cells that have become faulty due to wear-out and retires them, thereby avoiding their adverse impact on DNN accuracy. Moreover, Hamun extends the lifespan of ReRAM-based accelerators by adapting wear-leveling techniques across various abstraction levels of the accelerator and by implementing a batch execution scheme that maximizes the use of each ReRAM cell write across multiple inferences. On average, evaluated on a set of popular DNNs, Hamun improves lifespan by 13.2x over a state-of-the-art baseline. The main contributors to this improvement are the fault handling and batch execution schemes, which provide 4.6x and 2.6x lifespan improvements, respectively.
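To make the intuition behind wear leveling and batch execution concrete, the following is a minimal toy model, not Hamun's actual algorithm. It assumes that lifespan is limited by the most-written row, that wear leveling can be approximated by rotating the starting row of each weight update, and that batching lets one set of written weights serve several inferences. All names and numbers (`peak_row_writes`, `inferences_until_wearout`, the endurance budget, the batch size) are hypothetical and chosen only for illustration.

```python
def peak_row_writes(num_rows, rows_per_update, num_updates, rotate):
    """Return the write count of the most-written row (toy wear model)."""
    writes = [0] * num_rows
    for step in range(num_updates):
        # Assumption: leveling is modeled as rotating the start row each update.
        start = step % num_rows if rotate else 0
        for offset in range(rows_per_update):
            writes[(start + offset) % num_rows] += 1
    return max(writes)


def inferences_until_wearout(endurance, peak_writes_per_round, inferences_per_round):
    """Inferences served before the hottest row exhausts its write budget."""
    rounds = endurance // peak_writes_per_round
    return rounds * inferences_per_round


# Hypothetical parameters, not taken from the paper.
ROWS, USED, UPDATES, ENDURANCE, BATCH = 8, 2, 8, 1000, 4

fixed_peak = peak_row_writes(ROWS, USED, UPDATES, rotate=False)
leveled_peak = peak_row_writes(ROWS, USED, UPDATES, rotate=True)

# One round = UPDATES weight updates; each update serves one inference
# without batching, or BATCH inferences with batching.
baseline = inferences_until_wearout(ENDURANCE, fixed_peak, UPDATES)
leveled = inferences_until_wearout(ENDURANCE, leveled_peak, UPDATES)
batched = inferences_until_wearout(ENDURANCE, leveled_peak, UPDATES * BATCH)

print(baseline, leveled, batched)
```

In this sketch, spreading writes lowers the peak per-row write count, and batching multiplies the inferences obtained per write, so the two effects compound. The ratios it produces depend entirely on the assumed parameters and should not be read as the paper's reported gains.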
