3RSeT: Read Disturbance Rate Reduction in STT-MRAM Caches by Selective Tag Comparison
Elham Cheshmikhani, Hamed Farbeh, Hossein Asadi
TL;DR
The paper addresses a reliability challenge in STT-MRAM caches: read disturbance in the tag array, caused by reading all tags of a set in parallel on every access. It introduces 3RSeT, a selective tag comparison method that first reads a small number of low-order tag bits to prune the candidate ways, then reads and compares the remaining bits only for the ways that survive. In a gem5-based evaluation, 3RSeT reduces the tag-array read disturbance rate by 71.8% and improves MTTF by 3.6x, while cutting energy consumption by 62.1% with no performance penalty and less than 0.4% area overhead. The work shows a practical path to reliable, energy-efficient STT-MRAM caches: redesign the tag-array lookup so that tags with no chance of a hit are never fully read.
Abstract
Recent developments in memory technologies have introduced Spin-Transfer Torque Magnetic RAM (STT-MRAM) as the most promising replacement for SRAM in on-chip cache memories. Despite its lower leakage power, higher density, immunity to radiation-induced particle strikes, and non-volatility, STT-MRAM suffers from unintentional bit flips during read operations, referred to as read disturbance errors, which pose a severe reliability challenge in STT-MRAM caches. One major source of read disturbance errors is the simultaneous access to all tags in a cache set for parallel comparison, which has not been addressed in previous work. This paper first demonstrates that the high number of read accesses to the tag array significantly increases the read disturbance rate. It then proposes a low-cost scheme, Read Disturbance Rate Reduction in STT-MRAM Caches by Selective Tag Comparison (3RSeT), which reduces the error rate by eliminating a significant portion of tag reads. On each access request, 3RSeT uses the low-order bits of the tags to proactively disable the tags that have no chance of a hit. Our evaluations using the gem5 full-system cycle-accurate simulator show that 3RSeT reduces the read disturbance rate in the tag array by 71.8%, which results in a 3.6x improvement in Mean Time To Failure (MTTF). In addition, energy consumption is reduced by 62.1% without compromising performance and with less than 0.4% area overhead.
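The selective comparison described in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the authors' hardware design: the function name `selective_tag_lookup`, the parameter `low_bits` (set to 4 here), and the example tag values are all hypothetical, and the model assumes that every way surviving the low-order-bit filter has its remaining tag bits read in parallel.

```python
from typing import List, Optional, Tuple


def selective_tag_lookup(
    set_tags: List[int], req_tag: int, low_bits: int = 4
) -> Tuple[Optional[int], int]:
    """Behavioral model of a two-phase tag lookup in one cache set.

    Returns (hit_way, full_reads), where full_reads is the number of ways
    whose remaining (upper) tag bits had to be read. A conventional lookup
    would read every tag in the set in full on every access.
    The low_bits value is an illustrative assumption, not taken from the paper.
    """
    mask = (1 << low_bits) - 1

    # Phase 1: read only the low-order bits of each way and keep the ways
    # whose low-order bits match the request.
    candidates = [
        way for way, tag in enumerate(set_tags)
        if (tag & mask) == (req_tag & mask)
    ]

    # Phase 2: read the remaining bits only for the surviving candidates.
    hit_way = None
    for way in candidates:
        if (set_tags[way] >> low_bits) == (req_tag >> low_bits):
            hit_way = way
            break

    return hit_way, len(candidates)


# Hypothetical 8-way set with made-up tag values.
tags = [0x1A3, 0x2B3, 0x3C7, 0x4D1, 0x5E3, 0x6F0, 0x7A9, 0x8B2]

hit = selective_tag_lookup(tags, 0x5E3)   # three ways share low nibble 0x3
miss = selective_tag_lookup(tags, 0xAA5)  # no way has low nibble 0x5
```

In this made-up set, the hit reads the upper bits of only three of the eight ways, and the miss reads none. The actual savings depend on the workload and on how many low-order bits are compared; the 71.8% figure above comes from the authors' evaluation, not from this model.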
