Variable Read Disturbance: An Experimental Analysis of Temporal Variation in DRAM Read Disturbance
Ataberk Olgun, F. Nisa Bostanci, Ismail Emir Yuksel, Oguzhan Canpolat, Haocong Luo, Geraldo F. Oliveira, A. Giray Yaglikci, Minesh Patel, Onur Mutlu
TL;DR
This work shows that the read disturbance threshold (RDT) of a DRAM row can change over time, which undermines the security guarantees of mitigation techniques that assume a fixed RDT. The authors call this phenomenon variable read disturbance (VRD). Large-scale experiments across 160 DDR4 and 4 HBM2 chips measure RDT thousands to hundreds of thousands of times, under varied data patterns and temperatures and across chip densities. The key finding is that VRD is pervasive: a row's RDT changes unpredictably over time and takes on multiple RDT states, and higher-density chips and chips built on more advanced technology nodes exhibit a worse VRD profile, so single-shot or few-shot profiling is unreliable. An RDT guardband combined with ECC could prevent VRD-induced bitflips, but at substantial performance and area overheads, which motivates online (runtime) profiling and adaptive mitigation strategies as directions for robust memory reliability in future systems.
Abstract
Modern DRAM chips are subject to read disturbance errors. State-of-the-art read disturbance mitigations rely on accurate and exhaustive characterization of the read disturbance threshold (RDT) (e.g., the number of aggressor row activations needed to induce the first RowHammer or RowPress bitflip) of every DRAM row (of which there are millions or billions in a modern system) to prevent read disturbance bitflips securely and with low overhead. We experimentally demonstrate for the first time that the RDT of a DRAM row significantly and unpredictably changes over time. We call this new phenomenon variable read disturbance (VRD). Our experiments using 160 DDR4 chips and 4 HBM2 chips from three major manufacturers yield two key observations. First, it is very unlikely that relatively few RDT measurements can accurately identify the RDT of a DRAM row. The minimum RDT of a DRAM row appears after tens of thousands of measurements (e.g., up to 94,467), and the minimum RDT of a DRAM row is 3.5× smaller than the maximum RDT observed for that row. Second, the probability of accurately identifying a row's RDT with a relatively small number of measurements reduces with increasing chip density or smaller technology node size. Our empirical results have implications for the security guarantees of read disturbance mitigation techniques: if the RDT of a DRAM row is not identified accurately, these techniques can easily become insecure. We discuss and evaluate using a guardband for RDT and error-correcting codes for mitigating read disturbance bitflips in the presence of RDTs that change unpredictably over time. We conclude that a >10% guardband for the minimum observed RDT combined with SECDED or Chipkill-like SSC error-correcting codes could prevent read disturbance bitflips at the cost of large read disturbance mitigation performance overheads (e.g., 45% performance loss for an RDT guardband of 50%).
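The quantities the abstract refers to (minimum and maximum observed RDT, the number of measurements needed before the minimum appears, and a guardbanded threshold) can be illustrated with a minimal Python sketch. It is not the authors' tooling: the function names, the sample values, and the reading of an "X% guardband" as configuring a mitigation with a threshold X% below the minimum observed RDT are all assumptions made for illustration.

```python
from typing import Sequence


def summarize_rdt(measurements: Sequence[int]) -> dict:
    """Summarize repeated RDT measurements of one DRAM row (hypothetical helper)."""
    if not measurements:
        raise ValueError("need at least one RDT measurement")
    min_rdt = min(measurements)
    max_rdt = max(measurements)
    return {
        "min": min_rdt,
        "max": max_rdt,
        # How many times smaller the minimum is than the maximum.
        "max_over_min": max_rdt / min_rdt,
        # 1-based index of the measurement at which the minimum first appears.
        "first_min_at": measurements.index(min_rdt) + 1,
    }


def guardbanded_threshold(min_observed_rdt: int, guardband_percent: int) -> int:
    """Lower the minimum observed RDT by a guardband (assumed interpretation).

    Integer arithmetic avoids floating-point rounding; the result is rounded
    down, which errs on the conservative (lower-threshold) side.
    """
    if not 0 <= guardband_percent < 100:
        raise ValueError("guardband_percent must be in [0, 100)")
    return min_observed_rdt * (100 - guardband_percent) // 100


if __name__ == "__main__":
    # Made-up measurement series, not data from the paper.
    series = [7000, 6500, 7000, 2000, 6800]
    print(summarize_rdt(series))
    print(guardbanded_threshold(min(series), 10))
```

In this made-up series, profiling that stops after the first three measurements would report an RDT of 6500 and miss the later minimum of 2000, which is the failure mode that few-shot profiling risks under VRD.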
