Per-Row Activation Counting on Real Hardware: Demystifying Performance Overheads

Jumin Kim, Seungmin Baek, Minbok Wi, Hwayong Nam, Michael Jaemin Kim, Sukhan Lee, Kyomin Sohn, Jung Ho Ahn

TL;DR

This work addresses the gap between simulator-based PRAC assessments and real hardware performance. It applies PRAC's timing changes on current Intel CPUs and validates them with microbenchmarks before measuring SPEC CPU2017 workloads. On real hardware, the average PRAC overhead is only $1.06\%$ with a peak of $3.28\%$, up to $9.15\times$ lower than prior simulator-based reports, and the overhead scales with RBMPKI (row-buffer misses per kilo-instruction). The memory-page policy also matters: a close page policy minimizes the overhead by hiding PRAC's longer row precharge from the critical path, which indicates that PRAC is practical with a suitable controller policy.

Abstract

Per-Row Activation Counting (PRAC), a DRAM read disturbance mitigation method, modifies key DRAM timing parameters, reportedly causing significant performance overheads in simulator-based studies. However, given known discrepancies between simulators and real hardware, real-machine experiments are vital for accurate PRAC performance estimation. We present the first real-machine performance analysis of PRAC. After verifying timing modifications on the latest CPUs using microbenchmarks, our analysis shows that PRAC's average and maximum overheads are just 1.06% and 3.28% for the SPEC CPU2017 workloads -- up to 9.15x lower than simulator-based reports. Further, we show that the close page policy minimizes this overhead by effectively hiding the elongated DRAM row precharge operations due to PRAC from the critical path.

Paper Structure

This paper contains 10 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Timing diagram of DRAM command sequences in DDR5 systems (a) without PRAC and (b) with PRAC.
  • Figure 2: A sanity check using a microbenchmark on a real system for (a) tRP and (b) tRAS, verifying that the timing changes are applied in hardware.
  • Figure 3: $\mathrm{OH}_{\mathrm{prac}}$ across different SPEC CPU2017 workloads, categorized into Low, Mid, and High RBMPKI groups, with average $\mathrm{OH}_{\mathrm{prac}}$ values of 0.06%, 1.03%, and 2.25%, respectively. A strong positive correlation ($\gamma$ = 0.81) between RBMPKI and $\mathrm{OH}_{\mathrm{prac}}$ is observed.
  • Figure 4: Breakdown of memory access ratios (row-buffer hit, empty, miss) and IPC for selected workloads under the open, adaptive, and close page policies. IPC is normalized to the open page policy.
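
The hit/empty/miss breakdown in Figure 4 suggests why the page policy matters, and the following toy model sketches the idea. It is not the paper's methodology. It assumes each access costs only its own DRAM commands on the critical path: a hit needs a column access, an empty access needs an activate plus a column access, and a miss needs a precharge first. Queueing, bank-level parallelism, tRAS, and accesses that arrive while a precharge is still in progress are ignored. All timing values and access mixes are illustrative placeholders, not measured or standard-specified numbers, and the names `Timing`, `access_latency`, and `mean_latency` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Timing:
    """Simplified DRAM timing set in nanoseconds (placeholder values only)."""
    t_rp: float   # row precharge time
    t_rcd: float  # activate-to-column-command delay
    t_cl: float   # column access latency


def access_latency(kind: str, t: Timing) -> float:
    """Critical-path latency of one access for a given row-buffer outcome."""
    if kind == "hit":    # target row already open: column access only
        return t.t_cl
    if kind == "empty":  # bank already precharged: activate, then column access
        return t.t_rcd + t.t_cl
    if kind == "miss":   # another row open: precharge, activate, column access
        return t.t_rp + t.t_rcd + t.t_cl
    raise ValueError(f"unknown access kind: {kind!r}")


def mean_latency(mix: Mapping[str, float], t: Timing) -> float:
    """Average latency for an access mix whose fractions sum to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("access-mix fractions must sum to 1")
    return sum(frac * access_latency(kind, t) for kind, frac in mix.items())


# Assumption: PRAC is modeled here only as a longer tRP; numbers are made up.
BASELINE = Timing(t_rp=15.0, t_rcd=15.0, t_cl=15.0)
PRAC_LIKE = Timing(t_rp=35.0, t_rcd=15.0, t_cl=15.0)

# Assumption: hypothetical access mixes, not taken from Figure 4.
OPEN_MIX = {"hit": 0.5, "empty": 0.1, "miss": 0.4}
CLOSE_MIX = {"hit": 0.0, "empty": 1.0, "miss": 0.0}

if __name__ == "__main__":
    for name, mix in (("open", OPEN_MIX), ("close", CLOSE_MIX)):
        base = mean_latency(mix, BASELINE)
        prac = mean_latency(mix, PRAC_LIKE)
        print(f"{name}: {base:.1f} ns -> {prac:.1f} ns "
              f"({100.0 * (prac / base - 1.0):.1f}% longer)")
```

In this sketch only the miss path contains tRP, so a mix with no misses is unaffected by a longer precharge, while the open-policy mix slows down in proportion to its miss fraction. That is consistent with the reported trend that overhead grows with RBMPKI, but the model says nothing about the magnitude of the real overheads.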