From Minutes to Seconds: Redefining the Five-Minute Rule for AI-Era Memory Hierarchies

Tong Zhang, Vikram Sharma Mailthody, Fei Sun, Linsen Ma, Chris J. Newburn, Teresa Zhang, Yang Liu, Jiangpeng Li, Hao Zhong, Wen-Mei Hwu

TL;DR

This paper redefines the five-minute rule by incorporating host costs, DRAM bandwidth, and physics-grounded SSD models to produce a constraint- and workload-aware provisioning framework. It demonstrates that GPU-centric hosts paired with Storage-Next SSDs collapse the DRAM-to-flash caching threshold from minutes to seconds, effectively elevating NAND flash to an active memory tier. The authors introduce MQSim-Next, a calibrated SSD simulator, and present two case studies, an SSD-resident key-value (KV) store and approximate nearest neighbor (ANN) search, to illustrate the software design space opened by seconds-scale caching. Collectively, the work provides a practical, cross-layer methodology for provisioning and co-design across devices, hosts, and applications in the AI era.

Abstract

In 1987, Jim Gray and Gianfranco Putzolu introduced the five-minute rule, a simple heuristic grounded in storage and memory economics for deciding when data should live in DRAM rather than on storage. Subsequent revisits to the rule largely retained that economics-only view, leaving host costs, feasibility limits, and workload behavior out of scope. This paper revisits the rule from first principles, integrating host costs, DRAM bandwidth and capacity, and physics-grounded models of SSD performance and cost, and then embeds these elements in a constraint- and workload-aware framework that yields actionable provisioning guidance. We show that, for modern AI platforms, especially GPU-centric hosts paired with ultra-high-IOPS SSDs engineered for fine-grained random access, the DRAM-to-flash caching threshold collapses from minutes to a few seconds. This shift reframes NAND flash memory as an active data tier and exposes a broad research space across the hardware-software stack. We further introduce MQSim-Next, a calibrated SSD simulator that supports validation and sensitivity analysis and facilitates future architectural and system research. Finally, we present two concrete case studies that showcase the software system design space opened by such a memory-hierarchy paradigm shift. Overall, we turn a classical heuristic into an actionable, feasibility-aware analysis and provisioning framework and set the stage for further research on AI-era memory hierarchies.
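As a point of reference for the "minutes to seconds" claim, the classic rule in its commonly cited form sets the break-even reference interval to (pages per MB of DRAM / accesses per second per device) × (price per device / price per MB of DRAM). Pages re-referenced more often than this interval are cheaper to keep in DRAM. The sketch below evaluates only that classic economics-only formula; it does not reproduce the paper's extended formulation, which adds host costs, DRAM bandwidth, and latency constraints. The function name `break_even_interval_s` and all prices and IOPS figures are hypothetical placeholders, not values from the paper.

```python
"""Classic five-minute-rule break-even interval (economics-only form).

Illustrative only: every price and IOPS figure below is a hypothetical
placeholder. Host costs, DRAM bandwidth limits, and latency constraints
(which the paper adds) are deliberately NOT modeled here.
"""


def break_even_interval_s(
    pages_per_mb_ram: float,
    accesses_per_s_per_device: float,
    price_per_device: float,
    price_per_mb_ram: float,
) -> float:
    """Return the reference interval (seconds) at which keeping a page in DRAM
    costs the same as re-reading it from the storage device on every access."""
    if accesses_per_s_per_device <= 0 or price_per_mb_ram <= 0:
        raise ValueError("device IOPS and DRAM price must be positive")
    # Fraction of one device's access capacity consumed per page, times the
    # ratio of device price to DRAM price per MB.
    return (pages_per_mb_ram / accesses_per_s_per_device) * (
        price_per_device / price_per_mb_ram
    )


if __name__ == "__main__":
    PAGES_PER_MB = 1024 // 4   # 4 KiB pages per MiB of DRAM
    DRAM_PRICE_PER_MB = 0.005  # hypothetical: roughly $5 per GiB
    SSD_PRICE = 2000.0         # hypothetical device price in dollars

    # Hypothetical random-read IOPS: a conventional SSD vs. a 10x-IOPS device.
    for iops in (1_000_000, 10_000_000):
        t = break_even_interval_s(PAGES_PER_MB, iops, SSD_PRICE, DRAM_PRICE_PER_MB)
        print(f"{iops:>12,} IOPS -> break-even interval {t:.1f} s")
```

Because device IOPS sits in the denominator, a tenfold increase in IOPS at the same device price shrinks the interval tenfold; the paper's framework additionally charges host-processor and DRAM costs against each access, which this sketch omits.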

Paper Structure

This paper contains 22 sections, 18 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Simplified system architecture used to derive the new break-even interval formulation.
  • Figure 2: SSD architecture with key parameters for modeling performance and cost.
  • Figure 3: Storage-Next SSD peak IOPS under different configurations, for a workload with a 90:10 read-to-write ratio.
  • Figure 4: Break-even interval across configurations. Each stack shows contributions from host processor (CPU/GPU), DRAM, and SSD.
  • Figure 5: (a) and (b): break-even interval under different host processor IOPS capacities without a latency constraint; (c) and (d): break-even interval under different tail-latency constraints with a fixed processor IOPS capacity.
  • ...and 5 more figures