Rethinking PM Crash Consistency in the CXL Era
João Oliveira, João Gonçalves, Miguel Matos
TL;DR
This paper tackles crash consistency for persistent memory (PM) in the era of Compute Express Link (CXL), arguing that Optane-era approaches are insufficient in disaggregated, heterogeneous systems. It surveys PM semantics, bug classes, and existing frameworks, then analyzes Global Persistent Flush (GPF) as a CXL-based persistence mechanism, along with its energy and communication challenges. The authors propose three research directions (memory primitives, persistency-aware frameworks, and bug-detection tools) needed to use CXL PM safely across multi-host configurations. The work underscores that robust CXL PM demands new primitives, abstractions, and tooling beyond current single-host, eADR-aligned approaches, with implications for framework design and fault-tolerance strategies.
Abstract
Persistent Memory (PM) introduces new opportunities for designing crash-consistent applications without the traditional storage overheads. However, ensuring crash consistency in PM demands intricate knowledge of CPU, cache, and memory interactions. Hardware and software mechanisms have been proposed to ease this burden, but neither has proved sufficient, prompting a variety of bug detection tools. With the sunset of Intel Optane comes the rise of Compute Express Link (CXL) for PM. In this position paper, we discuss the impact of CXL's disaggregated and heterogeneous nature on the development of crash-consistent PM applications, and outline three research directions: hardware primitives, persistency frameworks, and bug detection tools.
