Practical Persistent Multi-Word Compare-and-Swap Algorithms for Many-Core CPUs

Kento Sugiura, Manabu Nishimura, Yoshiharu Ishikawa

TL;DR

The paper addresses the challenge of making multi-word updates durable in persistent memory on many-core CPUs. It redesigns PMwCAS to remove redundant CAS and cache-flush operations and, in one variant, drops dirty flags entirely, instead using PMwCAS descriptors as write-ahead logs for recovery. The authors present two PMwCAS algorithms (with and without dirty flags) and implement them in a C++ library, pmem-atomic. Experiments show speedups of up to 10x over the original PMwCAS, analyze how the number of target words, access skew, and block size affect performance, and yield practical guidance for building durable, concurrent data structures on persistent memory.
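The all-or-nothing guarantee that any PMwCAS must provide can be pinned down with a small reference model. The C++ sketch below is neither the paper's algorithm nor the pmem-atomic API: it assumes a single global mutex purely for clarity, so that comparing and swapping several words happens as one indivisible step, and it ignores persistence entirely. The names MwCASTarget, ReferenceMwCAS, ReferenceRead, and g_mwcas_mutex are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// One target of a multi-word CAS: where to look, what must be there,
// and what to write if every target matches.
struct MwCASTarget {
  std::uint64_t* addr;
  std::uint64_t expected;
  std::uint64_t desired;
};

// Reference model only: one global lock makes the compare-all /
// swap-all step indivisible. No persistence, no lock freedom.
std::mutex g_mwcas_mutex;

bool ReferenceMwCAS(const std::vector<MwCASTarget>& targets) {
  std::lock_guard<std::mutex> guard(g_mwcas_mutex);
  for (const MwCASTarget& t : targets) {
    assert(t.addr != nullptr);
    if (*t.addr != t.expected) return false;  // any mismatch: write nothing
  }
  for (const MwCASTarget& t : targets) {
    *t.addr = t.desired;  // every word matched: install all new values
  }
  return true;
}

// In this model, readers take the same lock so they never observe a
// half-finished multi-word update.
std::uint64_t ReferenceRead(const std::uint64_t* addr) {
  std::lock_guard<std::mutex> guard(g_mwcas_mutex);
  return *addr;
}
```

Wang et al.'s PMwCAS delivers the same all-or-nothing result without a global lock and additionally makes the new values durable, which is where the CAS and cache-flush costs discussed in the paper come from.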

Abstract

In the last decade, academic and industrial researchers have focused on persistent memory, spurred by the release of its first practical product, Intel Optane. One of the main challenges of persistent memory programming is guaranteeing consistent durability across separate memory addresses, and Wang et al. proposed a persistent multi-word compare-and-swap (PMwCAS) algorithm to solve this problem. However, their algorithm contains redundant compare-and-swap (CAS) and cache flush instructions and does not achieve sufficient performance on many-core CPUs. This paper proposes a new algorithm that improves performance on many-core CPUs by removing unnecessary CAS and flush instructions from PMwCAS operations. We also eliminate the dirty flags that the original algorithm uses to help ensure consistent durability, instead using PMwCAS descriptors as write-ahead logs. Experimental results show that the proposed method is up to ten times faster than the original algorithm and suggest several productive uses of PMwCAS operations.
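To make the role of dirty flags concrete, here is a single-word sketch of the read/write protocol they support: a writer installs its value with a flag bit set, flushes it, and then clears the flag, and any reader that sees the flag flushes the word before trusting the value. This follows the general idea behind the original algorithm's dirty flags, not the exact code of either paper. The flag position (kDirtyFlag), the PersistStub stand-in for a real cache-line flush (such as CLWB followed by SFENCE on x86), g_flush_count, and the function names are all assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed layout: the top bit of each 64-bit word is the dirty flag,
// so user values must fit in the remaining 63 bits.
constexpr std::uint64_t kDirtyFlag = std::uint64_t{1} << 63;

// Hypothetical stand-in for flushing the word's cache line and fencing.
// It only counts calls so the protocol can be observed on ordinary DRAM.
std::atomic<int> g_flush_count{0};
void PersistStub(const void* /*addr*/) { g_flush_count.fetch_add(1); }

// Writer: publish the value as dirty, persist it, then clear the flag.
bool PersistentCAS(std::atomic<std::uint64_t>& word, std::uint64_t expected,
                   std::uint64_t desired) {
  assert((desired & kDirtyFlag) == 0);
  if (!word.compare_exchange_strong(expected, desired | kDirtyFlag)) {
    return false;  // word changed, or another write is still dirty
  }
  PersistStub(&word);
  std::uint64_t dirty = desired | kDirtyFlag;
  // May fail harmlessly if another thread already cleared the flag.
  word.compare_exchange_strong(dirty, desired);
  return true;
}

// Reader: never hand out a value that might not be durable yet.
std::uint64_t PersistentRead(std::atomic<std::uint64_t>& word) {
  const std::uint64_t seen = word.load();
  if (seen & kDirtyFlag) {
    PersistStub(&word);  // help persist the pending write
    std::uint64_t expected = seen;
    word.compare_exchange_strong(expected, seen & ~kDirtyFlag);
  }
  return seen & ~kDirtyFlag;
}
```

Each successful write here pays one extra CAS and one flush just to manage the flag, and readers may flush as well; this is the kind of overhead that the flag-free variant avoids by relying on descriptors as write-ahead logs.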

Paper Structure

This paper contains 14 sections, 8 figures, 5 tables, 3 algorithms.

Figures (8)

  • Figure 1: Updating a payload pointer in a persistent linked list.
  • Figure 2: Throughput of Wang et al.'s persistent one-word and three-word CAS operations in highly contended environments.
  • Figure 3: A state machine of PMwCAS operations with dirty flags.
  • Figure 4: A state machine of PMwCAS operations without dirty flags.
  • Figure 5: Memory blocks for benchmarking PMwCAS operations.
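Figure 4 covers PMwCAS without dirty flags, where descriptors serve as write-ahead logs. A rough sketch of what crash recovery from such logs could look like follows; it is written in the spirit of descriptor-based recovery and is not the paper's procedure. The tag bit (kDescTag), the Status values, the Descriptor and Entry layouts, and Recover are hypothetical, and the persistent heap is simulated with an ordinary vector. The sketch assumes each descriptor's targets and old/new values are durable before any target word is modified, and that new values reach the targets only after the status becomes kSucceeded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding: a target word with the top bit set holds a
// descriptor ID instead of a user value (user values never use that bit).
constexpr std::uint64_t kDescTag = std::uint64_t{1} << 63;
constexpr std::uint64_t TagDescriptor(std::uint64_t id) { return kDescTag | id; }

enum class Status : std::uint8_t { kUndecided, kSucceeded, kFailed };

struct Entry {
  std::size_t index;      // position of the target word in the simulated heap
  std::uint64_t old_val;  // restored on rollback (undo)
  std::uint64_t new_val;  // installed on roll-forward (redo)
};

struct Descriptor {
  std::uint64_t id;
  Status status;
  std::vector<Entry> entries;
};

// After a crash, each surviving descriptor decides the fate of the words
// that still reference it: redo if it committed, undo otherwise.
void Recover(std::vector<std::uint64_t>& heap,
             const std::vector<Descriptor>& descriptors) {
  for (const Descriptor& d : descriptors) {
    const bool commit = (d.status == Status::kSucceeded);
    for (const Entry& e : d.entries) {
      assert(e.index < heap.size());
      // Words already holding a plain value were finalized before the
      // crash and are left untouched.
      if (heap[e.index] == TagDescriptor(d.id)) {
        heap[e.index] = commit ? e.new_val : e.old_val;
      }
    }
  }
  // A real implementation would also flush each repaired word and then
  // mark the descriptors reusable; both steps are omitted here.
}
```

Because an undecided descriptor is treated like a failed one, a crash in the middle of an operation always leaves either all old values or all new values in place, which is the consistency property that dirty flags otherwise help enforce.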