Memory Sharing with CXL: Hardware and Software Design Approaches

Sunita Jain, Nagaradhesh Yeleswarapu, Hasan Al Maruf, Rita Gupta

TL;DR

The paper examines memory sharing in CXL-enabled systems across protocol generations, contrasting the limitations of traditional tightly coupled CPU-memory architectures with the memory pooling and sharing that CXL makes possible. It surveys software-only approaches (a dual-headed topology, a custom framework, and OpenSHMEM-based PGAS) and hardware-assisted methods (dual-headed CXL Type-3 devices with hardware atomics and Back-Invalidate (BI) coherence), and discusses a hybrid snoop-filter strategy that balances precision against performance. It also analyzes trade-offs in sharing granularity and security, proposing hardware-assisted isolation and selective, region-based coherence to limit coherence overhead and security risk. The work argues that combining software and hardware design is essential to unlock the rack-scale memory sharing and near-data processing capabilities enabled by CXL 3.0, paving the way for Global Integrated Memory and memory-disaggregated architectures.

Abstract

Compute Express Link (CXL) is a rapidly emerging coherent interconnect standard that provides opportunities for memory pooling and sharing. Memory sharing is a well-established software feature that improves memory utilization by avoiding unnecessary data movement. In this paper, we discuss multiple approaches to enabling memory sharing with different generations of the CXL protocol (i.e., CXL 2.0 and CXL 3.0), considering the challenges of each architecture from both the device hardware and the software viewpoints.

Paper Structure (9 sections, 5 figures)

Figures (5)

  • Figure 1: Memory sharing with CXL 3.0 (Type-3 HDM-DB/FAM)
  • Figure 2: Dual-headed topology for memory sharing
  • Figure 3: Software stack for custom memory sharing framework
  • Figure 4: Objects in OpenSHMEM framework
  • Figure 5: Memory sharing on a dual-headed system