Next-Gen Computing Systems with Compute Express Link: a Comprehensive Survey

Chen Chen, Xinkui Zhao, Guanjie Cheng, Yuesheng Xu, Shuiguang Deng, Jianwei Yin

TL;DR

The paper addresses the growing interconnect bottleneck in modern computing systems by surveying Compute Express Link (CXL) based architectures from single-machine memory expansion to distributed shared memory. It organizes research into Memory Expansion and Unified Memory, and extends the discussion to disaggregated systems and DSM enabled by CXL 3.0, grounded by real hardware measurements and diverse simulation platforms. The work catalogs concrete approaches such as tiered memory, near-memory processing, CPU-relay and direct CXL access for unified memory, memory pooling, and shared memory RPC frameworks, and outlines extensive future research directions including memory interleaving, workload-agnostic offloading, GPU memory extension, virtualization, and cross-rack interconnect. The significance lies in providing a structured, up-to-date map of CXL-enabled memory-centric computing, guiding both academia and industry toward scalable, coherent, and flexible data-center infrastructures.

Abstract

Interconnection is crucial for computing systems. However, the interconnection performance between processors and devices, such as memory devices and accelerators, lags significantly behind their computing performance, severely limiting overall system performance. To address this challenge, Intel proposed Compute Express Link (CXL), an open industry-standard interconnection. With memory semantics, CXL offers low-latency, scalable, and coherent interconnection between processors and devices. This paper introduces recent advances in CXL-based computing systems, from single-machine to distributed. In single-machine systems, we classify existing research into two categories: Memory Expansion and Unified Memory. Memory Expansion focuses on processors and memory and aims to address the memory wall challenge. Unified Memory focuses on processors and accelerators and aims to enhance collaboration in heterogeneous computing systems. In distributed systems, we present how to build efficient disaggregated systems on CXL infrastructure, enabling resource pooling and sharing. Finally, we discuss future research directions and envision memory-centric computing with CXL.

Paper Structure (48 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Three types of CXL devices. A Type 1 device can directly cache host memory, a Type 2 device and the host can directly cache each other's memory, and the host can directly cache memory on a Type 3 device.
  • Figure 2: CXL-based Memory Expansion. It is based on tiered memory, which places data in the appropriate memory tier. To avoid the overhead of moving data between tiers, researchers introduce near-memory processing.
  • Figure 3: Unified Memory. The hardware cache coherence provided by CXL organises the device-attached memory and host-attached memory into a unified memory space.
  • Figure 4: CXL-Based Disaggregated System [intro_to_cxl]. It breaks down the physical isolation between servers and consolidates the data center into a hyper-node through resource pooling.
  • Figure 5: Communication via Shared CXL Memory
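
The three device types in Figure 1 can be summarised by which CXL sub-protocols each one uses. The sketch below is illustrative only: the protocol names (CXL.io, CXL.cache, CXL.mem) come from the CXL specification rather than from the text above, and the class and function names are hypothetical helpers introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CxlDeviceType:
    """Hypothetical model of a CXL device type and its sub-protocols."""
    name: str
    protocols: frozenset

    @property
    def device_caches_host_memory(self) -> bool:
        # CXL.cache lets the device coherently cache host memory.
        return "CXL.cache" in self.protocols

    @property
    def host_accesses_device_memory(self) -> bool:
        # CXL.mem lets the host load/store (and cache) device-attached memory.
        return "CXL.mem" in self.protocols


# Protocol sets per device type, as defined by the CXL specification.
DEVICE_TYPES = {
    1: CxlDeviceType("Type 1", frozenset({"CXL.io", "CXL.cache"})),
    2: CxlDeviceType("Type 2", frozenset({"CXL.io", "CXL.cache", "CXL.mem"})),
    3: CxlDeviceType("Type 3", frozenset({"CXL.io", "CXL.mem"})),
}


def describe(type_id: int) -> str:
    """Return a one-line summary of the caching directions for a device type."""
    dev = DEVICE_TYPES[type_id]
    directions = []
    if dev.device_caches_host_memory:
        directions.append("device caches host memory")
    if dev.host_accesses_device_memory:
        directions.append("host caches device memory")
    return f"{dev.name}: " + ", ".join(directions)


if __name__ == "__main__":
    for type_id in sorted(DEVICE_TYPES):
        print(describe(type_id))
```

Modelling the types as protocol sets makes the caption's wording easy to check: Type 2 is the only type with both directions, which is why it suits accelerators with their own memory, while Type 3 covers pure memory expanders.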