CXL Topology-Aware and Expander-Driven Prefetching: Unlocking SSD Performance

Dongsuk Oh; Miryeong Kwon; Jiseon Kim; Eunjee Na; Junseok Moon; Hyunkyu Choi; Seonghyeon Jang; Hanjin Choi; Hongjoo Jung; Sangwon Lee; Myoungsoo Jung

TL;DR

This work targets the latency gap between DRAM and SCM-backed CXL-SSDs by offloading LLC prefetching from the host CPU to the CXL expander network, using CXL.mem back-invalidation to maintain coherence. It introduces ExPAND, an architecture with a host-side reflector and an expander-side decider that uses a heterogeneous ML-based address predictor and a timing predictor to generate timely prefetches. ExPAND also computes end-to-end prefetch timeliness from CXL topology data. The paper reports that ExPAND substantially outperforms baseline prefetchers, with speedups of up to $9.0\times$ over NoPrefetch for graph workloads and up to $14.7\times$ for SPEC CPU benchmarks, and strong gains when backend media latency is favorable. Together, topology-aware timing, bidirectional CXL communication, and ML-assisted address prediction bring data closer to the host LLC more efficiently, reducing reliance on CXL-SSDs and improving real-world performance for memory-disaggregated systems.

Abstract

Integrating Compute Express Link (CXL) with SSDs allows scalable access to large memory, but CXL-SSDs are slower than DRAM. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level cache (LLC) prefetching from the host CPU to CXL-SSDs. ExPAND uses a heterogeneous prediction algorithm for prefetching and ensures data consistency with CXL.mem's back-invalidation. We examine prefetch timeliness for accurate latency estimation. Being aware of CXL multi-tiered switching, ExPAND provides end-to-end latency for each CXL-SSD and precise prefetch timeliness estimations. Our method reduces reliance on CXL-SSDs and lets most data be accessed directly from the host cache. ExPAND improves the performance of graph applications and SPEC CPU by 9.0$\times$ and 14.7$\times$, respectively, surpassing CXL-SSD pools with diverse prefetching strategies.
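To make the idea of topology-aware prefetch timeliness concrete, the following is a minimal sketch and not the paper's model. It assumes a deliberately simple latency formula (backend media access plus a fixed cost per switch level on the way to the host). The names `CxlPath`, `end_to_end_latency_ns`, and `prefetch_is_timely`, and all latency values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CxlPath:
    """Hypothetical description of the path between the host and one CXL-SSD."""
    switch_levels: int   # number of CXL switch tiers between device and host
    per_level_ns: float  # assumed latency added by each switch tier
    media_ns: float      # assumed backend media access latency


def end_to_end_latency_ns(path: CxlPath) -> float:
    """Assumed model: media access plus one traversal of every switch tier."""
    return path.media_ns + path.switch_levels * path.per_level_ns


def prefetch_is_timely(path: CxlPath, predicted_gap_ns: float) -> bool:
    """A prefetch is timely if the data can reach the host LLC before the
    predicted demand access, i.e. within predicted_gap_ns from now."""
    return end_to_end_latency_ns(path) <= predicted_gap_ns


# Illustrative values only: a device behind two switch tiers.
path = CxlPath(switch_levels=2, per_level_ns=100.0, media_ns=1000.0)
latency = end_to_end_latency_ns(path)
timely_with_slack = prefetch_is_timely(path, predicted_gap_ns=1500.0)
timely_without_slack = prefetch_is_timely(path, predicted_gap_ns=1000.0)
```

Under these assumptions, a device deeper in the switch hierarchy needs its prefetches issued earlier, which is why a per-device end-to-end latency estimate matters for timeliness.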
