
ICGMM: CXL-enabled Memory Expansion with Intelligent Caching Using Gaussian Mixture Model

Hanqiu Chen, Yitu Wang, Luis Vitorio Cargnini, Mohammadreza Soltaniyeh, Dongyang Li, Gongjin Sun, Pradeep Subedi, Andrew Chang, Yiran Chen, Cong Hao

TL;DR

This paper tackles the memory wall in CXL-enabled memory expansion by introducing ICGMM, a hardware-managed DRAM caching system that uses a two-dimensional Gaussian Mixture Model (GMM)-based policy to predict accesses to 4KB pages. The GMM policy engine runs in hardware on an FPGA and guides caching and eviction decisions to improve cache hit rates and reduce SSD access latency, all within a dataflow architecture that minimizes overhead. Results on seven benchmarks show cache miss rate reductions of 0.32% to 6.14% and average SSD access latency reductions of 16.23% to 39.14% relative to LRU. Compared with LSTM-based policies, the GMM on FPGA reduces latency by over 10,000 times while using far fewer hardware resources. The work demonstrates that hardware-friendly probabilistic caching can substantially improve memory expansion performance, making CXL-based SSD-backed systems more practical for memory-intensive workloads.

Abstract

Compute Express Link (CXL) has emerged as a solution to the wide gap between computational speed and data communication rates among the host and multiple devices. It fosters a unified and coherent memory space between the host and CXL storage devices such as solid-state drives (SSDs) for memory expansion, with a corresponding DRAM implemented as the device cache. However, this introduces challenges such as substantial cache miss penalties, sub-optimal caching due to the data access granularity mismatch between the DRAM "cache" and the SSD "memory", and inefficient hardware cache management. To address these issues, we propose a novel solution, named ICGMM, which optimizes caching and eviction directly in hardware using a Gaussian Mixture Model (GMM)-based approach. We prototype our solution on an FPGA board, where it demonstrates a noteworthy improvement over the classic Least Recently Used (LRU) cache strategy. We observe a decrease in the cache miss rate ranging from 0.32% to 6.14%, leading to a substantial 16.23% to 39.14% reduction in the average SSD access latency. Furthermore, compared to state-of-the-art Long Short-Term Memory (LSTM)-based cache policies, our GMM algorithm on FPGA achieves a latency reduction of over 10,000 times. Remarkably, this is achieved while demanding far fewer hardware resources.
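
A small arithmetic sketch may help explain why a miss-rate decrease of only a few percent can translate into a much larger latency reduction: when a miss costs far more than a hit, the miss term dominates the average. The function below uses the standard weighted-average latency formula. The latency values and miss rates are illustrative assumptions, not measurements from the paper, and the sketch assumes the reported miss-rate decrease is in absolute percentage points.

```python
def average_access_latency(miss_rate, t_hit_us, t_miss_us):
    """Weighted average latency: hits are served by DRAM, misses go to the SSD."""
    return (1.0 - miss_rate) * t_hit_us + miss_rate * t_miss_us


# Hypothetical latencies in microseconds (assumptions for illustration only).
T_DRAM_US = 0.1   # DRAM cache hit
T_SSD_US = 50.0   # SSD access on a miss

# Hypothetical miss rates: a 3-percentage-point drop, within the reported range.
baseline = average_access_latency(0.10, T_DRAM_US, T_SSD_US)
improved = average_access_latency(0.07, T_DRAM_US, T_SSD_US)
reduction = 1.0 - improved / baseline

print(f"baseline:  {baseline:.3f} us")
print(f"improved:  {improved:.3f} us")
print(f"reduction: {reduction:.1%}")
```

With these assumed numbers, a 3-point drop in miss rate cuts the average latency by roughly 29%, because almost all of the average comes from the miss penalty.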

Paper Structure

This paper contains 19 sections, 3 equations, 6 figures, 2 tables, and 1 algorithm.

Figures (6)

  • Figure 1: CXL-enabled memory expansion. SSD serves as an extension of host main memory. FPGA DRAM is used as a cache to facilitate memory access to SSD via CXL. FPGA programmable logic is used for intelligent cache management.
  • Figure 2: Memory access spatial distribution (left) and temporal distribution (right) from three benchmarks: (a) dlrm, (b) parsec, and (c) sysbench. The spatial distribution can be fitted with different Gaussian functions; the temporal distribution shows uneven access frequency within a specific range of addresses (see colored annotations).
  • Figure 3: We propose a two-dimensional GMM to capture both spatial and temporal memory access patterns.
  • Figure 4: Intelligent caching and eviction with GMM.
  • Figure 5: ICGMM hardware architecture design with three main modules: cache control engine, cache policy engine, and signal controller. ICGMM is designed as a dataflow architecture with FIFO interfaces between different modules for high parallelism and efficient data-driven control. PA means the physical address.
  • ...and 1 more figure
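
The idea behind Figures 3 and 4 can be illustrated with a minimal software sketch: a two-dimensional GMM assigns each (page address, time) pair a density score, and that score drives both the caching decision and the choice of eviction victim. Everything below is an assumption made for illustration, including the `Component` type, the function names, the threshold rule, and the hand-picked parameters. The paper's hardware implementation, feature definitions, and trained model are not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    weight: float  # mixing weight; weights of all components sum to 1
    mean: tuple    # (mean page address, mean time)
    cov: tuple     # 2x2 covariance as ((s_aa, s_at), (s_ta, s_tt))


def gaussian_2d(x, mean, cov):
    """Density of a two-dimensional Gaussian at point x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx0, dx1 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form dx^T cov^{-1} dx via the closed-form 2x2 inverse.
    q = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def gmm_score(x, components):
    """Mixture density: weighted sum of the component densities."""
    return sum(k.weight * gaussian_2d(x, k.mean, k.cov) for k in components)


def should_cache(page, now, components, threshold):
    """Assumed rule: cache a page only if its score reaches the threshold."""
    return gmm_score((page, now), components) >= threshold


def pick_victim(cached_pages, now, components):
    """Assumed rule: evict the cached page with the lowest score."""
    return min(cached_pages, key=lambda p: gmm_score((p, now), components))


# Hypothetical single-component model: a hot region centred on page 100.
model = [Component(1.0, (100.0, 0.0), ((25.0, 0.0), (0.0, 1.0e6)))]

print(should_cache(100, 0.0, model, threshold=1e-6))  # page inside the hot region
print(should_cache(150, 0.0, model, threshold=1e-6))  # page far from the hot region
print(pick_victim([98, 150, 101], 0.0, model))        # farthest page is evicted
```

In this toy model, page 150 lies ten standard deviations from the hot region, so it is both rejected for caching and chosen as the eviction victim. A real model would have several components fitted to a memory access trace.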