LMB: Augmenting PCIe Devices with CXL-Linked Memory Buffer

Jiapin Wang, Xiangping Zhang, Chenlei Tang, Xiang Chen, Tao Lu

TL;DR

This work tackles onboard DRAM shortages in PCIe devices by proposing LMB, a CXL-based Linked Memory Buffer that extends device memory through a memory expander. The design introduces a kernel-level framework and APIs to enable unified memory allocation and sharing across PCIe and CXL devices, coordinated by a Fabric Manager and a CXL memory pool. Preliminary evaluations on PCIe Gen4 and Gen5 SSDs show that LMB can closely match ideal DRAM performance for writes and sustain substantial gains over traditional DFTL for reads, with some read degradations on Gen5 due to CXL latency. If validated at scale, LMB could significantly improve performance for SSDs, GPUs, and other PCIe devices under AI and large-model workloads by reducing the memory bottleneck without major software or hardware overhauls.

Abstract

PCIe devices, such as SSDs and GPUs, are pivotal in modern data centers, and their value is set to grow with the emergence of AI and large models. However, these devices face an onboard DRAM shortage: internal space limitations prevent them from accommodating sufficient DRAM modules alongside flash or GPU processing chips. Current solutions either curb device-internal memory usage or supplement it with slower non-DRAM media, and so prove inadequate or compromise performance. This paper introduces the Linked Memory Buffer (LMB), a scalable solution that uses a CXL memory expander to address onboard memory deficiencies. The low latency of CXL enables LMB to use emerging DRAM memory expanders to supplement device onboard DRAM efficiently, with minimal impact on performance.

Paper Structure (14 sections, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Internal media layout of a commercial SSD. Obviously, there is no room for more DDR in the "Inn".
  • Figure 2: Estimated latency of PCIe Gen5 and CXL devices accessing host and CXL HDM memory [sharma2022compute, li2023pond].
  • Figure 3: Overall architecture of LMB.
  • Figure 4: Expander address mapping.
  • Figure 5: SSD stores the L2P table through the LMB.
  • ...and 1 more figure