Performance Characterizations and Usage Guidelines of Samsung CXL Memory Module Hybrid Prototype
Jianping Zeng, Shuyi Pei, Da Zhang, Yuchen Zhou, Amir Beygi, Xuebin Yao, Ramdas Kachare, Tong Zhang, Zongwang Li, Marie Nguyen, Rekha Pitchumani, Yang Soek Ki, Changhee Jung
TL;DR
The paper tackles the memory capacity and persistence gap in data-intensive workloads by evaluating Samsung's CMM-H, a CXL-based memory module that pairs a DRAM cache with NAND flash. Using microbenchmarks and workloads that span volatile and persistent scenarios, it characterizes latency, tail latency, bandwidth, and real-world performance when CMM-H is used as volatile memory, a memory expander, or persistent memory. A key contribution is the demonstration that CMM-H can deliver near-DRAM performance for cache-friendly workloads with limited footprints, and substantial gains for durable services when its persistence is combined with Global Persistent Flush and idempotent processing to avoid heavy write-ahead logging (WAL). The findings offer actionable guidance on workload placement and programming models for exploiting CMM-H’s cost-effective memory expansion while balancing latency, bandwidth, and persistence requirements in modern datacenters.
Abstract
The growing prevalence of data-intensive workloads, such as artificial intelligence (AI), machine learning (ML), high-performance computing (HPC), in-memory databases, and real-time analytics, has exposed the limitations of conventional memory technologies like DRAM. While DRAM offers low latency and high throughput, it is constrained by high cost, scalability challenges, and volatility, making it less viable for capacity-bound and persistent applications in modern datacenters. Recently, Compute Express Link (CXL) has emerged as a promising alternative, enabling high-speed, cacheline-granular communication between CPUs and external devices. By leveraging CXL technology, NAND flash can now be used as memory expansion, offering three benefits at low cost: byte-addressability, scalable capacity, and persistence. Samsung's CXL Memory Module Hybrid (CMM-H) is the first product to deliver these benefits through a hardware-only solution, i.e., it does not incur the OS and I/O overheads of conventional block devices. In particular, CMM-H integrates a DRAM cache with NAND flash in a single device to deliver near-DRAM latency. This paper presents the first publicly available comprehensive characterization of an FPGA-based CMM-H prototype. Through this study, we address users' concerns about whether a wide variety of applications can run successfully on a memory device backed by a NAND flash medium. Additionally, based on these characterizations, we provide key insights into how best to take advantage of the CMM-H device.
