Streamlining CXL Adoption for Hyperscale Efficiency
Angelos Arelakis, Nilesh Shah, Yiannis Nikolakopoulos, Dimitrios Palyvos-Giannas
TL;DR
This work targets hyperscale memory capacity and TCO challenges by using CXL to create a lossless, hardware-accelerated compressed memory tier aligned with the Open Compute Project's Hyperscale CXL Tiered Memory Expander specification. The authors present an inline IP solution that achieves 2–3× effective memory capacity at 64-byte cache-line granularity, with nanosecond-scale latency and 1.2 GHz operation, while supporting LPDDR4/5 and delivering a 20–25% reduction in total cost of ownership. A proof of concept combines QEMU-based host emulation with an FPGA-accelerated backend to demonstrate real-time compression/decompression, telemetry, and the end-to-end workflow, validating the approach against OCP requirements and illustrating its practical viability for production in mid-2024. The paper also outlines a collaborative pathway with the CXL community, including upstream Linux driver work, benchmarking, and partnerships with hyperscalers and device manufacturers, to address adoption barriers and accelerate widespread deployment of CXL Tiered Memory Expander solutions.
Abstract
In our exploration of composable memory systems based on CXL, we focus on overcoming adoption barriers at hyperscale, supported by economic models of Total Cost of Ownership (TCO). While CXL addresses the pressing memory capacity needs of emerging hyperscale applications, the escalating demands of evolving use cases such as AI outpace the capabilities of current CXL solutions. Hyperscalers resort to software-based memory (de)compression, which alleviates memory capacity, storage, and network constraints but incurs a notable "tax" on CPU compute cycles. As a guide for the CXL community, hyperscalers have formulated the Open Compute Project (OCP) Hyperscale CXL Tiered Memory Expander specification. If implemented, this specification lowers TCO-related adoption barriers, enabling diverse CXL deployments at both hyperscaler and enterprise levels. We present a CXL-integrated solution aligned with this specification that introduces an energy-efficient, scalable, hardware-accelerated, lossless compressed memory CXL tier. This solution, slated for mid-2024 production and open for integration with Memory Expander controller manufacturers, offers 2–3× CXL memory compression with nanosecond-scale latency, delivering a 20–25% reduction in TCO for end customers without requiring additional physical slots. In our discussion, we pinpoint areas for collaborative innovation within the CXL community to expedite software and hardware advancements for CXL Tiered Memory Expansion. Furthermore, we examine unresolved challenges in pooled deployment and explore potential solutions, collectively aiming to make CXL adoption a "no-brainer" at hyperscale.
