My CXL Pool Obviates Your PCIe Switch
Yuhong Zhong, Daniel S. Berger, Pantea Zardoshti, Enrique Saurez, Jacob Nelson, Antonis Psistakis, Joshua Fried, Asaf Cidon
TL;DR
This work addresses the high cost and limited flexibility of PCIe switches by proposing a software-based PCIe pooling approach built on CXL memory pools. It introduces a two-part design comprising a datapath that routes PCIe traffic through shared CXL memory and a pooling orchestrator that dynamically allocates devices and migrates workloads, leveraging a cross-host shared-memory channel for fast signaling. The key contributions include empirical observations on memory-pool latency, a cross-host signaling mechanism with sub-microsecond latency, and a practical blueprint for deploying PCIe pooling today without hardware switches. If adopted, the approach could enable disaggregated NICs, accelerators, and other PCIe devices with lower costs and greater flexibility, while complementing RDMA-based storage disaggregation and enabling new datacenter architectures.
Abstract
Pooling PCIe devices across multiple hosts offers a promising solution to mitigate stranded I/O resources, enhance device utilization, address device failures, and reduce total cost of ownership. Today, PCIe switches are the only viable option; they decouple PCIe devices from hosts by connecting them through a hardware switch. However, the high cost and limited flexibility of PCIe switches hinder their widespread adoption beyond specialized datacenter use cases. This paper argues that PCIe device pooling can be effectively implemented in software using CXL memory pools. CXL memory pools improve memory utilization and already have a positive return on investment. We find that, once CXL pools are in place, they can serve as a building block for pooling any kind of PCIe device. We demonstrate that PCIe devices can directly use CXL memory as I/O buffers without device modifications, which enables routing PCIe traffic through CXL pool memory. This software-based approach is deployable on today's hardware and is more flexible than hardware PCIe switches. In particular, we explore how disaggregating devices such as NICs can transform datacenter infrastructure.
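As a rough illustration of how two hosts might exchange messages through memory they both map, the sketch below implements a single-producer, single-consumer ring in a shared buffer. It is not the paper's implementation: the `PoolRing` class, the header layout, and the slot size are hypothetical, and a plain `bytearray` stands in for CXL pool memory. A real cross-host channel would also have to deal with cache coherence and memory ordering between hosts, which this sketch does not model.

```python
import struct


class PoolRing:
    """Single-producer/single-consumer ring over a shared buffer.

    Hypothetical layout: [head: u32][tail: u32][slot 0][slot 1]...
    Each slot holds a 2-byte length followed by the payload.
    """

    HDR = struct.Struct("<II")  # head index, tail index
    LEN = struct.Struct("<H")   # per-slot payload length

    def __init__(self, buf, slot_size=64):
        self.buf = buf          # stand-in for a mapping of CXL pool memory
        self.slot_size = slot_size
        self.nslots = (len(buf) - self.HDR.size) // slot_size

    def _slot_offset(self, index):
        return self.HDR.size + index * self.slot_size

    def send(self, payload):
        """Producer side: returns False if the ring is full."""
        if len(payload) > self.slot_size - self.LEN.size:
            raise ValueError("payload does not fit in one slot")
        head, tail = self.HDR.unpack_from(self.buf, 0)
        nxt = (tail + 1) % self.nslots
        if nxt == head:         # one slot is kept empty to tell full from empty
            return False
        off = self._slot_offset(tail)
        self.LEN.pack_into(self.buf, off, len(payload))
        start = off + self.LEN.size
        self.buf[start:start + len(payload)] = payload
        # Publish the new tail only after the payload has been written.
        struct.pack_into("<I", self.buf, 4, nxt)
        return True

    def recv(self):
        """Consumer side: returns None if the ring is empty."""
        head, tail = self.HDR.unpack_from(self.buf, 0)
        if head == tail:
            return None
        off = self._slot_offset(head)
        (n,) = self.LEN.unpack_from(self.buf, off)
        start = off + self.LEN.size
        data = bytes(self.buf[start:start + n])
        # Release the slot by advancing head.
        struct.pack_into("<I", self.buf, 0, (head + 1) % self.nslots)
        return data


# Usage: both "hosts" wrap the same buffer, as if each had mapped the pool.
pool = bytearray(PoolRing.HDR.size + 4 * 64)
host_a = PoolRing(pool)
host_b = PoolRing(pool)
host_a.send(b"rx buffer ready")
message = host_b.recv()
```

The consumer here polls `recv()`; whether polling, interrupts, or some other notification is appropriate depends on the deployment and is outside what this sketch shows.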
