Towards Disaggregation-Native Data Streaming between Devices
Nils Asmussen, Michael Roitzsch
TL;DR
The paper tackles data-movement bottlenecks in disaggregated datacenters enabled by fabrics such as CXL, where CPU-centric staging can negate potential latency benefits. It proposes disaggregation-native devices equipped with a device-independent data streaming facility and analyzes three protocol-placement strategies (app-side, central, and distributed), advocating distributed resource-side protocols to minimize hops. Using the M3 architecture with DTU-enabled tiles, it outlines architectural components, device heterogeneity considerations, access control, and protocol implementation approaches, and identifies cross-machine extensions as a core challenge. A gem5-based evaluation demonstrates substantial latency improvements for distributed protocols (up to roughly 67% faster than app-side and roughly 25% faster than central placement), while acknowledging simulation limitations and open questions about mapping security primitives onto CXL fabrics for robust isolation.
Abstract
Disaggregation is an ongoing trend to increase flexibility in datacenters. With interconnect technologies like CXL, pools of CPUs, accelerators, and memory can be connected via a datacenter fabric. Applications can then pick from those pools the resources necessary for their specific workload. However, this vision becomes less clear when we consider data movement. Workloads often require data to be streamed through chains of multiple devices. Typically, however, these data streams do not physically flow device-to-device; instead, they are staged in memory by a CPU that hosts the device protocol logic. We show that augmenting devices with a disaggregation-native and device-independent data streaming facility can improve processing latencies by enabling data flows directly between arbitrary devices.
