Reexamining Paradigms of End-to-End Data Movement
Chin Fang, Timothy Stitt, Michael J. McManus, Toshio Moriya
TL;DR
The paper argues that end-to-end data movement is limited by the entire data path, not just network bandwidth, and advocates a holistic, co-designed approach using burst buffers and the ZX data mover. It reexamines six prevalent paradigms (latency sensitivity, packet loss and TCP congestion control, private testing lines, bandwidth, CPU power, and cloud virtualization) and provides empirical evidence from latency-emulation testbeds and real 100 Gbps links showing that storage I/O, software efficiency, and architectural co-design often dominate performance. A key contribution is the demonstration that integrated data-movement appliances, including DPU-enabled paths, can achieve near-line-rate transfers even in resource-constrained environments, while cloud paths introduce measurable penalties that can be mitigated by co-designed data paths. The work also presents a reproducible methodology and publicly available testbeds, supporting broader adoption of high-performance data movement across edge, core, and cloud platforms.
Abstract
The pursuit of high-performance data transfer often focuses on raw network bandwidth, and international links of 100 Gbps or higher are frequently considered the primary enabler. Bandwidth is necessary, but this network-centric view is incomplete because it equates provisioned link speeds with practical, sustainable data movement capabilities across the entire edge-to-core spectrum. This paper investigates six common paradigms, from the often-cited constraints of network latency and TCP congestion control algorithms to host-side factors such as CPU performance and virtualization that critically impact data movement workflows. We validated our findings using a latency-emulation-capable testbed for high-speed WAN performance prediction and through extensive production measurements, ranging from resource-constrained edge environments to a 100 Gbps operational link connecting Switzerland and California, U.S. These results show that the principal bottlenecks often reside outside the network core, and that a holistic hardware-software co-design ensures consistent performance, whether moving data at 1 Gbps, 100 Gbps, or faster. This approach effectively closes the fidelity gap between benchmark results and diverse and complex production environments.
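To illustrate the claim that the whole data path, rather than the provisioned link, sets the sustainable rate, the following is a minimal sketch that models a transfer as a serial pipeline whose throughput is bounded by its slowest stage. The stage names and all per-stage rates are hypothetical values chosen for illustration; they are not measurements from this paper, and the model ignores effects such as buffering and protocol overhead.

```python
def end_to_end_gbps(stages):
    """Return (bottleneck stage name, sustained rate in Gbps) for a serial data path.

    Assumes a simple pipeline model: the sustained end-to-end rate cannot
    exceed the rate of the slowest stage.
    """
    name = min(stages, key=stages.get)
    return name, stages[name]


def transfer_hours(size_terabytes, rate_gbps):
    """Hours needed to move size_terabytes (decimal TB) at rate_gbps."""
    gigabits = size_terabytes * 1000 * 8  # 1 TB = 1000 GB = 8000 Gb
    return gigabits / rate_gbps / 3600


# Hypothetical per-stage sustained rates in Gbps (illustrative only).
path = {
    "source_storage_read": 40.0,
    "sender_host_software": 80.0,
    "wan_link": 100.0,
    "receiver_host_software": 80.0,
    "destination_storage_write": 25.0,
}

bottleneck, rate = end_to_end_gbps(path)
print(f"bottleneck: {bottleneck}, sustained rate: {rate} Gbps")
print(f"1 PB at link rate:       {transfer_hours(1000, path['wan_link']):.1f} h")
print(f"1 PB at bottleneck rate: {transfer_hours(1000, rate):.1f} h")
```

Under these assumed numbers, the 100 Gbps link is not the limiting stage: the destination storage write rate is, so upgrading the link alone would not shorten the transfer.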
