Flow Optimization at Inter-Datacenter Networks for Application Run-time Acceleration
Berta Serracanta, Alberto Rodriguez-Natal, Fabio Maino, Albert Cabellos
TL;DR
This work addresses the WAN bottleneck that inflates flow completion times (FCT) for short flows in distributed, multi-datacenter deployments. It introduces an eBPF/XDP-based splitter that classifies flows by per-flow packet counts and directs short and long flows onto two identical SD-WAN tunnels, preserving encryption and requiring no changes to applications. Empirical results from a four-router testbed running NextCloud show about a 1.5× reduction in short-flow FCT, along with improved jitter and minimal eBPF overhead, which can be further reduced via NIC offloading. The approach is a practical, deployment-friendly way to accelerate the run-time performance of distributed applications across datacenters and edge environments.
Abstract
Distributed applications are now commonly spread across multiple datacenters and extend to edge and fog computing locations. The transition away from single-datacenter hosting is driven by capacity constraints in datacenters and by the adoption of hybrid deployment strategies that combine on-premise and public cloud facilities. However, the performance of such applications is often limited by long Flow Completion Times (FCT) for short flows, which queue behind bursts of packets from concurrent long flows. To address this challenge, we propose a solution that prioritizes short flows over long flows in the Software-Defined Wide-Area Network (SD-WAN) interconnecting the distributed computing platforms. Our solution uses eBPF to segregate short and long flows and transmits them over separate tunnels with the same properties. By mitigating queuing delays, we consistently achieve a 1.5× reduction in FCT for short flows, resulting in improved application response times. The proposed solution works with encrypted traffic and is application-agnostic, so it can be deployed in diverse distributed environments without modifying the applications themselves. Our testbed evaluation demonstrates the effectiveness of our approach in accelerating the run-time of distributed applications and provides insights for optimizing multi-datacenter and edge deployments.
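The flow segregation described above can be illustrated with a minimal userspace sketch in Python. It is not the authors' eBPF/XDP program. It only models one plausible reading of packet-count classification: a flow stays on the short-flow tunnel until its packet count exceeds a threshold, after which its packets go to the long-flow tunnel. The names `FlowKey` and `FlowSplitter`, the tunnel labels, and the threshold values are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """5-tuple identifying a flow (assumed key; the actual key may differ)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int


# Hypothetical threshold: packets beyond this count mark a flow as long.
SHORT_FLOW_MAX_PACKETS = 10


class FlowSplitter:
    """Userspace model of a per-flow packet counter that picks a tunnel."""

    def __init__(self, threshold: int = SHORT_FLOW_MAX_PACKETS) -> None:
        self.threshold = threshold
        # Plays the role an eBPF hash map would play in a kernel program.
        self.packet_counts: dict[FlowKey, int] = defaultdict(int)

    def select_tunnel(self, key: FlowKey) -> str:
        """Count this packet and return the tunnel label it should use."""
        self.packet_counts[key] += 1
        if self.packet_counts[key] <= self.threshold:
            return "short"
        return "long"


# Usage: with a threshold of 3, the 4th packet onward moves to the long tunnel.
splitter = FlowSplitter(threshold=3)
flow = FlowKey("10.0.0.1", "10.0.1.1", 40000, 443, 6)
decisions = [splitter.select_tunnel(flow) for _ in range(5)]
print(decisions)  # ['short', 'short', 'short', 'long', 'long']
```

Because the decision depends only on header fields and a counter, a scheme of this kind never inspects payloads, which is consistent with the claim that the approach works with encrypted traffic and without application changes.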
