Learning-Based vs Human-Derived Congestion Control: An In-Depth Experimental Study

Mihai Mazilu; Luca Giacomoni; George Parisis

TL;DR

This work delivers a reproducible, large-scale empirical evaluation of learning-based congestion control against TCP Cubic and BBRv3 using Mininet-based emulation. It systematically examines fairness, backward compatibility, efficiency, responsiveness, and convergence across diverse topologies and dynamic conditions. Some RL-based CC methods achieve high bandwidth with low latency within their training ranges, but generalization to unseen RTT and bandwidth conditions remains limited. Astraea improves fairness through its reward design, but that fairness does not carry over outside its training range; Orca and Sage exhibit stability and responsiveness issues, and Vivace tends toward instability in many scenarios. The study underscores the need for robust, RTT-varied training regimes and transparent, reproducible benchmarks to drive deployable, fair, and responsive CC policies in real networks.

Abstract

Learning-based congestion control (CC), including Reinforcement Learning (RL), promises efficient CC in a fast-changing networking landscape, where evolving communication technologies, applications and traffic workloads pose severe challenges to human-derived, static CC algorithms. Learning-based CC is in its early days and substantial research is required to understand existing limitations, identify research challenges and, eventually, yield deployable solutions for real-world networks. In this paper, we extend our prior work and present a reproducible and systematic study of learning-based CC with the aim of highlighting strengths and uncovering fundamental limitations of the state-of-the-art. We directly contrast these approaches with widely deployed, human-derived CC algorithms, namely TCP Cubic and BBR (version 3). We identify challenges in evaluating learning-based CC, establish a methodology for studying these approaches and perform large-scale experimentation with learning-based CC approaches that are publicly available. We show that embedding fairness directly into reward functions is effective; however, the fairness properties do not generalise to unseen conditions. We then show that existing RL-based approaches can acquire all available bandwidth while largely maintaining low latency. Finally, we highlight that the latest learning-based CC approaches under-perform when the available bandwidth and end-to-end latency change dynamically, while remaining resistant to non-congestive loss. As with our initial study, our experimentation codebase and datasets are publicly available with the aim of galvanising the research community towards transparency and reproducibility, which have been recognised as crucial for researching and evaluating machine-generated policies.
