Learning-Based vs Human-Derived Congestion Control: An In-Depth Experimental Study

Mihai Mazilu, Luca Giacomoni, George Parisis

TL;DR

This work delivers a reproducible, large-scale empirical evaluation of learning-based congestion control against TCP Cubic and BBRv3 using Mininet-based emulation. It systematically examines fairness, backward compatibility, efficiency, responsiveness, and convergence across diverse topologies and dynamic conditions. Some RL-based CC methods achieve high bandwidth with low latency within their training ranges, but generalisation to unseen RTT and bandwidth conditions remains limited. Astraea improves fairness through its reward design, yet that fairness does not carry over outside its training range; Orca and Sage exhibit stability and responsiveness issues, and Vivace tends toward instability in many scenarios. The study underscores the need for robust, RTT-varied training regimes and transparent, reproducible benchmarks to drive deployable, fair, and responsive CC policies in real networks.

Abstract

Learning-based congestion control (CC), including reinforcement learning (RL), promises efficient CC in a fast-changing networking landscape, where evolving communication technologies, applications and traffic workloads pose severe challenges to human-derived, static CC algorithms. Learning-based CC is in its early days and substantial research is required to understand existing limitations, identify research challenges and, eventually, yield deployable solutions for real-world networks. In this paper, we extend our prior work and present a reproducible and systematic study of learning-based CC with the aim to highlight strengths and uncover fundamental limitations of the state-of-the-art. We directly contrast said approaches with widely deployed, human-derived CC algorithms, namely TCP Cubic and BBR (version 3). We identify challenges in evaluating learning-based CC, establish a methodology for studying said approaches and perform large-scale experimentation with learning-based CC approaches that are publicly available. We show that embedding fairness directly into reward functions is effective; however, the fairness properties do not generalise to unseen conditions. We then show that existing RL-based approaches can acquire all available bandwidth while largely maintaining low latency. Finally, we highlight that the latest learning-based CC approaches under-perform when the available bandwidth and end-to-end latency change dynamically, while remaining resistant to non-congestive loss. As with our initial study, our experimentation codebase and datasets are publicly available with the aim to galvanise the research community towards transparency and reproducibility, which have been recognised as crucial for researching and evaluating machine-generated policies.

Paper Structure

This paper contains 16 sections, 13 figures, 1 table.

Figures (13)

  • Figure 1: Intra-RTT Fairness. Goodput ratio for two competing flows in a dumbbell topology. Bottleneck capacity is $100$Mbps, both flows experience the same base RTT (shown on x-axis), buffer capacity is set to $0.2\times$ (a), $1\times$ (b), and $4\times$ (c) the BDP.
  • Figure 2: Intra-RTT Fairness. Congestion window (sending rate for Vivace and BBRv3) for two competing flows in a dumbbell topology. Bottleneck capacity is $100$Mbps, base RTT is $80$ms, buffer capacity is set to $0.2\times$, $1\times$ and $4\times$ the BDP.
  • Figure 3: Inter-RTT Fairness. Goodput ratio for two competing flows in a dumbbell topology. Bottleneck capacity is $100$Mbps and buffer capacity is set to $0.2\times$ (a), $1\times$ (b), and $4\times$ (c) the BDP of the path with the smallest RTT. Flows experience different RTTs; RTT of first flow is set to $20$ms and RTT of second flow is shown on x-axis.
  • Figure 4: Fairness with Bandwidth Variation. Goodput ratio for two competing flows in a dumbbell topology. Base RTT is $40$ms, bottleneck bandwidth varies as shown on the x-axis, buffer capacity is set to $0.2\times$ (a), $1\times$ (b), and $4\times$ (c) the BDP.
  • Figure 5: Fairness in a Parking Lot Topology. Bottleneck capacity is $100$Mbps, all 4 flows experience the same base RTT (shown on x-axis), buffer capacity is set to $0.2\times$ (a), $1\times$ (b), and $4\times$ (c) the BDP.
  • ...and 8 more figures
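
The captions above size the bottleneck buffer as a multiple of the bandwidth-delay product (BDP). As a rough illustration of what those multiples mean in bytes, the following Python sketch computes the BDP and the three buffer sizes for the $100$Mbps, $80$ms setting of Figure 2. The function names `bdp_bytes` and `buffer_sizes` are hypothetical helpers introduced here for illustration; they are not taken from the authors' codebase, and the paper may express buffer sizes in packets rather than bytes.

```python
def bdp_bytes(bandwidth_mbps, rtt_ms):
    """Bandwidth-delay product in bytes (illustrative helper, not from the paper).

    bandwidth_mbps: bottleneck capacity in megabits per second
    rtt_ms: base round-trip time in milliseconds
    """
    # bits in flight = (bits per second) * (seconds); multiply before dividing
    # so the common integer cases stay exact.
    bits_in_flight = bandwidth_mbps * 1_000_000 * rtt_ms / 1000
    return bits_in_flight / 8  # 8 bits per byte


def buffer_sizes(bandwidth_mbps, rtt_ms, multipliers=(0.2, 1.0, 4.0)):
    """Buffer capacity in whole bytes for each BDP multiplier in the captions."""
    bdp = bdp_bytes(bandwidth_mbps, rtt_ms)
    return {m: round(m * bdp) for m in multipliers}


if __name__ == "__main__":
    # Figure 2 setting: 100 Mbps bottleneck, 80 ms base RTT.
    print(bdp_bytes(100, 80))     # 1000000.0
    print(buffer_sizes(100, 80))  # {0.2: 200000, 1.0: 1000000, 4.0: 4000000}
```

For the inter-RTT experiments of Figure 3, where the buffer is sized against the path with the smallest RTT ($20$ms), the same helper would be called as `buffer_sizes(100, 20)`, giving a BDP of 250,000 bytes under these assumptions.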