ACM MMSys 2024 Bandwidth Estimation in Real Time Communications Challenge

Sami Khairy, Gabriel Mittag, Vishak Gopal, Francis Y. Yan, Zhixiong Niu, Ezra Ameri, Scott Inglis, Mehrsa Golestaneh, Ross Cutler

TL;DR

This work tackles bandwidth estimation for real-time communications by shifting from simulation-based RL to offline RL trained on real-world Microsoft Teams data, using objective audio/video quality signals as rewards to align with user QoE. A two-dataset setup provides real-world trajectories and emulated ground-truth dynamics, enabling controlled offline learning via Implicit Q-Learning. Models were evaluated in two stages: emulation-based screening of all submissions, followed by calls on a geographically distributed testbed for the top models. The results show that offline RL can produce competitive bandwidth estimators: the winning model, Schaferct, delivers QoE comparable to the top behavior policy in the released dataset. They also underscore the importance of user-centric rewards and real-world data for designing RTC bandwidth estimators that hold up under diverse network conditions and avoid the sim-to-real gap.

Abstract

The quality of experience (QoE) delivered by video conferencing systems to end users depends in part on correctly estimating the capacity of the bottleneck link between the sender and the receiver over time. Bandwidth estimation for real-time communications (RTC) remains a significant challenge, primarily due to the continuously evolving heterogeneous network architectures and technologies. From the first bandwidth estimation challenge which was hosted at ACM MMSys 2021, we learned that bandwidth estimation models trained with reinforcement learning (RL) in simulations to maximize network-based reward functions may not be optimal in reality due to the sim-to-real gap and the difficulty of aligning network-based rewards with user-perceived QoE. This grand challenge aims to advance bandwidth estimation model design by aligning reward maximization with user-perceived QoE optimization using offline RL and a real-world dataset with objective rewards which have high correlations with subjective audio/video quality in Microsoft Teams. All models submitted to the grand challenge underwent initial evaluation on our emulation platform. For a comprehensive evaluation under diverse network conditions with temporal fluctuations, top models were further evaluated on our geographically distributed testbed by using each model to conduct 600 calls within a 12-day period. The winning model is shown to deliver comparable performance to the top behavior policy in the released dataset. By leveraging real-world data and integrating objective audio/video quality scores as rewards, offline RL can therefore facilitate the development of competitive bandwidth estimators for RTC.

Paper Structure (10 sections, 5 equations, 1 figure, 4 tables)

Figures (1)

  • Figure 1: Performance results of top models on the testbed. Each box is based on $1200$ data points. Lower and upper whiskers represent the $10$th and $90$th percentiles, respectively. The model set includes the baseline policy as well as the top behavior policy (v1) in the released datasets. The winning model, Schaferct, demonstrates comparable performance to that best behavior policy across all metrics.
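
The whisker convention in Figure 1 (10th and 90th percentiles rather than the more common 1.5×IQR rule) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names `percentile` and `box_stats` are hypothetical, the box edges are taken to be the 25th and 75th percentiles (the caption does not say), percentiles use linear interpolation, and the 1200 scores are synthetic, not challenge data.

```python
import random


def percentile(values, q):
    """Return the q-th percentile (0-100) of values, using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0  # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def box_stats(values):
    """Summarize one box: whiskers at the 10th/90th percentiles.

    Box edges at the 25th/75th percentiles are an assumption, not
    something the figure caption specifies.
    """
    return {
        "whisker_low": percentile(values, 10),
        "q1": percentile(values, 25),
        "median": percentile(values, 50),
        "q3": percentile(values, 75),
        "whisker_high": percentile(values, 90),
    }


# Synthetic quality scores only (hypothetical 1-5 range); 1200 points per
# box, mirroring the count stated in the caption.
rng = random.Random(0)
scores = [rng.uniform(1.0, 5.0) for _ in range(1200)]
stats = box_stats(scores)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With a plotting library, the same convention can usually be requested directly; for example, matplotlib's `boxplot` accepts `whis=(10, 90)` to place whiskers at those percentiles.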