Offline Meta-learning for Real-time Bandwidth Estimation
Aashish Gottipati, Sami Khairy, Yasaman Hosseinkashi, Gabriel Mittag, Vishak Gopal, Francis Y. Yan, Ross Cutler
TL;DR
Ivy addresses variable network conditions and data drift in real-time video bandwidth estimation (BWE) by introducing an offline meta-learning metapolicy that selects among multiple BWE algorithms to maximize QoE, measured by mean opinion score (MOS). It applies Implicit Q-Learning to offline telemetry to learn a policy that makes decisions at 6-second intervals, which avoids live exploration and reduces the amount of training data needed. In Microsoft Teams, Ivy improves QoE by up to 11.4% over online QoS meta-heuristics and by 5.9% to 11.2% over individual estimators, with up to 28% better data efficiency than online methods. The work demonstrates that offline meta-learning is practical for production systems facing nonstationarity and data drift, and it points to future hybrids that combine offline robustness with online adaptation.
Abstract
Real-time video applications require dynamic bitrate adjustments based on network capacity, necessitating accurate bandwidth estimation (BWE). We introduce Ivy, a novel BWE method that leverages offline meta-learning to combat data drift and maximize user Quality of Experience (QoE). Our approach dynamically selects the most suitable BWE algorithm for current network conditions, enabling effective adaptation to changing environments without requiring live network interactions. We implemented our method in Microsoft Teams and demonstrated that Ivy can enhance QoE by 5.9% to 11.2% over individual BWE algorithms and by 6.3% to 11.4% over existing online meta-heuristics. Additionally, we show that our method is more data-efficient than online meta-learning methods, achieving up to 21% improvement in QoE while requiring significantly less training data.
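As a rough illustration of the ideas summarized above, the sketch below shows the asymmetric (expectile) squared loss that Implicit Q-Learning uses to fit a value function from logged data, together with a greedy selection among candidate estimators. It is a minimal sketch under stated assumptions, not Ivy's implementation: the function names, the estimator labels, the gradient-descent fit, and the greedy selection rule are all hypothetical. Implicit Q-Learning as published extracts its policy by advantage-weighted regression, so the argmax here is a simplification for the discrete-choice setting.

```python
from typing import Dict, Sequence


def expectile_loss(diff: float, tau: float = 0.7) -> float:
    """Asymmetric squared loss: |tau - 1(diff < 0)| * diff**2.

    In IQL, diff is Q(s, a) - V(s); tau > 0.5 penalizes underestimation more.
    """
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff


def fit_expectile(targets: Sequence[float], tau: float = 0.7,
                  lr: float = 0.1, steps: int = 2000) -> float:
    """Fit a scalar v that minimizes the mean expectile loss over targets.

    Plain gradient descent, chosen for clarity (an assumption of this sketch).
    """
    v = sum(targets) / len(targets)
    for _ in range(steps):
        grad = 0.0
        for t in targets:
            weight = tau if t - v >= 0 else 1.0 - tau
            grad += -2.0 * weight * (t - v)
        v -= lr * grad / len(targets)
    return v


def select_estimator(q_values: Dict[str, float]) -> str:
    """Greedy choice: the estimator with the highest predicted QoE score."""
    return max(q_values, key=lambda name: q_values[name])


if __name__ == "__main__":
    # Hypothetical logged QoE scores for one network condition.
    logged_scores = [1.0, 2.0, 3.0]
    print(fit_expectile(logged_scores, tau=0.5))  # symmetric case: the mean
    print(fit_expectile(logged_scores, tau=0.9))  # skews toward higher scores

    # Hypothetical estimator names and predicted scores for the next window.
    print(select_estimator({"estimator_a": 3.1, "estimator_b": 3.4}))
```

With `tau = 0.5` the loss is symmetric and the fit reduces to the mean; larger `tau` pushes the fitted value toward the best outcomes seen in the logged data, which is how a value estimate can be learned without querying actions outside the dataset.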
