Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming
Zhiqiang He, Zhi Liu
TL;DR
This work tackles plasticity loss in reinforcement learning for adaptive video streaming (AVS) when QoE weights shift over time. It introduces PA-MoE, a mixture-of-experts policy that injects controlled gradient noise to actively forget outdated knowledge while preserving shared memory, enabling rapid adaptation to nonstationary QoE objectives. The authors derive a regret bound under standard assumptions and show in extensive experiments that PA-MoE improves QoE by 45.5% over traditional MoE and adaptive bitrate (ABR) baselines in dynamic QoE scenarios, while maintaining balanced expert utilization and richer internal representations. The results highlight the practical value of plasticity-aware learning for robust, real-time adaptation in AVS systems that serve diverse user preferences and content types.
Abstract
Adaptive video streaming systems are designed to optimize Quality of Experience (QoE) and, in turn, enhance user satisfaction. However, differences in user profiles and video content lead to different weights for QoE factors, resulting in user-specific QoE functions and, thus, varying optimization objectives. This variability poses significant challenges for neural networks, which often struggle to generalize under evolving targets. This phenomenon, known as plasticity loss, prevents conventional models from adapting effectively to changing optimization objectives. To address this limitation, we propose the Plasticity-Aware Mixture of Experts (PA-MoE), a novel learning framework that dynamically modulates network plasticity by balancing memory retention with selective forgetting. In particular, PA-MoE uses noise injection to promote the selective forgetting of outdated knowledge, thereby improving the network's ability to adapt. In addition, we present a rigorous theoretical analysis of PA-MoE by deriving a regret bound that quantifies its learning performance. Experimental evaluations demonstrate that PA-MoE achieves a 45.5% improvement in QoE over competitive baselines in dynamic streaming environments. Further analysis reveals that the model effectively mitigates plasticity loss by optimizing neuron utilization. Finally, a parameter sensitivity study that varies the level of injected noise yields results that align closely with our theoretical predictions.
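
To illustrate why shifting QoE weights change the optimization objective, the following minimal sketch scores two playback traces under two weight profiles. It assumes a linear QoE of the form commonly used in ABR research (bitrate reward minus rebuffering and smoothness penalties); the factor names, the weight values, the traces, and the helper `session_qoe` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoEWeights:
    """Hypothetical per-user weights for a linear QoE function."""
    quality: float      # reward per Mbps of delivered bitrate
    rebuffer: float     # penalty per second of stalling
    smoothness: float   # penalty per Mbps of bitrate change between chunks


def session_qoe(bitrates_mbps, rebuffer_s, w):
    """Linear QoE over one session (an assumed form, not the paper's)."""
    quality = sum(bitrates_mbps)
    stall = sum(rebuffer_s)
    # Total absolute bitrate change between consecutive chunks.
    switches = sum(abs(b - a) for a, b in zip(bitrates_mbps, bitrates_mbps[1:]))
    return w.quality * quality - w.rebuffer * stall - w.smoothness * switches


# Two illustrative traces: (per-chunk bitrate in Mbps, per-chunk stall in seconds).
aggressive = ([4.0, 4.0, 1.0, 4.0], [0.0, 2.0, 0.0, 0.0])
conservative = ([2.0, 2.0, 2.0, 2.0], [0.0, 0.0, 0.0, 0.0])

# Two illustrative user profiles.
quality_lover = QoEWeights(quality=1.0, rebuffer=1.0, smoothness=0.1)
stall_averse = QoEWeights(quality=1.0, rebuffer=4.0, smoothness=1.0)

for name, w in [("quality_lover", quality_lover), ("stall_averse", stall_averse)]:
    print(name,
          "aggressive:", session_qoe(*aggressive, w),
          "conservative:", session_qoe(*conservative, w))
```

Under these assumed numbers, the aggressive trace scores higher for the first profile (about 10.4 versus 8.0) and lower for the second (-1.0 versus 8.0). The preferred behavior therefore flips when the weights change, which is the kind of objective shift a policy must track without retraining from scratch.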
