
Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming

Zhiqiang He, Zhi Liu

TL;DR

This work tackles plasticity loss in reinforcement learning for adaptive video streaming when QoE weights shift over time. It introduces PA-MoE, a mixture-of-experts policy that injects controlled gradient noise to actively forget outdated knowledge while preserving shared memory, enabling rapid adaptation to nonstationary QoE objectives. The authors derive a regret bound under standard assumptions and report experiments in which PA-MoE outperforms traditional MoE and ABR baselines by about 45.5% in dynamic QoE scenarios, while maintaining balanced expert utilization and richer internal representations. The results highlight the practical value of plasticity-aware learning for robust, real-time adaptation in adaptive video streaming systems facing diverse user preferences and content types.

Abstract

Adaptive video streaming systems are designed to optimize Quality of Experience (QoE) and, in turn, enhance user satisfaction. However, differences in user profiles and video content lead to different weights for QoE factors, resulting in user-specific QoE functions and, thus, varying optimization objectives. This variability poses significant challenges for neural networks, which often struggle to generalize under evolving targets. This phenomenon, known as plasticity loss, prevents conventional models from adapting effectively to changing optimization objectives. To address this limitation, we propose the Plasticity-Aware Mixture of Experts (PA-MoE), a novel learning framework that dynamically modulates network plasticity by balancing memory retention with selective forgetting. In particular, PA-MoE leverages noise injection to promote the selective forgetting of outdated knowledge, thereby endowing neural networks with enhanced adaptive capabilities. In addition, we present a rigorous theoretical analysis of PA-MoE by deriving a regret bound that quantifies its learning performance. Experimental evaluations demonstrate that PA-MoE achieves a 45.5% improvement in QoE over competitive baselines in dynamic streaming environments. Further analysis reveals that the model effectively mitigates plasticity loss by optimizing neuron utilization. Finally, a parameter sensitivity study is performed by injecting varying levels of noise, and the results align closely with our theoretical predictions.
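
The noise-injection idea can be illustrated with a minimal sketch. It assumes Gaussian noise is added to the gradients of the expert parameters during a plain SGD step, while the router is updated without noise. The function name `noisy_sgd_step`, the parameter names `router` and `experts`, the choice of which parameters receive noise, and the noise scale `sigma` are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def noisy_sgd_step(params, grads, lr, sigma, rng, noisy_keys):
    """One SGD step with Gaussian noise added to selected gradients.

    params, grads: dicts mapping a parameter name to an array of equal shape.
    noisy_keys:    names whose gradients are perturbed (assumed: the experts).
    sigma:         noise standard deviation (hypothetical hyperparameter).
    """
    new_params = {}
    for name, w in params.items():
        g = grads[name]
        if name in noisy_keys:
            # Perturb the gradient so stale solutions are gradually forgotten.
            g = g + sigma * rng.standard_normal(w.shape)
        new_params[name] = w - lr * g
    return new_params


rng = np.random.default_rng(0)
# Toy shapes: 4 input features, 3 experts, 2 outputs per expert.
params = {"router": np.zeros((4, 3)), "experts": np.zeros((3, 4, 2))}
grads = {name: np.zeros_like(w) for name, w in params.items()}

updated = noisy_sgd_step(
    params, grads, lr=0.1, sigma=0.5, rng=rng, noisy_keys={"experts"}
)
```

With zero gradients, the router stays unchanged while the expert weights drift, which isolates the effect of the injected noise. Larger `sigma` forgets faster at the cost of stability, which is the kind of trade-off a sensitivity study over noise levels would examine.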

Paper Structure

This paper contains 32 sections, 3 theorems, 39 equations, 20 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

Let $L_t:\mathbb{R}^d\to\mathbb{R}$ be an $L$-smooth function; that is, for all $\boldsymbol{w},\boldsymbol{w}'\in\mathbb{R}^d$, Then, for any $\boldsymbol{w}_i^t,\boldsymbol{w}_t^*\in\mathbb{R}^d$, we have
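
The displayed inequalities of the lemma are not reproduced in this listing. As a hedged reference only, the textbook definition of $L$-smoothness and the co-coercivity bound usually derived from it are sketched below. The sketch assumes that $L_t$ is also convex and that $\boldsymbol{w}_t^*$ is a minimizer of $L_t$ (so $\nabla L_t(\boldsymbol{w}_t^*)=\boldsymbol{0}$); neither assumption is confirmed by the text above, and the paper's exact statement may differ.

```latex
% L-smoothness (standard definition): the gradient is L-Lipschitz.
\|\nabla L_t(\boldsymbol{w}) - \nabla L_t(\boldsymbol{w}')\|
  \le L\,\|\boldsymbol{w} - \boldsymbol{w}'\|
  \quad \text{for all } \boldsymbol{w},\boldsymbol{w}' \in \mathbb{R}^d .

% Co-coercivity (standard form, assuming L_t is convex):
\langle \nabla L_t(\boldsymbol{w}) - \nabla L_t(\boldsymbol{w}'),\,
        \boldsymbol{w} - \boldsymbol{w}' \rangle
  \ge \frac{1}{L}\,
      \|\nabla L_t(\boldsymbol{w}) - \nabla L_t(\boldsymbol{w}')\|^2 .

% Special case w = w_i^t, w' = w_t^* with \nabla L_t(w_t^*) = 0 (assumed):
\langle \nabla L_t(\boldsymbol{w}_i^t),\,
        \boldsymbol{w}_i^t - \boldsymbol{w}_t^* \rangle
  \ge \frac{1}{L}\,\|\nabla L_t(\boldsymbol{w}_i^t)\|^2 .
```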

Figures (20)

  • Figure 1: Action output variation across different network architectures under shifted QoE reward conditions. D, L, and N represent distinct QoE metrics. An ideal agent adapts its actions as QoE changes, but the adaptability varies with network design.
  • Figure 2: An illustration of the PA-MoE model.
  • Figure 3: Evolution of neural network dormant neuron rate.
  • Figure 4: Performance comparison of different algorithms based on QoE component metrics.
  • Figure 5: Probability distribution of each expert selected for Policy and Value.
  • ...and 15 more figures
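
The dormant neuron rate tracked in Figure 3 is not defined in this listing. A minimal sketch of one common definition follows: a neuron counts as dormant when its mean absolute activation, normalized by the layer average, falls at or below a threshold. The function name `dormant_neuron_rate`, the threshold `tau`, and the toy activations are assumptions for illustration; the paper may compute the metric differently.

```python
import numpy as np


def dormant_neuron_rate(activations, tau=0.0):
    """Fraction of dormant neurons in one layer.

    activations: array of shape (batch, n_neurons) with post-activation outputs.
    tau:         dormancy threshold on the normalized score (assumed value).
    """
    mean_abs = np.abs(activations).mean(axis=0)  # per-neuron mean |activation|
    layer_mean = mean_abs.mean()                 # average over the layer
    if layer_mean == 0:
        return 1.0                               # every neuron is silent
    scores = mean_abs / layer_mean               # normalized activity score
    return float((scores <= tau).mean())


# Toy batch of 2 samples for a layer with 4 neurons; neurons 0 and 3 never fire.
acts = np.array([[0.0, 1.0, 2.0, 0.0],
                 [0.0, 3.0, 0.5, 0.0]])
rate = dormant_neuron_rate(acts)
```

Here two of the four neurons never activate, so `rate` is 0.5. A rate that rises during training suggests shrinking effective capacity, which is one way plasticity loss is diagnosed.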

Theorems & Definitions (14)

  • Lemma 1: Gradient Co-coercivity Lemma
  • Proof 1
  • Lemma 2: Gradient Strong Convexity Lemma
  • Proof 2
  • Theorem 1: Tracking Error Bound under Nonstationarity
  • Proof 3
  • Proof 4
  • Proof 5
  • Proof 6
  • Proof 7
  • ...and 4 more