Imitation Learning for Adaptive Video Streaming with Future Adversarial Information Bottleneck Principle

Shuoyao Wang; Jiawei Lin; Fangwei Ye

TL;DR

The paper tackles QoE volatility in adaptive video streaming by introducing ABABR, an imitation-learning-based ABR framework that uses an information bottleneck to compress state representations and mitigate overfitting from offline future information. It builds a high-quality expert via an offline MINLP and applies an alternative optimization routine to speed up demonstration generation, while incorporating a future adversarial information bottleneck (AIB) to suppress leakage through the latent Z. The approach yields a $7.30\%$ average QoE improvement and a $30.01\%$ average ranking reduction across traces, with strong trace-wise stability and faster convergence than DRL methods. By combining optimization theory with probabilistic representation learning, ABABR offers scalable, robust ABR performance in unseen networks and demonstrates the practical value of the digital-twin perspective for network optimization problems.

Abstract

Adaptive video streaming plays a crucial role in ensuring high-quality video streaming services. Despite extensive research efforts devoted to Adaptive BitRate (ABR) techniques, current reinforcement learning (RL)-based ABR algorithms may improve the average Quality of Experience (QoE) but suffer from fluctuating performance in individual video sessions. In this paper, we present a novel approach that combines imitation learning with the information bottleneck technique to learn from the complex offline optimal scenario rather than through inefficient exploration. In particular, we use the deterministic offline bitrate optimization problem with the future throughput realization as the expert and formulate it as a mixed-integer non-linear programming (MINLP) problem. To enable large-scale training for improved performance, we propose an alternative optimization algorithm that efficiently solves the MINLP problem. To address overfitting caused by future information leakage in the MINLP, we incorporate an adversarial information bottleneck framework. By compressing the video streaming state into a latent space, we retain only action-relevant information. Additionally, we introduce a future adversarial term to mitigate the influence of future information leakage, where a Model Predictive Control (MPC) policy without any future information is employed as the adverse expert. Experimental results demonstrate that the proposed approach significantly enhances the quality of adaptive video streaming, providing a 7.30% average QoE improvement and a 30.01% average ranking reduction.
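The compression step described in the abstract can be illustrated with a minimal sketch. It assumes a standard variational information bottleneck: a Gaussian latent with a standard normal prior, and a cross-entropy imitation term against the expert's bitrate choice. The abstract does not give the paper's exact loss, and the future adversarial term is deliberately not modeled here. All function names and the `beta` default are hypothetical.

```python
import math

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    This is the compression term: it penalizes latent codes that carry
    more information about the streaming state than the prior allows.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def cross_entropy(action_probs, expert_action):
    """Negative log-likelihood of the expert's bitrate index under the policy."""
    return -math.log(action_probs[expert_action])

def vib_imitation_loss(action_probs, expert_action, mu, log_var, beta=1e-3):
    """Imitation term plus beta-weighted compression term.

    beta is an assumed trade-off weight, not a value taken from the paper.
    """
    imitation = cross_entropy(action_probs, expert_action)
    compression = gaussian_kl_to_standard_normal(mu, log_var)
    return imitation + beta * compression
```

A larger `beta` pushes the latent toward the prior and discards more of the state, while a smaller `beta` favors matching the expert's action more closely.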

Paper Structure (30 sections, 22 equations, 10 figures, 3 tables, 1 algorithm)


Introduction
Related Work
Adaptive Video Streaming
Heuristic
Model Predictive Control
Reinforcement Learning
Imitation Learning
Imitation Learning-based ABR
Background and Motivation
Adaptive Video Streaming
Challenges and Motivations
Problem Formulation
Offline Problem Formulation
MDP Formulation
Methodology
...and 15 more sections

Figures (10)

Figure 1: Information Bottleneck-enabled streaming system.
Figure 2: The average QoE of different algorithms under different traces. The red circles indicate the video sessions in which RobustMPC outperforms the RL methods.
Figure 3: An example of various algorithmic policies.
Figure 4: An overview of the imitation learning framework. In this paper, the Expert is obtained from the offline optimal solution computed with the alternative optimization algorithm in Section 5.3. The Adverse Expert is the RobustMPC benchmark (doi:10.1145/2785956.2787486).
Figure 5: Actor Network with Information-Bottleneck.
...and 5 more figures

Theorems & Definitions (1)

Remark 1
