MANSY: Generalizing Neural Adaptive Immersive Video Streaming With Ensemble and Representation Learning

Duo Wu, Panlong Wu, Miao Zhang, Fangxin Wang

TL;DR

The paper proposes MANSY, a novel streaming system that embraces user diversity to improve generalization. It combines representation learning and deep reinforcement learning, for the first time, to train a bitrate selection model that maximizes diverse QoE objectives, enabling the model to generalize across users with diverse preferences.

Abstract

The popularity of immersive videos has prompted extensive research into neural adaptive tile-based streaming to optimize video transmission over networks with limited bandwidth. However, the diversity of users' viewing patterns and Quality of Experience (QoE) preferences has not yet been fully addressed by existing neural adaptive approaches for viewport prediction and bitrate selection. Their performance can deteriorate significantly when users' actual viewing patterns and QoE preferences differ considerably from those observed during the training phase, resulting in poor generalization. In this paper, we propose MANSY, a novel streaming system that embraces user diversity to improve generalization. Specifically, to accommodate users' diverse viewing patterns, we design a Transformer-based viewport prediction model with an efficient multi-viewport trajectory input-output architecture based on implicit ensemble learning. In addition, we are the first to combine advanced representation learning with deep reinforcement learning to train the bitrate selection model to maximize diverse QoE objectives, enabling the model to generalize across users with diverse preferences. Extensive experiments demonstrate that MANSY outperforms state-of-the-art approaches in viewport prediction accuracy and QoE improvement on both trained and unseen viewing patterns and QoE preferences, achieving better generalization.

Paper Structure

This paper contains 25 sections, 14 equations, 12 figures, 1 table, 1 algorithm.

Figures (12)

  • Figure 1: Viewport prediction accuracy on two sets of users with different viewing patterns.
  • Figure 2: Performance of bitrate selection methods on three different QoE preferences.
  • Figure 3: System framework of the proposed tile-based immersive video streaming system MANSY.
  • Figure 4: The architecture of the proposed MTIO-Transformer viewport prediction model.
  • Figure 5: Illustration of the proposed RepL-based learning framework. Note that the reward for training the agent is composed of two parts, which are derived from the QoE score and the outputs of the QoE identifier.
  • ...and 7 more figures