Cross Layer Optimization and Distributed Reinforcement Learning for Wireless 360° Video Streaming

Anis Elgabli, Mohammed S. Elbamby, Cristina Perfecto, Mounssif Krouka, Mehdi Bennis, Vaneet Aggarwal

TL;DR

The physical layer subproblem is proved to be optimally solvable with low complexity, and an actor-critic deep reinforcement learning (DRL) scheme is proposed that leverages the parallel training of multiple independent agents to solve the application layer subproblem.

Abstract

Wirelessly streaming high-quality 360 degree videos is still a challenging problem. When many users watch different 360 degree videos and compete for computing and communication resources, the streaming algorithm should maximize the average quality of experience (QoE) while guaranteeing a minimum rate for each user. In this paper, we propose a cross layer optimization approach that maximizes the available rate to each user and efficiently uses it to maximize users' QoE. In particular, we consider tile-based 360 degree video streaming, and we optimize a QoE metric that balances the tradeoff between maximizing each user's QoE and ensuring fairness among users. We show that the problem can be decoupled into two interrelated subproblems: (i) a physical layer subproblem whose objective is to find the download rate for each user, and (ii) an application layer subproblem whose objective is to use that rate to find a quality decision per tile such that the user's QoE is maximized. We prove that the physical layer subproblem can be solved optimally with low complexity, and we propose an actor-critic deep reinforcement learning (DRL) approach that leverages the parallel training of multiple independent agents to solve the application layer subproblem. Extensive experiments reveal the robustness of our scheme and demonstrate its significant performance improvement compared to several baseline algorithms.
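
To make the application layer subproblem concrete, the following is a minimal sketch of what "a quality decision per tile" under a given download rate can look like. It is not the actor-critic DRL scheme proposed in the paper: it swaps in a simple greedy heuristic, and the function name `allocate_tile_qualities`, the size table `tile_sizes`, and the viewport flags `in_viewport` are all hypothetical names introduced for illustration.

```python
from typing import List


def allocate_tile_qualities(rate_budget: float,
                            tile_sizes: List[List[float]],
                            in_viewport: List[bool]) -> List[int]:
    """Greedy stand-in for the per-tile quality decision (assumption, not the paper's DRL).

    tile_sizes[t][q] is the (hypothetical) number of bits needed to fetch
    tile t at quality level q, with sizes increasing in q.
    rate_budget is the number of bits the physical layer lets this user download.
    """
    n = len(tile_sizes)
    quality = [0] * n                       # every tile starts at base quality
    used = sum(sizes[0] for sizes in tile_sizes)
    if used > rate_budget:
        raise ValueError("budget cannot cover base quality for all tiles")

    viewport = [t for t in range(n) if in_viewport[t]]
    others = [t for t in range(n) if not in_viewport[t]]

    # Upgrade viewport tiles first, then the remaining tiles,
    # one quality level at a time (round-robin within each group).
    for group in (viewport, others):
        upgraded = True
        while upgraded:
            upgraded = False
            for t in group:
                q = quality[t]
                if q + 1 < len(tile_sizes[t]):
                    extra = tile_sizes[t][q + 1] - tile_sizes[t][q]
                    if used + extra <= rate_budget:
                        quality[t] = q + 1
                        used += extra
                        upgraded = True
    return quality


# Three tiles with three quality levels each; only the middle tile is in the viewport.
sizes = [[1, 2, 4], [1, 2, 4], [1, 2, 4]]
print(allocate_tile_qualities(6, sizes, [False, True, False]))
```

The sketch only shows the shape of the decision (one rate budget in, one quality level per tile out); a learned policy would replace the greedy loop and would also account for the fairness term in the QoE metric.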

Paper Structure

This paper contains 12 sections, 23 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: System model representing the transmission period structure (left), user scheduling through SBS cooperation (center), video segmentation into chunks and tiles logic (right).
  • Figure 2: The proposed reinforcement learning based scheduling algorithm for solving the application layer problem.
  • Figure 3: The average reward of the different schemes for $5$ users watching $5$ different 360 degree videos.
  • Figure 4: The CDF of the reward function for $5$ users watching $5$ different 360 degree videos.
  • Figure 5: The CDF of the normalized reward of the $\texttt{PROPOSED}$ scheme with respect to the maximum achievable reward per frame.