FairStream: Fair Multimedia Streaming Benchmark for Reinforcement Learning Agents

Jannis Weil, Jonas Ringsdorf, Julian Barthel, Yi-Ping Phoebe Chen, Tobias Meuser

TL;DR

This work proposes a novel multi-agent environment that comprises multiple challenges of fair multimedia streaming: partial observability, multiple objectives, agent heterogeneity and asynchronicity, and shows that the commonly used Proximal Policy Optimization algorithm is outperformed by a simple greedy heuristic.

Abstract

Multimedia streaming accounts for the majority of traffic in today's internet. Mechanisms like adaptive bitrate streaming control the bitrate of a stream based on the estimated bandwidth, ideally resulting in smooth playback and a good Quality of Experience (QoE). However, selecting the optimal bitrate is challenging under volatile network conditions. This motivated researchers to train Reinforcement Learning (RL) agents for multimedia streaming. The considered training environments are often simplified, leading to promising results with limited applicability. Additionally, the QoE fairness across multiple streams is seldom considered by recent RL approaches. With this work, we propose a novel multi-agent environment that comprises multiple challenges of fair multimedia streaming: partial observability, multiple objectives, agent heterogeneity and asynchronicity. We provide and analyze baseline approaches across five different traffic classes to gain detailed insights into the behavior of the considered agents, and show that the commonly used Proximal Policy Optimization (PPO) algorithm is outperformed by a simple greedy heuristic. Future work includes the adaptation of multi-agent RL algorithms and further expansions of the environment.
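
As a rough illustration of the rate-based bitrate selection the abstract refers to, the sketch below picks the highest bitrate that fits within a fraction of the estimated bandwidth. It is not the greedy heuristic or any baseline evaluated in the paper. The function name `select_bitrate`, the `safety` factor, and the example bitrate ladder are all assumptions made for illustration.

```python
from typing import Sequence


def select_bitrate(
    bitrates_kbps: Sequence[float],
    estimated_bandwidth_kbps: float,
    safety: float = 0.9,  # hypothetical headroom factor, not from the paper
) -> float:
    """Pick the highest bitrate that fits the bandwidth estimate.

    Falls back to the lowest available bitrate when nothing fits.
    """
    budget = safety * estimated_bandwidth_kbps
    feasible = [b for b in bitrates_kbps if b <= budget]
    return max(feasible) if feasible else min(bitrates_kbps)


# Hypothetical bitrate ladder in kbit/s (illustrative values only).
ladder = [300, 750, 1200, 2850, 4300]

print(select_bitrate(ladder, 2000.0))  # budget is 1800, so 1200 is chosen
print(select_bitrate(ladder, 100.0))   # nothing fits, so the lowest (300) is chosen
```

A rule like this considers only a single client's bandwidth estimate. It ignores both the QoE fairness across streams and the client-specific quality functions that the proposed environment models.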

Paper Structure

This paper contains 33 sections, 9 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Streaming scenario with two exemplary clients. Subfigure A shows that all clients share a time-varying bottleneck link. Subfigure B depicts the asynchronous download of segments. Each rectangle represents a segment $s^i_t$ with bitrate $b^i_t$. The bottom graph shows that the total bandwidth $\text{bw}_\text{total}$ of the bottleneck is shared across all downloading clients. Subfigure C shows that the QoE of each client depends on a client-specific function that maps the bitrate of a segment to a perceptual quality. To compute the fairness, the QoE of all streaming clients is considered.
  • Figure 2: Bitrates and corresponding perceptual qualities for the four considered client types. The different slopes indicate different resource requirements.
  • Figure 3: Mean bandwidth a) and CV b) of all traces, as well as the mean bandwidth of the traces of our dataset c). Subplot b) in the center shows traces with a CV in $[0, 1]$, representing $99.6\%$ of all data. Traces with a CV greater than $1$ are very infrequent and would not be visible in this histogram.
  • Figure 4: Overview of all feasible solutions (grey, transparent) and Pareto-optimal solutions (colored) of the time-independent formulation. Optimal solutions are connected by a line according to their ordered bitrate.
  • Figure 5: Optimal solutions for the time-independent formulation with four clients Phone, HDTV, 4KTV and PCV using different quality-fairness coefficients $\alpha = 0.25$ (left), $\alpha = 0.5$ (center), and $\alpha = 0.75$ (right). The top plots show the bitrate of each client, given the bandwidth according to the horizontal axis. The fairness between all qualities is depicted in the center. The bottom plots show the quality of each client. For higher $\alpha$, clients prioritize quality over fairness at the cost of more frequent bitrate and quality changes.
  • ...and 9 more figures
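
The "cv" statistic in the Figure 3 caption is presumably the coefficient of variation of a bandwidth trace, i.e. its standard deviation divided by its mean. The caption does not spell this out, so that reading is an assumption. Under it, a minimal sketch of the computation could look as follows. The function name, the use of the population standard deviation, and the sample trace are illustrative choices, not details taken from the paper.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(trace_mbps: Sequence[float]) -> float:
    """Return std / mean of a bandwidth trace (population standard deviation)."""
    mean = statistics.fmean(trace_mbps)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero-mean trace")
    return statistics.pstdev(trace_mbps) / mean


# Hypothetical bandwidth samples in Mbit/s (not from the paper's dataset).
trace = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

cv = coefficient_of_variation(trace)
print(round(cv, 3))  # mean is 5.0, population std is 2.0, so this prints 0.4
```

Under this definition, a value in $[0, 1]$ means the bandwidth fluctuates by at most its own mean in standard-deviation terms. A constant trace has a value of $0$.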