Optimal Transcoding Resolution Prediction for Efficient Per-Title Bitrate Ladder Estimation

Jinhai Yang, Mengxi Guo, Shijie Zhao, Junlin Li, Li Zhang

TL;DR

The paper tackles efficient per-title bitrate ladder construction for adaptive streaming by directly predicting the optimal transcoding resolution at each of a fixed set of bitrates, thereby removing the need for pre-encoding. It introduces a one-stage framework built around the Temporal Attentive Gated Recurrent Network (TAGRN), which extracts spatial-temporal features and casts the problem as multi-task classification: one resolution-classification task per preset bitrate. Ground-truth ladders are generated via a two-step encoding process that bounds the bitrates and enables accurate convex-hull approximation, and focal loss is used to handle class imbalance. Empirical results show that the predicted ladders closely match the ground-truth convex hulls while significantly reducing encoding overhead, with a BD-Rate loss of about $1.21\%$ and a BD-VMAF of $-0.2236$ relative to the ground truth. The method outperforms the fixed DASH ladder on most sequences, which makes practical deployment feasible.
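The role of focal loss in the multi-task setup can be illustrated with a minimal sketch. It assumes the standard focal loss form, $-(1 - p_t)^{\gamma} \log p_t$, applied independently to each bitrate's resolution classifier and averaged. The function names, the default $\gamma = 2$, and the plain averaging over tasks are illustrative assumptions, not details confirmed by the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one task (one preset bitrate).

    logits: raw scores, one per candidate resolution (hypothetical layout)
    target: index of the ground-truth resolution
    gamma:  focusing parameter; gamma=0 reduces to cross-entropy
    """
    p_t = softmax(logits)[target]
    # Well-classified samples (p_t close to 1) are down-weighted.
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def multi_task_focal_loss(logits_per_bitrate, targets, gamma=2.0):
    """Average the per-bitrate focal losses (assumed aggregation)."""
    losses = [focal_loss(l, t, gamma)
              for l, t in zip(logits_per_bitrate, targets)]
    return sum(losses) / len(losses)
```

With `gamma=0` the loss is ordinary cross-entropy; larger values shrink the contribution of easy, frequently occurring resolution classes so that rare classes carry more weight during training.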

Abstract

Adaptive video streaming requires efficient bitrate ladder construction to meet heterogeneous network conditions and end-user demands. Per-title optimized encoding typically traverses numerous encoding parameters to search for the Pareto-optimal operating points of each video. Recently, researchers have attempted to predict the content-optimized bitrate ladder to reduce pre-encoding overhead. However, existing methods commonly estimate the encoding parameters on the Pareto front and still require subsequent pre-encodings. In this paper, we propose to directly predict the optimal transcoding resolution at each preset bitrate for efficient bitrate ladder construction. We adopt a Temporal Attentive Gated Recurrent Network to capture spatial-temporal features and predict transcoding resolutions as a multi-task classification problem. We demonstrate that content-optimized bitrate ladders can thus be efficiently determined without any pre-encoding. Our method closely approximates the ground-truth bitrate-resolution pairs, with a slight Bjøntegaard Delta rate loss of 1.21%, and significantly outperforms the state-of-the-art fixed ladder.

Paper Structure (13 sections, 3 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Comparison between the proposed one-stage bitrate ladder estimation and existing two-stage approaches. Our one-stage method eliminates the overhead of pre-encoding.
  • Figure 2: Example of ground-truth bitrate ladder construction and one-hot representation.
  • Figure 3: Illustration of the proposed TAGRN. Frame-level spatial features are extracted independently and then fused across frames by a temporal attention module for prediction.
  • Figure 4: Plots of the RD curves of the predicted bitrate ladders on representative sequences.
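
The one-hot representation mentioned in Figure 2 can be sketched as follows. This simplified example assumes that a quality score (for instance VMAF) is already available for every bitrate-resolution pair and labels each bitrate with its highest-scoring resolution. The paper's actual two-step encoding procedure is not reproduced here, and the function name and table layout are hypothetical.

```python
def one_hot_labels(quality_table):
    """Build one one-hot resolution label per preset bitrate.

    quality_table[b][r] is the (assumed, precomputed) quality score of the
    encode at preset bitrate index b and candidate resolution index r.
    """
    labels = []
    for scores in quality_table:
        # The resolution with the best quality at this bitrate is the target.
        best = max(range(len(scores)), key=lambda r: scores[r])
        labels.append([1 if r == best else 0 for r in range(len(scores))])
    return labels
```

Each row of the result is the classification target for one bitrate, which matches the multi-task framing: one classifier per preset bitrate, each choosing among the candidate resolutions.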