Convex Hull Prediction for Adaptive Video Streaming by Recurrent Learning

Somdyuti Paul, Andrey Norkin, Alan C. Bovik

TL;DR

This paper tackles the high computational cost of constructing per-shot convex hulls for adaptive video streaming. It introduces RCN-Hull, a Conv-GRU based model that jointly models spatial and temporal video content and predicts the hull points as a multi-label binary matrix. The model is trained via a two-step transfer learning scheme on a large, lightly compressed dataset (I-CV) and a smaller pristine set (UCV). Ground-truth hulls are built from HEVC encodes across $7$ resolutions and $9$ QP values, giving $7 \times 9 = 63$ candidate points per shot, with a reduced set of candidates guided by perceptual quality metrics. The predicted hulls achieve an average BD-rate of $0.26\%$ against ground truth, with a mean absolute deviation (MAD) of $0.57\%$, and pre-encoding time is reduced by $53.8\%$ on average. Empirically, RCN-Hull outperforms interpolation, proxy, and handcrafted-feature baselines, delivering near-optimal bitrates while markedly cutting encoding overhead, which matters in practice for scalable, content-aware adaptive streaming. Future work includes extending the approach to additional encoding degrees of freedom and to higher-resolution content.

Abstract

Adaptive video streaming relies on the construction of efficient bitrate ladders to deliver the best possible visual quality to viewers under bandwidth constraints. The traditional method of content-dependent bitrate ladder selection requires a video shot to be pre-encoded with multiple encoding parameters to find the optimal operating points given by the convex hull of the resulting rate-quality curves. However, this pre-encoding step is equivalent to an exhaustive search over the space of possible encoding parameters, which causes significant overhead in both computation and time. To reduce this overhead, we propose a deep-learning-based method of content-aware convex hull prediction. We employ a recurrent convolutional network (RCN) to implicitly analyze the spatiotemporal complexity of video shots in order to predict their convex hulls. A two-step transfer learning scheme is adopted to train our proposed RCN-Hull model, which ensures sufficient content diversity to analyze scene complexity, while also making it possible to capture the scene statistics of pristine source videos. Our experimental results reveal that our proposed model yields better approximations of the optimal convex hulls, and offers competitive time savings, compared to existing approaches. On average, our method reduced the pre-encoding time by 53.8%, while the average Bjøntegaard delta bitrate (BD-rate) of the predicted convex hulls against ground truth was 0.26%, and the mean absolute deviation of the BD-rate distribution was 0.57%.
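
To make the exhaustive baseline concrete, the following is a minimal sketch of how a per-shot convex hull could be extracted from a grid of pre-encodes and written as a binary matrix with one entry per (resolution, QP) pair. It is not the authors' implementation: the resolution heights, the QP values, and the synthetic rate and quality formulas are placeholders chosen only for illustration, and `upper_convex_hull` is a hypothetical helper name. In practice the rates and quality scores would come from actual encodes.

```python
import numpy as np

def upper_convex_hull(rates, quals):
    """Indices of points on the upper convex hull of (rate, quality),
    truncated at the point of maximum quality."""
    # Sort by rate; for equal rates, put the higher-quality point first.
    order = sorted(range(len(rates)), key=lambda i: (rates[i], -quals[i]))
    hull = []
    for i in order:
        # A point sharing its rate with the last kept point is dominated.
        if hull and rates[i] == rates[hull[-1]]:
            continue
        # Drop the last hull point while it lies on or below the chord
        # from the point before it to the new point.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = ((rates[b] - rates[a]) * (quals[i] - quals[a])
                     - (quals[b] - quals[a]) * (rates[i] - rates[a]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    # Keep only the part where quality still increases with rate.
    best = max(range(len(hull)), key=lambda k: quals[hull[k]])
    return hull[:best + 1]

# Hypothetical 7 x 9 encoding grid (placeholder values, not from the paper).
heights = [216, 288, 360, 432, 540, 720, 1080]
qps = list(range(18, 45, 3))  # 9 QP values
H, Q = np.meshgrid(heights, qps, indexing="ij")

# Synthetic stand-ins for measured bitrate and perceptual quality.
rates = 8000.0 * (H / 1080.0) ** 1.6 * 2.0 ** (-(Q - 18) / 6.0)
quals = 100.0 * (H / 1080.0) ** 0.25 * (1.0 - np.exp(-rates / (2.0 * H)))

# Flatten the grid, find the hull, and mark hull points in a binary matrix.
hull_idx = upper_convex_hull(rates.ravel().tolist(), quals.ravel().tolist())
hull_matrix = np.zeros(rates.shape, dtype=int)
hull_matrix[np.unravel_index(hull_idx, rates.shape)] = 1
print(hull_matrix)
```

Each row of `hull_matrix` corresponds to one resolution and each column to one QP value; a 1 marks an encode that lies on the hull. A matrix of this kind is the sort of multi-label target a predictor would be trained to output, so that only the encodes marked 1 need to be produced.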

Paper Structure

This paper contains 20 sections, 6 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Schematic flow of our per-shot convex hull estimation method.
  • Figure 2: Scatter plots of SI and TI computed on the video shots from the database.
  • Figure 3: An example of a convex hull and its binary matrix representation for multi-label classification.
  • Figure 4: Statistical likelihood of each (resolution, QP) point's inclusion in the convex hulls.
  • Figure 5: Architecture of the RCN-Hull model.
  • ...and 5 more figures