Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low latency Encoding

Vignesh V Menon, Jingwen Zhu, Prajit T Rajendran, Samira Afzal, Klaus Schoeffmann, Patrick Le Callet, Christian Timmerer

TL;DR

The paper tackles the problem of maintaining low latency in HTTP adaptive live streaming while maximizing perceptual quality and resource efficiency. It introduces JALE, a JND-aware encoding scheme that jointly predicts per-representation encoder presets and CPU thread counts from content-aware features and a target encoding speed $s_T$, and adds a JND-based representation elimination step that removes perceptually redundant ladder items using thresholds $v_T$ and $v_J$. JALE has three components (video complexity feature extraction, joint preset/thread prediction via random forests, and perceptual redundancy elimination) that adapt encoding parameters at the segment level. Empirical results show that JALE yields an average PSNR gain of $1.32$ dB and a VMAF gain of $5.38$ points at the same bitrate, along with reductions in storage ($72.70\%$), thread count ($63.83\%$), and encoding time ($37.87\%$) for a JND of $v_J=6$ VMAF points, when encoding the HLS bitrate ladder with the x265 HEVC encoder.
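The representation elimination step can be illustrated with a minimal sketch. This summary does not define the exact roles of $v_T$ and $v_J$, so the code assumes that $v_J$ is the minimum VMAF gap between two retained representations and that $v_T$ is a VMAF ceiling above which higher bitrates add no visible gain. The `Representation` type, the `eliminate_redundant` function, and the ladder values are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple


class Representation(NamedTuple):
    bitrate_kbps: int
    vmaf: float  # predicted or measured VMAF of this representation


def eliminate_redundant(
    ladder: List[Representation], v_j: float, v_t: float
) -> List[Representation]:
    """Keep representations at least v_j VMAF points apart (assumed rule)."""
    kept: List[Representation] = []
    for rep in sorted(ladder, key=lambda r: r.bitrate_kbps):
        if kept and rep.vmaf - kept[-1].vmaf < v_j:
            continue  # less than one JND above the last kept representation
        kept.append(rep)
        if rep.vmaf >= v_t:
            break  # assumed ceiling: stop once v_t is reached
    return kept


# Made-up ladder values, for illustration only.
ladder = [
    Representation(145, 20.0),
    Representation(300, 24.0),
    Representation(600, 41.0),
    Representation(900, 45.0),
    Representation(1600, 62.0),
    Representation(2400, 70.0),
    Representation(3400, 95.5),
    Representation(4500, 96.0),
    Representation(5800, 98.0),
]
kept = eliminate_redundant(ladder, v_j=6.0, v_t=95.0)
print([r.bitrate_kbps for r in kept])
```

With these toy numbers, four of the nine representations are dropped, which is the mechanism behind the storage and thread savings: a representation that is never encoded costs neither disk space nor CPU time.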

Abstract

In HTTP adaptive live streaming applications, video segments are encoded at a fixed set of bitrate-resolution pairs known as a bitrate ladder. Live encoders use the fastest available encoding configuration, referred to as a preset, to ensure the minimum possible latency in video encoding. However, an optimized preset and an optimized number of CPU threads for each encoding instance may result in (i) increased quality and (ii) efficient CPU utilization while encoding. For low latency live encoders, the encoding speed is expected to be greater than or equal to the video framerate. In this light, this paper introduces a Just Noticeable Difference (JND)-Aware Low latency Encoding Scheme (JALE), which uses random forest-based models to jointly determine the optimized encoder preset and thread count for each representation, based on video complexity features, the target encoding speed, the total number of available CPU threads, and the target encoder. Experimental results show that, on average, JALE yields a quality improvement of 1.32 dB PSNR and 5.38 VMAF points at the same bitrate, compared to fastest-preset encoding of the HTTP Live Streaming (HLS) bitrate ladder using the open-source x265 HEVC encoder with eight CPU threads for each representation. These enhancements are achieved while maintaining the desired encoding speed. Furthermore, on average, JALE results in an overall storage reduction of 72.70 %, a reduction in the total number of CPU threads used of 63.83 %, and a 37.87 % reduction in the overall encoding time, considering a JND of six VMAF points.
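The low latency constraint (encoding speed at least equal to the framerate) suggests a simple selection rule, sketched below. This is not the paper's method: JALE predicts the preset and thread count with random forest models, whereas this sketch runs an exhaustive search over a speed predictor supplied by the caller. The function `select_preset_and_threads` and the stand-in `toy_predict_fps` are hypothetical; only the x265 preset names are real.

```python
from typing import Callable, Optional, Sequence, Tuple

# x265 preset names, fastest first.
X265_PRESETS = ("ultrafast", "superfast", "veryfast", "faster", "fast",
                "medium", "slow", "slower", "veryslow", "placebo")


def select_preset_and_threads(
    predict_fps: Callable[[str, int], float],
    target_fps: float,
    max_threads: int,
    presets: Sequence[str] = X265_PRESETS,
) -> Optional[Tuple[str, int]]:
    """Return the slowest preset meeting target_fps, with the fewest threads."""
    for preset in reversed(presets):  # slowest (highest quality) first
        for threads in range(1, max_threads + 1):
            if predict_fps(preset, threads) >= target_fps:
                return preset, threads
    return None  # no configuration is predicted to be fast enough


def toy_predict_fps(preset: str, threads: int) -> float:
    # Hypothetical stand-in for a trained model: speed halves with each
    # slower preset and grows sublinearly with the thread count.
    base = 60.0 / (2 ** X265_PRESETS.index(preset))
    return base * threads ** 0.8


choice = select_preset_and_threads(toy_predict_fps, target_fps=30.0, max_threads=8)
print(choice)
```

Preferring the slowest feasible preset targets quality, and preferring the fewest threads within that preset targets CPU efficiency. In a real system the predictor would also depend on the video complexity features and the representation's resolution and bitrate.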

Paper Structure (10 sections, 3 equations, 5 figures, 2 tables, 2 algorithms)

Figures (5)

  • Figure 1: The encoding speed of each representation in the HLS bitrate ladder for the Wood_s000 sequence, using the ultrafast preset of x265 with 4, 8, and 16 CPU threads for each representation.
  • Figure 2: Live encoding using JALE envisioned in this paper.
  • Figure 3: $(\hat{n}, \hat{p})$ look-up table used in the experimental validation of this paper.
  • Figure 4: Results for each representation in JALE. JND-based representation elimination is not considered in these plots.
  • Figure 5: Rate-distortion (RD) curves of representative sequences (segments) for Default encoding (blue line) and CAPS ($c=8$) encoding (red line), compared to JALE ($v_{\text{J}}=6$).