Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low latency Encoding
Vignesh V Menon, Jingwen Zhu, Prajit T Rajendran, Samira Afzal, Klaus Schoeffmann, Patrick Le Callet, Christian Timmerer
TL;DR
The paper addresses the problem of maintaining low latency in HTTP adaptive live streaming while maximizing perceptual quality and resource efficiency. It introduces JALE, a JND-aware encoding scheme that jointly predicts the encoder preset and CPU thread count for each representation from content-aware features and a target encoding speed $s_T$, and applies JND-based representation elimination, governed by the thresholds $v_T$ and $v_J$, to remove perceptually redundant entries from the bitrate ladder. JALE has three components (video complexity feature extraction, joint preset and thread prediction via random forests, and perceptual redundancy elimination) and adapts encoding parameters at the segment level. Empirical results show that, compared with fastest-preset encoding of the HLS bitrate ladder with the x265 HEVC encoder, JALE yields an average PSNR gain of $1.32$ dB and a VMAF gain of $5.38$ points at the same bitrate, along with reductions in storage ($72.70\%$), CPU threads used ($63.83\%$), and encoding time ($37.87\%$) for a JND of $v_J=6$ VMAF points.
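The JND-based elimination step can be illustrated with a minimal sketch. The rule used here is an assumed reading of the thresholds, not the paper's stated algorithm: a representation is kept only if its VMAF exceeds that of the last kept representation by at least $v_J$, and the scan stops once a kept representation reaches $v_T$. The names `Representation` and `eliminate_redundant`, the default $v_T = 95$, and the per-representation VMAF scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    bitrate_kbps: int
    height: int
    vmaf: float  # predicted or measured VMAF score (hypothetical input)


def eliminate_redundant(ladder, v_j=6.0, v_t=95.0):
    """Drop ladder entries that are within one JND of the last kept entry.

    Assumed rule (not taken from the paper): walk the ladder in order of
    increasing bitrate, always keep the lowest entry, keep a later entry
    only if it gains at least v_j VMAF points over the last kept one, and
    stop once the last kept entry has reached v_t.
    """
    kept = []
    for rep in sorted(ladder, key=lambda r: r.bitrate_kbps):
        if not kept:
            kept.append(rep)
            continue
        if kept[-1].vmaf >= v_t:
            break  # higher bitrates are assumed to add no visible quality
        if rep.vmaf - kept[-1].vmaf >= v_j:
            kept.append(rep)
    return kept


if __name__ == "__main__":
    # Hypothetical ladder; bitrates and VMAF scores are made up.
    ladder = [
        Representation(145, 360, 40.0),
        Representation(300, 432, 47.0),
        Representation(600, 540, 50.0),
        Representation(1600, 720, 70.0),
        Representation(4500, 1080, 96.0),
        Representation(8100, 1080, 98.0),
    ]
    for rep in eliminate_redundant(ladder):
        print(rep)
```

With the default thresholds, the 600 kbps entry is dropped because it gains only 3 VMAF points over the 300 kbps entry, and the 8100 kbps entry is dropped because the 4500 kbps entry already exceeds the assumed $v_T$.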
Abstract
In HTTP adaptive live streaming applications, video segments are encoded at a fixed set of bitrate-resolution pairs known as a bitrate ladder. Live encoders use the fastest available encoding configuration, referred to as a preset, to ensure the minimum possible latency in video encoding. However, an optimized preset and an optimized number of CPU threads for each encoding instance may result in (i) increased quality and (ii) efficient CPU utilization while encoding. For low-latency live encoders, the encoding speed is expected to be greater than or equal to the video framerate. In this light, this paper introduces a Just Noticeable Difference (JND)-Aware Low latency Encoding Scheme (JALE), which uses random forest-based models to jointly determine the optimized encoder preset and thread count for each representation, based on video complexity features, the target encoding speed, the total number of available CPU threads, and the target encoder. Experimental results show that, on average, JALE yields a quality improvement of 1.32 dB PSNR and 5.38 VMAF points at the same bitrate, compared to the fastest preset encoding of the HTTP Live Streaming (HLS) bitrate ladder using the x265 HEVC open-source encoder with eight CPU threads used for each representation. These enhancements are achieved while maintaining the desired encoding speed. Furthermore, on average, JALE results in an overall storage reduction of 72.70%, a reduction in the total number of CPU threads used by 63.83%, and a 37.87% reduction in the overall encoding time, considering a JND of six VMAF points.
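The speed constraint described above can be sketched as a simple selection rule. This is an illustration under stated assumptions, not the paper's method: JALE predicts the preset and thread count with random forest models, whereas this sketch substitutes a lookup over hypothetical predicted speeds. The function `choose_config`, the `predicted_fps` table, and the priority order (slowest preset first, then fewest threads) are assumptions; the preset names are the standard x265 presets.

```python
# x265 presets ordered from fastest to slowest.
X265_PRESETS = [
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo",
]


def choose_config(predicted_fps, target_fps, max_threads):
    """Pick a (preset, threads) pair whose predicted speed meets the target.

    predicted_fps maps (preset, threads) to a predicted encoding speed in
    frames per second; in JALE such predictions would come from trained
    models, here they are a hypothetical lookup table.

    Assumed priority: prefer the slowest preset that still reaches
    target_fps, then the fewest threads for that preset. If nothing meets
    the target, fall back to the fastest preset with all threads.
    """
    for preset in reversed(X265_PRESETS):  # slowest preset first
        for threads in range(1, max_threads + 1):  # fewest threads first
            speed = predicted_fps.get((preset, threads))
            if speed is not None and speed >= target_fps:
                return preset, threads
    return X265_PRESETS[0], max_threads


if __name__ == "__main__":
    # Hypothetical speed predictions for one representation.
    predictions = {
        ("ultrafast", 1): 40.0,
        ("ultrafast", 2): 70.0,
        ("veryfast", 2): 25.0,
        ("veryfast", 4): 33.0,
        ("medium", 8): 12.0,
    }
    # For a 30 fps stream the encoder must sustain at least 30 fps.
    print(choose_config(predictions, target_fps=30.0, max_threads=8))
```

In this example the rule selects `veryfast` with four threads: `medium` cannot reach 30 fps even with eight threads, and `veryfast` needs four threads to do so, which leaves the remaining threads free for other representations.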
