Content-Driven Frame-Level Bit Prediction for Rate Control in Versatile Video Coding
Amritha Premkumar, Prajit T Rajendran, Vignesh V Menon, Christian Herglotz
TL;DR
The paper tackles rate control in VVC by replacing traditional analytic rate–QP models with a content-aware, VCA-feature-driven frame-level bit predictor implemented via Random Forest regression. It introduces frame-type-specific models (I, P, B) that use lightweight, multi-scale spatial–temporal features to predict per-frame bit consumption from the first pass. This lets a second pass refine QP using VVenC's R–QP mapping. Empirical results on UHD sequences show strong predictive accuracy ($R^2$ up to 0.93 for I-frames) and competitive BD$_{YUV}$ performance ($-0.14\%$ on average), with a 33.3% reduction in total encoding time, which points to improved stability and efficiency. The approach offers a practical path to real-time, energy-efficient encoding in production pipelines and adaptive streaming, without requiring trial encodes. The $R^2$ values and BD$_{YUV}$ results indicate that lightweight, content-driven features effectively capture the complexity that drives bitrate in modern encoders.
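The second-pass QP refinement can be illustrated with a simple model inversion. The sketch below is an assumption for illustration only. It uses an exponential rate–QP relation in which frame bits roughly halve for every increase of 6 in QP, which is motivated by the quantizer step size doubling every 6 QP steps in HEVC/VVC. This is not VVenC's actual R–QP mapping. The function name `refine_qp` and the clamp range `[0, 63]` are hypothetical.

```python
import math


def refine_qp(first_pass_qp: int, predicted_bits: float, target_bits: float,
              qp_min: int = 0, qp_max: int = 63) -> int:
    """Choose a second-pass QP for one frame (illustrative sketch).

    Assumes bits(QP) ~= predicted_bits * 2 ** (-(QP - first_pass_qp) / 6),
    i.e. frame bits roughly halve per +6 QP. This is an assumed model,
    NOT VVenC's R-QP mapping.
    """
    if predicted_bits <= 0 or target_bits <= 0:
        raise ValueError("bit counts must be positive")
    # Solve target_bits = predicted_bits * 2^(-(qp - qp0) / 6) for qp.
    delta = 6.0 * math.log2(predicted_bits / target_bits)
    qp = round(first_pass_qp + delta)
    # Clamp to an assumed valid QP range.
    return max(qp_min, min(qp_max, qp))


if __name__ == "__main__":
    # Predicted bits are twice the target, so QP rises by 6.
    print(refine_qp(32, 200_000, 100_000))  # 38
    # Predicted bits are half the target, so QP falls by 6.
    print(refine_qp(32, 50_000, 100_000))   # 26
```

In this formulation, the frame-level bit predictor supplies `predicted_bits` at the first-pass QP. The rate model then only needs to translate the ratio between predicted and target bits into a QP offset, so no trial encode is needed.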
Abstract
Rate control allocates bits efficiently across frames to meet a target bitrate while maintaining quality. Conventional two-pass rate control (2pRC) in Versatile Video Coding (VVC) relies on analytical rate-QP models, which often fail to capture nonlinear spatial-temporal variations, causing quality instability and high complexity due to multiple trial encodes. This paper proposes a content-adaptive framework that predicts frame-level bit consumption using lightweight features from the Video Complexity Analyzer (VCA) and quantization parameters within a Random Forest regression. On ultra-high-definition sequences encoded with VVenC, the model achieves strong correlation with ground truth, yielding $R^2$ values of 0.93, 0.88, and 0.77 for I-, P-, and B-frames, respectively. Integrated into a rate-control loop, it achieves coding efficiency comparable to 2pRC while reducing total encoding time by 33.3%. The results show that VCA-driven bit prediction provides a computationally efficient and accurate alternative to conventional rate-QP models.
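The reported $R^2$ values are coefficients of determination, $R^2 = 1 - SS_{res}/SS_{tot}$, between predicted and actual frame bits, scored separately for each frame type. Below is a minimal sketch of that evaluation. The record layout `(frame_type, actual_bits, predicted_bits)` and the helper names are hypothetical and are not taken from the paper's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def r_squared(actual: List[float], predicted: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need at least two paired samples")
    mean = sum(actual) / len(actual)
    ss_tot = sum((y - mean) ** 2 for y in actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    if ss_tot == 0:
        raise ValueError("actual values have zero variance")
    return 1.0 - ss_res / ss_tot


def r_squared_by_frame_type(
    records: Iterable[Tuple[str, float, float]],
) -> Dict[str, float]:
    """Group (frame_type, actual_bits, predicted_bits) records and score each type."""
    groups: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for frame_type, actual, predicted in records:
        groups[frame_type][0].append(actual)
        groups[frame_type][1].append(predicted)
    return {ft: r_squared(a, p) for ft, (a, p) in groups.items()}
```

Scoring each frame type separately reflects how frame-type-specific models are evaluated. I-, P-, and B-frames differ widely in typical bit counts, so pooling them would inflate $SS_{tot}$ and overstate accuracy.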
