Rate-Distortion Optimization for Ensembles of Non-Reference Metrics

Xin Xiong, Samuel Fernández-Menduiña, Eduardo Pavez, Antonio Ortega, Neil Birkbeck, Balu Adsumilli

TL;DR

The paper tackles the instability and metric-bias issues that arise when using non-reference metrics (NRMs) for rate-distortion optimization in video codecs. It extends linearized NRM (LNRM) to ensembles of NRMs and introduces a smoothing-based variant (SLNRM) to stabilize gradients, enabling robust cross-metric bitrate savings for hybrid and overfitted codecs. Empirical results on AVC and Cool-chic with YouTube UGC data show consistent BD-rate improvements across diverse NRMs and reduced encoder runtime, especially for Cool-chic where direct NRM optimization is expensive. The method has practical implications for UGC pipelines and can be extended to video coding by incorporating video NRMs, offering encoder-side efficiency without decoder overhead.

Abstract

Non-reference metrics (NRMs) can assess the visual quality of images and videos without a reference, making them well-suited for the evaluation of user-generated content. Nonetheless, rate-distortion optimization (RDO) in video coding is still mainly driven by full-reference metrics, such as the sum of squared errors, which treat the input as an ideal target. A way to incorporate NRMs into RDO is through linearization (LNRM), where the gradient of the NRM with respect to the input guides bit allocation. While this strategy improves the quality predicted by some metrics, we show that it can yield limited gains or degradations when evaluated with other NRMs. We argue that NRMs are highly non-linear predictors with locally unstable gradients that can compromise the quality of the linearization; furthermore, optimizing a single metric may exploit model-specific biases that do not generalize across quality estimators. Motivated by this observation, we extend the LNRM framework to optimize ensembles of NRMs and, to further improve robustness, we introduce a smoothing-based formulation that stabilizes NRM gradients prior to linearization. Our framework is well-suited to hybrid codecs, and we advocate for its use with overfitted codecs, where it avoids iterative evaluations and backpropagation of neural network-based NRMs, reducing encoder complexity relative to direct NRM optimization. We validate the proposed approach on AVC and Cool-chic, using the YouTube UGC dataset. Experiments demonstrate consistent bitrate savings across multiple NRMs with no decoder complexity overhead and, for Cool-chic, a substantial reduction in encoding runtime compared to direct NRM optimization.
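The linearization, ensemble, and smoothing ideas described in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: the toy metrics `toy_nrm_a` and `toy_nrm_b`, the ensemble weights, and the Gaussian-perturbation smoothing in `smoothed_gradient` are all assumptions made for illustration, and the abstract does not specify how the smoothing is actually performed. Gradients are estimated with finite differences so that the example stays self-contained.

```python
import numpy as np

def num_grad(f, x, eps=1e-4):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e.flat[i] = eps
        g.flat[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return g

def ensemble_gradient(metrics, weights, x):
    """Weighted sum of per-metric gradients (the weights are hypothetical)."""
    return sum(w * num_grad(f, x) for f, w in zip(metrics, weights))

def smoothed_gradient(f, x, sigma=0.01, n_samples=8, seed=0):
    """Average the gradient over Gaussian perturbations of the input.

    Assumption: this is one common way to smooth gradients; it is not
    necessarily the formulation used in the paper.
    """
    rng = np.random.default_rng(seed)
    grads = [num_grad(f, x + sigma * rng.standard_normal(x.shape))
             for _ in range(n_samples)]
    return np.mean(grads, axis=0)

def linearized_change(g, x, x_hat):
    """First-order estimate of metric(x_hat) - metric(x) using gradient g."""
    return float(g.ravel() @ (x_hat - x).ravel())

# Toy differentiable stand-ins for real NRMs (hypothetical).
def toy_nrm_a(x):
    return -np.mean((x - 0.5) ** 2)

def toy_nrm_b(x):
    return np.mean(np.sin(3.0 * x))

# Usage: predict the ensemble score change caused by a small coding error.
rng = np.random.default_rng(1)
x = rng.random(16)                            # "input" signal
x_hat = x + 1e-4 * rng.standard_normal(16)    # "reconstruction"
g = ensemble_gradient([toy_nrm_a, toy_nrm_b], [0.5, 0.5], x)
predicted = linearized_change(g, x, x_hat)
```

Because the gradient is computed once at the input, an encoder can score many candidate reconstructions with a single inner product each, instead of re-evaluating every metric per candidate. This is the source of the complexity reduction the abstract mentions.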

Paper Structure (11 sections, 9 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: (a) Synthetic UGC obtained by compressing a KODAK image [kodak1993kodak] using JPEG [wallace_jpeg_1991] (with $Q=60$). (b) Cool-chic reconstruction, optimizing for SSE (PSNR: $42.98$ dB). (c) Cool-chic reconstruction, optimizing for QualiCLIP with SSE regularization (PSNR: $42.63$ dB). (d) Normalized scores (between $0$ and $1$) reported by multiple NRMs. While optimizing the bit allocation for QualiCLIP improves the quality as predicted by some metrics, it yields limited improvements or even degradations in others.
  • Figure 2: (a) MSE and (b) rate evolution across iterations for Cool-chic in one image. While the MSE converges fast, the decrease in rate is more gradual. (c-d) Average NRM score (50 images), showing that warm-up reconstructions achieve near-final scores, justifying Eq. \ref{eq:calibration}.
  • Figure 3: Multidimensional scaling (MDS) [kruskal1964multidimensional] for NRM correlation. We show $\mathbf{d}_{i,j} = 1 - |\mathbf{p}_{i,j}|$, with $\mathbf{p}_{i,j}$ the 2D projection of the rank correlation between NRMs $i$ and $j$. We identify three clusters.
  • Figure 4: BD-rate savings (%) relative to the baseline across different NRMs. We optimize for the SLNRM (blue bar) and LNRM (red bar) versions of QualiCLIP. Left: Results for AVC. Right: Results for Cool-chic. Optimization with SLNRM yields higher BD-rate savings than LNRM across all evaluated NRMs. Moreover, SLNRM always scores better than the baseline, regardless of the NRM used for evaluation.
  • Figure 5: Rate-quality curves comparing baseline, LNRM, and SLNRM evaluated across various NRMs. Top: the results for AVC; Bottom: the results for Cool-chic. In each sub-figure, the black curve represents the baseline. Other colors correspond to RDO using specific NRMs, where solid lines indicate LNRM and dashed lines indicate SLNRM. SLNRM consistently outperforms LNRM across various NRMs.
  • ...and 1 more figure
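The dissimilarity in the Figure 3 caption, $1 - |\cdot|$ applied to a rank correlation between NRMs, can be sketched as follows. This is a hedged illustration rather than the paper's procedure: it assumes the rank correlation is Spearman-style and computed over per-image scores, and it swaps in classical (Torgerson) MDS for the Kruskal MDS cited in the caption, because the classical variant needs only an eigendecomposition. The function names and the random score matrix are hypothetical.

```python
import numpy as np

def rank_correlation(scores):
    """Spearman-style correlation between columns (one column per metric).

    Ranks come from a double argsort, which assumes no tied scores.
    """
    ranks = scores.argsort(axis=0).argsort(axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def dissimilarity(rho):
    """d_ij = 1 - |rho_ij|, symmetrized, with an exactly zero diagonal."""
    d = np.clip(1.0 - np.abs(rho), 0.0, None)
    d = 0.5 * (d + d.T)
    np.fill_diagonal(d, 0.0)
    return d

def classical_mds(d, dim=2):
    """Classical (Torgerson) MDS: embed points from pairwise dissimilarities."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:dim]          # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

# Usage with made-up scores: 50 images (rows) scored by 5 metrics (columns).
rng = np.random.default_rng(0)
scores = rng.random((50, 5))
scores[:, 1] = np.exp(scores[:, 0])   # metric 1 ranks images exactly like metric 0
coords = classical_mds(dissimilarity(rank_correlation(scores)))
```

Metrics that rank images identically get zero dissimilarity and land on the same 2D point, so clusters in the embedding indicate groups of NRMs that tend to agree with one another.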