
Rethinking Bjøntegaard Delta for Compression Efficiency Evaluation: Are We Calculating It Precisely and Reliably?

Xinyu Hang, Shenpeng Song, Zhimeng Huang, Chuanmin Jia, Siwei Ma, Wen Gao

TL;DR

This work proposes a robust method for high-precision BD estimation across diverse compression scenarios, enhanced by a reliability assessment to determine the probability distribution of BD values from R-D sample points.

Abstract

For decades, the Bjøntegaard Delta (BD) has been the standard metric for evaluating codec Rate-Distortion (R-D) performance. Yet in most studies, BD is determined from just 4-5 R-D data points. Could this be sufficient? As codecs and quality metrics advance, does the conventional BD estimation still hold up? Crucially, are the performance improvements of new codecs and tools genuine, or merely artifacts of estimation flaws? This paper addresses these concerns by reevaluating BD estimation. We present a novel approach that employs a parameterized deep neural network to model R-D curves with high precision across various metrics, accompanied by a comprehensive R-D dataset. This approach both assesses the reliability of BD calculations and serves as a precise BD estimator. Our findings advocate the adoption of rigorous R-D sampling and reliability metrics in future compression research to ensure the validity and reliability of results.
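For reference, the conventional BD-BR calculation that the paper questions can be sketched as follows. This is the classic Bjøntegaard procedure, not the paper's neural estimator: for each codec, log10(rate) is fitted as a cubic polynomial of quality, both fits are integrated over the overlapping quality range, and the mean log-rate gap is converted to a percentage. The function name `bd_rate` and the example values are illustrative assumptions.

```python
import numpy as np


def bd_rate(rate_anchor, q_anchor, rate_test, q_test):
    """Classic BD-BR in percent (negative = test codec saves bitrate).

    Sketch of the conventional cubic-polynomial method; not the paper's method.
    """
    log_ra = np.log10(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log10(np.asarray(rate_test, dtype=float))
    qa = np.asarray(q_anchor, dtype=float)
    qt = np.asarray(q_test, dtype=float)

    # Third-order fits of log-rate as a function of quality (e.g. PSNR).
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Integrate only over the quality range covered by both curves.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    anti_a = np.polyint(poly_a)
    anti_t = np.polyint(poly_t)
    int_a = np.polyval(anti_a, hi) - np.polyval(anti_a, lo)
    int_t = np.polyval(anti_t, hi) - np.polyval(anti_t, lo)

    # Mean log10-rate difference, converted back to a percentage.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (10 ** avg_diff - 1) * 100


if __name__ == "__main__":
    rates = [1000.0, 2000.0, 4000.0, 8000.0]  # hypothetical kbps values
    psnr = [32.0, 35.0, 38.0, 41.0]           # hypothetical PSNR values (dB)
    # Every test rate is 10% lower at equal quality, so this is about -10.0.
    print(bd_rate(rates, psnr, [0.9 * r for r in rates], psnr))
```

With four points per curve, the cubic passes exactly through every sample, so the result depends entirely on how the fitted curve behaves between those few points.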

Paper Structure

This paper contains 7 sections, 9 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: (a) Example of BD Estimation. In this example, the true BD-BR (yellow line) is -2.078%, while the generalized PCHIP method (green line) yields -9.77%, a bias of 7.70 percentage points. (b) Integral Prediction Network. "FC, $x \times y$" denotes a Fully-Connected layer with $x$ input elements and $y$ output elements.
  • Figure 2: Integral Estimation Example. The integration interval is split into segments at the $X$-values of the sample points. An integral is estimated for each segment, and these estimates are then summed.
  • Figure 3: Categories of Segments. In total, there are 7 distinct categories of segments, each corresponding to a separate prediction model. The blue areas represent the target integration regions. The red dots and dashed lines indicate the inputs to the corresponding prediction models.
  • Figure 4: BDCI vs. Ground Truth. For visualization purposes, we aligned the centroids of the BDCI and sorted them by the length of the BDCI intervals. The right plot only shows the results for the PSNR, PSNR-HVS, and PSNR-HVS-M metrics because the scales of the other metrics could not be aligned.
  • Figure 5: BDCI vs. R-D Sample Size. The anchor codec is BPG. As the number of R-D sample points increases, the BDCI intervals become narrower.
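The segment-and-aggregate scheme of Figure 2 can be sketched as below. In the paper, each segment's integral comes from a neural predictor chosen by segment category (Figure 3); this sketch substitutes a PCHIP integral as a placeholder predictor, so it reproduces only the structure, not the paper's accuracy. It also assumes each curve is split at its own sample points. All function names (`segment_bounds`, `segmented_integral`, `pchip_segment`, `bd_rate_segmented`) are hypothetical.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def segment_bounds(x_samples, lo, hi):
    """Split [lo, hi] at every sample X-value strictly inside it."""
    inner = [x for x in sorted(x_samples) if lo < x < hi]
    edges = [lo] + inner + [hi]
    return list(zip(edges[:-1], edges[1:]))


def segmented_integral(x, y, lo, hi, predict_segment):
    """Estimate each segment's integral separately, then sum the estimates."""
    return sum(predict_segment(x, y, a, b) for a, b in segment_bounds(x, lo, hi))


def pchip_segment(x, y, a, b):
    # Placeholder predictor (assumption): integrate a PCHIP fit over [a, b].
    return float(PchipInterpolator(x, y).integrate(a, b))


def bd_rate_segmented(rate_anchor, q_anchor, rate_test, q_test,
                      predict_segment=pchip_segment):
    """BD-BR in percent, with integrals built segment by segment."""
    qa = np.asarray(q_anchor, dtype=float)
    qt = np.asarray(q_test, dtype=float)
    la = np.log10(np.asarray(rate_anchor, dtype=float))
    lt = np.log10(np.asarray(rate_test, dtype=float))
    # PCHIP needs strictly increasing X, so sort each curve by quality.
    ia, it = np.argsort(qa), np.argsort(qt)
    qa, la, qt, lt = qa[ia], la[ia], qt[it], lt[it]
    # Overlapping quality range of the two curves.
    lo, hi = max(qa[0], qt[0]), min(qa[-1], qt[-1])
    int_a = segmented_integral(qa, la, lo, hi, predict_segment)
    int_t = segmented_integral(qt, lt, lo, hi, predict_segment)
    return (10 ** ((int_t - int_a) / (hi - lo)) - 1) * 100
```

Swapping `pchip_segment` for a different per-segment predictor changes the estimate without touching the segmentation or aggregation logic.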