Bjøntegaard Delta (BD): A Tutorial Overview of the Metric, Evolution, Challenges, and Recommendations
Nabajeet Barman, Maria G. Martini, Yuriy Reznik
TL;DR
The paper examines the reliability and usage of the Bjøntegaard Delta (BD) metric, a standard tool for comparing the rate–distortion (RD) performance of video codecs. It provides a tutorial on BD theory, traces the metric's 20-year evolution, and evaluates multiple implementations using PSNR, SSIM, VMAF, and subjective MOS data. An experimental study on an open UHD dataset shows that the choice between cubic and piecewise-cubic interpolation, and the choice of distortion metric, can substantially change the reported BD results, especially when the RD curves are not well-behaved. The authors offer practical recommendations to improve interpretation, such as using piecewise-cubic fits, reporting BD-Rate alongside BD-Quality, and considering alternative operating ranges or network-aware averaging. The work highlights the need for careful metric design and suggests extensions for learning-based and network-aware scenarios to better reflect real-world codec performance and operating conditions.
Abstract
The Bjøntegaard Delta (BD) method, proposed in 2001, has become a popular tool for comparing video codec compression efficiency. It was initially proposed to compute bitrate and quality differences between two Rate-Distortion (RD) curves using PSNR as the distortion metric. Over the years, many works have calculated and reported BD results using other objective quality metrics such as SSIM and VMAF and, in some cases, even subjective ratings (mean opinion scores). However, the lack of consolidated literature explaining the metric and its evolution over the years, and of a systematic evaluation of the metric under different test conditions, can lead to incorrect interpretation of the BD results thus obtained. To this end, this paper presents a detailed tutorial describing the BD method and example cases where the metric might fail. We also provide a detailed history of its evolution, including a discussion of various proposed improvements and variations over the last 20 years. In addition, we evaluate the various BD methods and their open-source implementations, considering different objective quality metrics and subjective ratings, and taking into account different RD characteristics. Based on our results, we present a set of recommendations on using existing BD metrics, along with insights for possible exploration towards developing more effective tools for evaluating and comparing codec compression efficiency.
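As a rough illustration of the computation described in the abstract, the following Python sketch computes a BD-Rate value in the classic style: fit a cubic polynomial to log-bitrate as a function of quality for each codec, integrate both fits over the overlapping quality range, and convert the average log-rate gap into a percentage. The function name `bd_rate` and the sample RD points are hypothetical, chosen only for illustration; this is not the code of the paper or of any implementation it evaluates.

```python
import numpy as np


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Average bitrate difference (%) of the test codec relative to the
    anchor at equal quality. Negative values mean the test codec needs
    fewer bits. Illustrative sketch only (cubic fit, four or more points)."""
    log_ra = np.log10(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log10(np.asarray(rate_test, dtype=float))
    qa = np.asarray(quality_anchor, dtype=float)
    qt = np.asarray(quality_test, dtype=float)

    # Fit log10(rate) as a cubic polynomial of quality for each curve.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Integrate only over the quality range covered by both curves.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    if hi <= lo:
        raise ValueError("RD curves do not overlap in quality")

    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Convert the average log10-rate difference back to a percentage.
    return (10.0 ** (avg_t - avg_a) - 1.0) * 100.0


# Hypothetical RD points: bitrate in kbps, quality in dB PSNR.
anchor_rate = [1000, 2000, 4000, 8000]
psnr = [32.0, 35.0, 38.0, 40.5]
test_rate = [r / 2 for r in anchor_rate]  # same quality at half the bitrate

print(bd_rate(anchor_rate, psnr, test_rate, psnr))  # close to -50.0
```

A piecewise-cubic variant would replace the `np.polyfit` step with a shape-preserving interpolator such as `scipy.interpolate.PchipInterpolator`. The same structure also applies to other quality scales (SSIM, VMAF, MOS), subject to the caveats the paper raises about RD curves that are not well-behaved.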
