
Bounds on Agreement between Subjective and Objective Measurements

Jaden Pieper, Stephen D. Voran

Abstract

Objective estimators of multimedia quality are often judged by comparing estimates with subjective "truth data," most often via Pearson correlation coefficient (PCC) or mean-squared error (MSE). But subjective test results contain noise, so striving for a PCC of 1.0 or an MSE of 0.0 is neither realistic nor repeatable. Numerous efforts have been made to acknowledge and appropriately accommodate subjective test noise in objective-subjective comparisons, typically resulting in new analysis frameworks and figures-of-merit. We take a different approach. By making only basic assumptions, we derive bounds on PCC and MSE that can be expected for a subjective test. Consistent with intuition, these bounds are functions of subjective vote variance. When a subjective test includes vote variance information, the calculation of the bounds is easy, and in this case we say the resulting bounds are "fully data-driven." We provide two options for calculating bounds in cases where vote variance information is not available. One option is to use vote variance information from other subjective tests that do provide such information, and the second option is to use a model for subjective votes. Thus we introduce a binomial-based model for subjective votes (BinoVotes) that naturally leads to a mean opinion score (MOS) model, named BinoMOS, with multiple unique desirable properties. BinoMOS reproduces the discrete nature of MOS values and its dependence on the number of votes per file. This modeling provides the vote variance information required by the PCC and MSE bounds, and we compare it with data from 18 subjective tests. The modeling yields PCC and MSE bounds that agree very well with those found from the data directly. These results allow one to set expectations for the PCC and MSE that might be achieved for any subjective test, even those where vote variance information is not available.
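When individual votes are available, "fully data-driven" bounds of the kind described above can be approximated in a few lines. The sketch below is our own illustration, not the authors' code. It assumes each MOS equals the file's expected vote plus zero-mean noise whose variance is estimated by $s_i^2/n_i$, where $s_i^2$ is the sample vote variance and $n_i$ the number of votes for file $i$. Under that assumption even an ideal estimator has an MSE near the average of $s_i^2/n_i$, and its PCC with MOS cannot exceed $\sqrt{1 - \overline{s^2/n}/\operatorname{Var}(\text{MOS})}$. The function name `data_driven_bounds` and its input layout are hypothetical.

```python
import math
import statistics


def data_driven_bounds(votes_per_file):
    """Estimate a PCC upper bound and an MSE lower bound from raw votes.

    votes_per_file: list of lists; each inner list holds the individual
    ratings (e.g. 1..5) for one file. Each file needs at least 2 votes
    so that a sample variance can be computed.
    """
    mos = [statistics.fmean(v) for v in votes_per_file]
    # Estimated noise variance of each MOS value: vote variance / number of votes.
    mos_noise_var = [statistics.variance(v) / len(v) for v in votes_per_file]

    # MSE lower bound: estimated noise variance remaining in the MOS values,
    # even for an estimator that outputs each file's expected vote exactly.
    mse_lower = statistics.fmean(mos_noise_var)

    # PCC upper bound: square root of the share of MOS variance that is not noise.
    total_var = statistics.pvariance(mos)
    signal_var = max(total_var - mse_lower, 0.0)
    pcc_upper = math.sqrt(signal_var / total_var) if total_var > 0 else float("nan")
    return pcc_upper, mse_lower


votes = [[1, 2, 2, 1], [3, 3, 4, 3], [5, 4, 5, 5]]
print(data_driven_bounds(votes))
```

For the three toy files above, the MSE lower bound is 10/144 (about 0.069) and the PCC upper bound is about 0.980. With only three files these numbers are illustrative; real tests supply many more files and the estimates stabilize accordingly.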

Paper Structure

This paper contains 15 sections, 50 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Probabilities for the five ratings of the common MOS scale for the BinoVotes model (vs. true quality $Y$) and data (vs. MOS) for 10 tests.
  • Figure 2: Example BinoMOS PMFs (blue) when true quality distribution is $\text{Beta}(2, 2.5)$ (black, scaled for visual comparison). Number of votes per file, $n_v$, increases from left to right with the values 1, 4, and 16.
  • Figure 3: Agreement statistic bounds versus number of votes per file, $n_v$, for four example true quality distributions, $f_Y(y)$. (a) PCC upper bounds. (b) RMSE lower bounds. The Triangular, Beta(2, 2), and Beta(2, 2.5) bound lines are visually indistinguishable.
  • Figure 4: BinoVotes population correlation bounds from the BinoVotes PCC bound equation (dashed black) and BinoVotes sample correlations by simulation (solid colors) vs. number of files in sample. Sample correlations are averaged over 10,000 repetitions, and this average rapidly approaches the population correlation bound. True quality distribution $f_Y$ is uniform.
  • Figure 5: MOS distributions for 18 subjective tests. Low-resolution gray histograms emphasize distribution shape and high-resolution blue histograms emphasize distribution resolution, as determined by number of votes per file.
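The behavior shown in Figures 2 through 4 can be loosely reproduced with a small simulation. The sketch below assumes, as our own illustrative parameterization, that BinoVotes draws each rating on the 1 to 5 scale as $1 + \text{Binomial}(4, y)$ for true quality $y \in [0, 1]$; the paper's exact model may differ, and all function names are hypothetical. Under that assumption a MOS built from $n_v$ votes always lies on a grid with spacing $1/n_v$. For uniform $Y$, the population PCC bound between the ideal estimate $1 + 4y$ and MOS is $\sqrt{2n_v/(2n_v+1)}$, and the RMSE bound is $\sqrt{2/(3n_v)}$.

```python
import math
import random
import statistics


def binomos_sample(y, n_v, rng):
    """Return one MOS value: the mean of n_v votes, each 1 + Binomial(4, y).

    ASSUMPTION: this BinoVotes parameterization (true quality y in [0, 1],
    ratings 1..5) is illustrative, not taken from the paper.
    """
    votes = [1 + sum(rng.random() < y for _ in range(4)) for _ in range(n_v)]
    return sum(votes) / n_v  # always a multiple of 1/n_v


def uniform_pcc_bound(n_v):
    """Population PCC between expected vote and MOS for Y ~ Uniform(0, 1).

    Expected vote 1 + 4Y has variance 16/12; vote variance 4Y(1 - Y) has
    mean 2/3, and averaging n_v votes divides it by n_v.
    """
    signal_var = 16 / 12
    noise_var = (2 / 3) / n_v
    return math.sqrt(signal_var / (signal_var + noise_var))


def uniform_rmse_bound(n_v):
    """RMSE left over even when the estimator outputs 1 + 4y exactly."""
    return math.sqrt((2 / 3) / n_v)


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(var_a * var_b)


def simulated_pcc(n_v, n_files, rng):
    """One simulated sample correlation between ideal estimates and BinoMOS."""
    ys = [rng.random() for _ in range(n_files)]
    ideal = [1 + 4 * y for y in ys]  # expected vote for each file
    mos = [binomos_sample(y, n_v, rng) for y in ys]
    return pearson(ideal, mos)


rng = random.Random(0)
print(uniform_pcc_bound(4), simulated_pcc(4, 4000, rng))
```

With $n_v = 4$ the population bound is $\sqrt{8/9} \approx 0.943$, and a single simulated sample of 4,000 files should land close to it. Figure 4 instead averages the sample correlation over 10,000 repetitions for each sample size, which shows how the bound is approached as the number of files grows.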