Rho-Perfect: Correlation Ceiling For Subjective Evaluation Datasets

Fredrik Cumlin

TL;DR

$\rho$-Perfect provides a principled upper bound on model-human correlation for subjectively rated data by decomposing outcome variance under heteroscedastic noise. It defines the ceiling as $\rho\text{-Perfect} = \sqrt{\frac{\text{Var}(\hat{Y})}{\text{Var}(Y)}}$, where $\hat{Y}=\mathbb{E}[Y|X]$, and shows that the squared bound, $\rho\text{-Perfect}^2$, estimates the correlation between two independent subjective evaluations. The method is validated experimentally with Split-Raters and Split-Ratings procedures on BVCC, MovieLens, SOMOS, and MERP, showing that $\mathbb{E}[\text{Cov}(Y_1,Y_2|X)]\approx 0$ and that $\rho\text{-Perfect}^2$ tracks true reliability better than the conventional ICC in unbalanced settings. A practical case on NISQA with DNSMOS Pro demonstrates that the $\rho$-Perfect upper bound helps distinguish data reliability from model shortcomings and indicates where improvements are needed. The work provides a scalable, interpretable metric to contextualize model performance on subjective datasets and supports more nuanced evaluation in speech, aesthetics, and recommendation domains.
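The variance decomposition behind the ceiling can be illustrated with a small plug-in estimator. This is a minimal sketch and not necessarily the paper's exact estimator: it assumes $Y$ is the per-item mean rating, that ratings of one item are independent, and that $\text{Var}(Y|X=x_i)$ can be approximated by $s_i^2/n_i$. The function name `rho_perfect` and the synthetic data are hypothetical.

```python
import numpy as np

def rho_perfect(ratings_per_item):
    """Plug-in estimate of rho-Perfect from per-item lists of ratings.

    Assumption (not from the paper): Y is the per-item mean rating, so the
    noise variance of item i is approximated by s_i^2 / n_i.
    """
    # Keep only items with at least two ratings so a sample variance exists.
    items = [np.asarray(r, dtype=float) for r in ratings_per_item if len(r) >= 2]
    means = np.array([r.mean() for r in items])
    # Per-item noise variance of the mean; it differs across items
    # (heteroscedastic) and across rating counts (unbalanced).
    noise = np.array([r.var(ddof=1) / r.size for r in items])
    total_var = means.var(ddof=1)            # estimate of Var(Y)
    signal_var = total_var - noise.mean()    # estimate of Var(E[Y|X])
    ratio = np.clip(signal_var / total_var, 0.0, 1.0)
    return float(np.sqrt(ratio))

# Synthetic example: latent quality plus item-specific noise, 2 to 8 ratings per item.
rng = np.random.default_rng(0)
true_quality = rng.normal(3.0, 1.0, size=2000)
ratings = [
    q + rng.normal(0.0, rng.uniform(0.5, 1.5), size=rng.integers(2, 9))
    for q in true_quality
]
print(rho_perfect(ratings))
```

On real data, `ratings_per_item` would hold the individual ratings collected for each stimulus; a model's correlation with the mean ratings can then be read against the returned ceiling.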

Abstract

Subjective ratings contain inherent noise that limits the model-human correlation, but this reliability issue is rarely quantified. In this paper, we present $\rho$-Perfect, a practical estimate of the highest achievable correlation of a model on subjectively rated datasets. We define $\rho$-Perfect to be the correlation between a perfect predictor and human ratings, and derive an estimate of the value based on heteroscedastic noise scenarios, a common occurrence in subjectively rated datasets. We show that $\rho$-Perfect squared estimates test-retest correlation and use this to validate the estimate. We demonstrate the use of $\rho$-Perfect on a speech quality dataset and show how the measure can distinguish between model limitations and data quality issues.
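The claim that $\rho$-Perfect squared estimates test-retest correlation can be checked on synthetic data. The sketch below is an illustrative simulation, not the paper's Split-Raters or Split-Ratings procedure: it assumes two evaluations that share the same conditional mean and have independent, item-specific noise, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items = 5000

y_hat = rng.normal(3.0, 1.0, size=n_items)    # E[Y | X], the perfect predictor
sigma = rng.uniform(0.3, 1.2, size=n_items)   # item-specific noise level

# Two independent evaluations of the same items (assumed conditionally
# uncorrelated given X).
y1 = y_hat + sigma * rng.normal(size=n_items)
y2 = y_hat + sigma * rng.normal(size=n_items)

rho_p = np.corrcoef(y_hat, y1)[0, 1]          # correlation of perfect predictor with ratings
test_retest = np.corrcoef(y1, y2)[0, 1]       # correlation between the two evaluations

print(f"rho-Perfect^2 = {rho_p**2:.3f}, test-retest = {test_retest:.3f}")
```

Under these assumptions the two printed values should be close, since $\text{Cov}(Y_1,Y_2)=\text{Var}(\hat{Y})$ when the conditional covariance is zero.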

Paper Structure (9 sections, 1 theorem, 13 equations, 3 tables)

Introduction
The $\rho$-Perfect metric
Mathematical Derivation of $\rho$-Perfect
Experimental validation
Validating $\rho$-Perfect on real datasets
Comparison to existing measures
Interpreting model performance with $\rho$-Perfect
Conclusion
Acknowledgement

Key Result

Lemma 2.1

Let $X,Y$ be two random variables and $\hat{Y}=\mathbb{E}[Y\vert X]$. Then the correlation of $\hat{Y}$ and $Y$ is given by $\rho(\hat{Y},Y)=\sqrt{\frac{\text{Var}(\hat{Y})}{\text{Var}(Y)}}$.
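The closed form $\sqrt{\text{Var}(\hat{Y})/\text{Var}(Y)}$ quoted in the TL;DR follows from a standard argument based on the tower property of conditional expectation. The derivation below is a sketch of that textbook argument, assuming finite, non-zero variances; the paper's own proof may be organized differently.

```latex
\begin{align*}
\text{Cov}(\hat{Y}, Y)
  &= \mathbb{E}[\hat{Y} Y] - \mathbb{E}[\hat{Y}]\,\mathbb{E}[Y] \\
  &= \mathbb{E}\big[\hat{Y}\,\mathbb{E}[Y \vert X]\big] - \mathbb{E}[\hat{Y}]^2
     && \text{(tower property, } \mathbb{E}[\hat{Y}] = \mathbb{E}[Y]\text{)} \\
  &= \mathbb{E}[\hat{Y}^2] - \mathbb{E}[\hat{Y}]^2 = \text{Var}(\hat{Y}), \\[4pt]
\rho(\hat{Y}, Y)
  &= \frac{\text{Cov}(\hat{Y}, Y)}{\sqrt{\text{Var}(\hat{Y})\,\text{Var}(Y)}}
   = \frac{\text{Var}(\hat{Y})}{\sqrt{\text{Var}(\hat{Y})\,\text{Var}(Y)}}
   = \sqrt{\frac{\text{Var}(\hat{Y})}{\text{Var}(Y)}}.
\end{align*}
```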

Theorems & Definitions (3)

Lemma 2.1
proof
Definition 2.1: $\rho$-Perfect
