Evaluating Perceptual Distance Models by Fitting Binomial Distributions to Two-Alternative Forced Choice Data
Alexander Hepburn, Raul Santos-Rodriguez, Javier Portilla
TL;DR
The paper addresses the challenge of evaluating perceptual distance models using 2AFC data in large, randomly composed datasets such as BAPPS. It introduces a purely probabilistic framework that treats observer decisions as a binomial process with a distance-dependent probability $P(d_0,d_1)$, estimated via kernel-smoothed density estimation and marginal uniformisation, and cross-validated against neural-network baselines. The approach provides simple, interpretable metrics (AJ and NLL) and remains robust to varying numbers of judgements per triplet, performing on par with neural networks while requiring far fewer parameters and far less training. Applied to multiple perceptual distances, the method reproduces known rankings and offers richer diagnostics, supporting scalable, principled evaluation of perceptual distance models. The work also demonstrates applicability to datasets where the number of judgements per triplet, $M_t$, varies (e.g., CLIC), and highlights the practical value of transparent, likelihood-based evaluation for advancing perceptual similarity metrics.
Abstract
The Two-Alternative Forced Choice (2AFC) paradigm offers advantages over the Mean Opinion Score (MOS) paradigm in psychophysics (PF), such as simplicity and robustness. However, when evaluating perceptual distance models, MOS enables direct correlation between model predictions and PF data. In contrast, 2AFC pairwise comparisons can be converted into a MOS-like quality ranking only when the comparisons share images. In large datasets such as BAPPS, where image patches and distortions are combined randomly, deriving rankings from 2AFC PF data becomes infeasible, because the distorted images in each comparison are independent of those in other comparisons. To address this, instead of relying on MOS correlation, researchers have trained ad-hoc neural networks to reproduce 2AFC PF data from pairs of model distances, a black-box approach with conceptual and operational limitations. This paper introduces a more robust distance-model evaluation method using a purely probabilistic approach, applying maximum likelihood estimation to a binomial decision model. Our method demonstrates superior simplicity, interpretability, flexibility, and computational efficiency, as shown through evaluations of various visual distance models on two 2AFC PF datasets.
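To make the binomial decision model concrete, the following minimal sketch scores 2AFC data by negative log-likelihood (NLL). It is an illustration under stated assumptions, not the paper's implementation: the function names, the tuple layout `(d0, d1, k, m)`, and the logistic `toy_prob` are hypothetical. In particular, `toy_prob` only stands in for the estimated probability $P(d_0,d_1)$, and the assumed convention is that `k` of the `m` judgements for a triplet chose the second distorted image.

```python
import math


def binomial_nll(k, m, p, eps=1e-12):
    """NLL of observing k 'second image' choices out of m judgements,
    when each judgement picks the second image with probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clip to keep the logarithms finite
    log_coeff = math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
    return -(log_coeff + k * math.log(p) + (m - k) * math.log(1.0 - p))


def mean_nll(triplets, prob_fn):
    """Average NLL over triplets given as (d0, d1, k, m) tuples.
    m may differ between triplets (a variable number of judgements)."""
    total = 0.0
    for d0, d1, k, m in triplets:
        total += binomial_nll(k, m, prob_fn(d0, d1))
    return total / len(triplets)


def toy_prob(d0, d1):
    """Hypothetical stand-in for P(d0, d1): a logistic function of the
    distance difference. NOT the estimator described in the paper."""
    return 1.0 / (1.0 + math.exp(-(d0 - d1)))


if __name__ == "__main__":
    # Made-up example data: (d0, d1, votes for second image, judgements).
    data = [(0.8, 0.3, 4, 5), (0.2, 0.6, 1, 5), (0.5, 0.5, 1, 2)]
    print(mean_nll(data, toy_prob))
```

Because each triplet carries its own `m`, the same scoring applies unchanged when the number of judgements varies between triplets. A lower mean NLL indicates that the distance pair explains the observed choices better.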
