Pairwise Comparisons Are All You Need

Nicolas Chahine, Sira Ferradans, Jean Ponce

TL;DR

PICNIQ reframes blind image quality assessment (BIQA) from predicting absolute quality scores to estimating pairwise image preferences, addressing domain shift and human variability by leveraging sparse, content-aware comparisons. It introduces a Siamese backbone with a symmetry-enforcing hub layer and a weighted binary cross-entropy loss to learn pairwise likelihoods $P^{\theta}(I_i>I_j)$ without assuming a prior on the global quality distribution. Through psychometric scaling (e.g., TrueSkill), these preferences yield granular quality scores in just-objectionable-difference (JOD) units, so the same model supports both direct pairwise comparison and scalable quality scoring of large image sets. Experiments on the PIQ23 dataset show competitive performance and strong generalization in both standard and challenging portrait scenarios, suggesting that pairwise, uncertainty-aware BIQA can outperform many traditional regression-based approaches while lowering labeling costs.

Abstract

Blind image quality assessment (BIQA) approaches, while promising for automating image quality evaluation, often fall short in real-world scenarios due to their reliance on a generic quality standard applied uniformly across diverse images. This one-size-fits-all approach overlooks the crucial perceptual relationship between image content and quality, leading to a 'domain shift' challenge where a single quality metric inadequately represents various content types. Furthermore, BIQA techniques typically overlook the inherent differences in the human visual system among different observers. In response to these challenges, this paper introduces PICNIQ, a pairwise comparison framework designed to bypass the limitations of conventional BIQA by emphasizing relative, rather than absolute, quality assessment. PICNIQ is specifically designed to estimate the preference likelihood of quality between image pairs. By employing psychometric scaling algorithms, PICNIQ transforms pairwise comparisons into just-objectionable-difference (JOD) quality scores, offering a granular and interpretable measure of image quality. The proposed framework implements a deep learning architecture in combination with a specialized loss function, and a training strategy optimized for sparse pairwise comparison settings. We conduct our research using comparison matrices from the PIQ23 dataset, which are published in this paper. Our extensive experimental analysis showcases PICNIQ's broad applicability and competitive performance, highlighting its potential to set new standards in the field of BIQA.
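As a rough illustration of how sparse pairwise comparisons can be turned into per-image scores, the sketch below fits a Bradley-Terry model with simple fixed-point (MM) updates. This is a stand-in chosen for brevity, not the psychometric scaling procedure used by PICNIQ (the TL;DR mentions TrueSkill as one example), and the resulting scores are in log-odds units rather than calibrated JOD units. The function names and the toy `wins` matrix are hypothetical and do not come from PIQ23.

```python
import math

def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry log-strengths from a comparison-count matrix.

    wins[i][j] = number of times image i was preferred over image j.
    Assumes every image wins at least once and the comparison graph
    is connected; otherwise the updates below are not well defined.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Only pairs that were actually compared contribute (sparse setting).
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom)
        # Fix the arbitrary scale: normalize to geometric mean 1.
        log_mean = sum(math.log(x) for x in new_p) / n
        p = [x / math.exp(log_mean) for x in new_p]
    return [math.log(x) for x in p]

def preference_probability(score_i, score_j):
    """Model probability that image i is preferred over image j."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))

# Toy sparse matrix: only neighbouring images (0-1, 1-2, 2-3) were compared.
wins = [
    [0, 2, 0, 0],
    [8, 0, 3, 0],
    [0, 7, 0, 4],
    [0, 0, 6, 0],
]
scores = bradley_terry(wins)
p_3_over_0 = preference_probability(scores[3], scores[0])
```

Even though images 0 and 3 were never compared directly, the fitted scores place them on a common scale, which is the property that makes scaling from sparse comparison matrices useful.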

Paper Structure (26 sections, 4 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: The PICNIQ architecture for image quality comparison. A Siamese quality-aware backbone is used to process a pair of images and extract relevant image quality features ($B_\theta$). The difference between the extracted features, $V$, is computed and then fed into a fully connected (FC) layer within the hub layer. The hub layer processes both $V$ and its negation and outputs their difference to ensure probabilistic symmetry in the comparison. The predicted probability $P(I>J)$ is obtained by passing the output of the hub layer through a sigmoid function.
  • Figure 2: Comparison matrices from the PIQ23 dataset, indexed by images with sorted JOD scores. These matrices demonstrate varying levels of sparsity and comparison counts.
  • Figure 3: Comparative analysis of IQA models based on the averaged correlation metrics (top - larger is better) and mean absolute error (bottom - smaller is better) across all scenes and for the three attributes of PIQ23. The results showcase the superiority of PICNIQ over previous models in all metrics.
  • Figure 4: The histogram of PICNIQ's predictions on PIQ23 separated by their corresponding ground truth values over 6 bins. We can observe that PICNIQ’s predictions are reasonably aligned and calibrated with the ground truth probability, suggesting that the model is increasingly confident in predicting higher probabilities when they are indeed higher. The red dashed line represents the indecisive line with a probability of $0.5$.
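The symmetry mechanism described in the Figure 1 caption can be sketched in a few lines. The example below is a minimal, framework-free illustration under stated assumptions: the backbone features are replaced by random vectors, the FC layer is a single linear unit, and all names (`fc`, `hub_probability`, `weights`, `bias`) are hypothetical rather than taken from the authors' code. The point is only that feeding $V$ and $-V$ through the same layer and subtracting gives a logit $z$ with $z(J, I) = -z(I, J)$, so that $P(I>J) + P(J>I) = 1$.

```python
import math
import random

def fc(v, weights, bias):
    """A single fully connected unit: weights . v + bias."""
    return sum(w * x for w, x in zip(weights, v)) + bias

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hub_probability(feat_i, feat_j, weights, bias):
    """Estimate P(I > J) from (stand-in) backbone features of I and J."""
    v = [a - b for a, b in zip(feat_i, feat_j)]  # feature difference V
    neg_v = [-x for x in v]                      # its negation
    # Same layer applied to V and -V; the difference is antisymmetric in (I, J).
    z = fc(v, weights, bias) - fc(neg_v, weights, bias)
    return sigmoid(z)

random.seed(0)
dim = 8  # assumed feature size, for illustration only
weights = [random.gauss(0.0, 0.3) for _ in range(dim)]
bias = 0.3
feat_i = [random.gauss(0.0, 1.0) for _ in range(dim)]
feat_j = [random.gauss(0.0, 1.0) for _ in range(dim)]

p_ij = hub_probability(feat_i, feat_j, weights, bias)
p_ji = hub_probability(feat_j, feat_i, weights, bias)
p_ii = hub_probability(feat_i, feat_i, weights, bias)
```

Here `p_ij + p_ji` equals 1 up to floating-point error, and comparing an image with itself gives 0.5. The construction $g(V) - g(-V)$ is antisymmetric for any function $g$, so the same argument would hold if the FC block contained nonlinearities.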