Beyond MOS: Subjective Image Quality Score Preprocessing Method Based on Perceptual Similarity

Lei Wang, Desen Yuan

TL;DR

PSP-IQA tackles the challenge of subjective bias in MOS-based no-reference image quality assessment by incorporating image content through perceptual similarity. It selects subconscious reference images via LPIPS-based nearest-neighbor search and refines subject scores with a similarity-regularized exponential moving average, enabling robust preprocessing when annotations are scarce. Evaluations on LIVE, TID2013, and CID2013 show that PSP-IQA reduces bias and improves downstream IQA performance compared with traditional MOS preprocessing, demonstrating the practical value of leveraging perceptual relationships among images. The approach is lightweight, data-efficient, and applicable to diverse IQA scenarios, offering a principled way to fuse perceptual information with subjective labels without extensive re-annotation.

Abstract

Image quality assessment often relies on raw opinion scores provided by subjects in subjective experiments, which can be noisy and unreliable. To address this issue, postprocessing procedures such as ITU-R BT.500, ITU-T P.910, and ITU-T P.913 have been standardized to clean up the original opinion scores. These methods use annotator-based statistical priors, but they do not take into account the rich information in the images themselves, which limits their performance in sparsely annotated scenarios. Image quality datasets usually contain similar scenes or distortions, and subjects inevitably compare images with one another to arrive at a reasonable score. Therefore, in this paper we propose Perceptual Similarity Subjective Preprocessing (PSP), a subjective image quality score preprocessing method that exploits the perceptual similarity between images to alleviate subjective bias in sparsely annotated scenarios. Specifically, we model subjective scoring as a conditional probability model based on perceptual similarity with previously scored images, which we call subconscious reference scoring. The reference images are stored in a neighbor dictionary, which is obtained by a nearest-neighbor search over the images' deep perceptual features using the dot product of normalized vectors. The preprocessed score is then updated by an exponential moving average (EMA) of the subconscious reference scoring, which we call similarity-regularized EMA. Our experiments on multiple datasets (LIVE, TID2013, CID2013) show that this method can effectively remove the bias of the subjective scores. Additionally, experiments show that the preprocessed dataset improves the performance of downstream IQA tasks.
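
The two steps named in the abstract, the neighbor dictionary and the similarity-regularized EMA, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the blending weight `alpha`, the number of passes `steps`, the clipping of negative similarities, and the exact update rule are all assumptions made for illustration, and the features are plain arrays standing in for deep perceptual features.

```python
import numpy as np


def build_neighbor_dict(features, k=3):
    """Map each image index to its k most similar images.

    Similarity is the dot product of L2-normalized feature vectors
    (cosine similarity). Assumes no feature vector is all zeros.
    """
    feats = np.asarray(features, dtype=float)
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # an image is never its own reference
    order = np.argsort(-sim, axis=1)[:, :k]
    return {
        i: [(int(j), float(sim[i, j])) for j in order[i]]
        for i in range(len(feats))
    }


def similarity_regularized_ema(scores, neighbors, alpha=0.9, steps=10):
    """Pull each score toward a similarity-weighted mean of its neighbors.

    Hypothetical update rule (an assumption, not the paper's equation):
        est[i] <- alpha * est[i] + (1 - alpha) * weighted_mean(est[neighbors])
    """
    est = np.asarray(scores, dtype=float).copy()
    for _ in range(steps):
        new = est.copy()
        for i, nbrs in neighbors.items():
            idx = np.array([j for j, _ in nbrs])
            # Clip negative similarities so weights stay non-negative.
            w = np.array([max(s, 0.0) for _, s in nbrs])
            if w.sum() == 0.0:
                continue  # no usable reference images; keep the score
            ref = np.dot(w, est[idx]) / w.sum()
            new[i] = alpha * est[i] + (1.0 - alpha) * ref
        est = new
    return est


# Toy data: images 0-2 look alike, image 3 differs; image 2's score is inflated.
features = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]]
raw_scores = [50.0, 52.0, 90.0, 10.0]
neighbor_dict = build_neighbor_dict(features, k=2)
refined = similarity_regularized_ema(raw_scores, neighbor_dict)
```

In this toy run the inflated score of image 2 is drawn toward the scores of its two nearest neighbors, while every refined score stays within the range of the raw scores because each update is a convex combination.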

Paper Structure

This paper contains 13 sections, 6 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Left: traditional MOS model, Gaussian distribution. Middle: model with annotator information added, multimodal Gaussian distribution. Right: model with image information added, multimodal Gaussian distribution. $u(x)$ is regarded as the true quality. $f(s)$ is a learnable bias of annotator $s$. $u(x^{'})$ is regarded as the true quality of the reference image. $S(x,x^{'})$ represents the score residual converted from the perceptual similarity between images.
  • Figure 2: The overall framework of the proposed method. When the annotated score $y_{n}$ is biased upward, the perceptual similarity score and the score estimated from similar images can recover the true score, thus correcting $y_{n}$.
  • Figure 3: (a) ResNet-50: Accuracies on LIVE with different variance $\sigma$. (b) ResNet-50: Accuracies on LIVE with different $T$.
  • Figure 4: Given a biased MOS (below the image), our method searches for perceptually similar images via NSS and obtains the fine-tuned score.