Mitigating Perception Bias: A Training-Free Approach to Enhance LMM for Image Quality Assessment

Baoliang Chen, Siyi Pan, Dongxu Wu, Liang Xie, Xiangjie Sui, Lingyu Zhu, Hanwei Zhu

TL;DR

This work tackles image quality assessment (IQA) with large multimodal models (LMMs) by identifying a perception bias that makes their quality ratings depend on image semantics rather than low-level cues. It introduces a training-free debiasing framework with two steps: bias exposure, which generates semantic-preserving degradations of the query image (e.g., zoom blur, spatter, saturation, fog), and bias mitigation, which uses conditional prompts that ask the LMM to rate the query image given that its degraded counterpart is of poor quality. The per-degradation ratings are aggregated with a conditional probability model that weights each degraded view by its semantic similarity to the query image, estimated with prompts that ask whether the two images show the same objects. Across five IQA datasets and several LMMs, the method consistently improves correlation with human judgments, demonstrating strong generalization and the potential of prompt-based bias correction for unseen tasks.
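
As a rough illustration of the bias-exposure step, the following is a minimal sketch of two of the named degradations (saturation and fog) applied to a list of RGB pixels with channels in [0, 1]. The function names, the `factor` and `strength` parameters, and their default values are illustrative assumptions; the paper's actual distortion implementations and severity levels are not given here and may differ.

```python
import colorsys


def oversaturate(pixels, factor=3.0):
    """Scale the HSV saturation of each RGB pixel, clipping at 1.0.

    `pixels` is a list of (r, g, b) tuples with channels in [0, 1].
    `factor` is a hypothetical severity knob, not a value from the paper.
    """
    out = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        out.append(colorsys.hsv_to_rgb(h, min(1.0, s * factor), v))
    return out


def add_fog(pixels, strength=0.6):
    """Blend each pixel toward white to mimic a uniform fog layer.

    `strength` in [0, 1] is a hypothetical severity knob: 0 leaves the
    image unchanged and 1 turns every pixel white.
    """
    return [
        tuple((1.0 - strength) * c + strength for c in px)
        for px in pixels
    ]


if __name__ == "__main__":
    image = [(0.8, 0.4, 0.4), (0.5, 0.5, 0.5)]
    print(oversaturate(image))
    print(add_fog(image))
```

Both operations change colour and contrast heavily while leaving object shapes and layout untouched, which is the property the degradations are chosen for.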

Abstract

Despite the impressive performance of large multimodal models (LMMs) in high-level visual tasks, their capacity for image quality assessment (IQA) remains limited. One main reason is that LMMs are primarily trained for high-level tasks (e.g., image captioning), which emphasize extracting consistent image semantics regardless of image quality. This semantic-aware yet quality-insensitive perception bias inevitably leads to a heavy reliance on image semantics when those LMMs are asked to rate quality. In this paper, instead of costly retraining or fine-tuning of an LMM, we propose a training-free debiasing framework, in which the image quality prediction is rectified by mitigating the bias caused by image semantics. Specifically, we first explore several semantic-preserving distortions that can significantly degrade image quality while maintaining identifiable semantics. By applying these specific distortions to the query or test images, we ensure that the degraded images are recognized as poor quality while their semantics remain largely intact. During quality inference, both a query image and its corresponding degraded version are fed to the LMM along with a prompt indicating that the query image quality should be inferred under the condition that the degraded one is deemed poor quality. This prior condition effectively aligns the LMM's quality perception, as all degraded images are consistently rated as poor quality, regardless of their semantic variance. Finally, the quality scores of the query image inferred under different prior conditions (degraded versions) are aggregated using a conditional probability model. Extensive experiments on various IQA datasets show that our debiasing framework consistently enhances LMM performance.
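
The inference and aggregation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the prompt wording in `build_conditional_prompt` is hypothetical, and `aggregate_quality` assumes the conditional probability model reduces to a similarity-weighted average of the per-condition scores, with weights normalized to sum to one. The per-condition scores and similarity values are assumed to come from separate LMM queries that are not shown.

```python
def build_conditional_prompt(distortion_name):
    """Return a conditional rating prompt (hypothetical wording).

    The prompt states the prior condition: the degraded image is of
    poor quality, and the query image should be rated given that fact.
    """
    return (
        f"The first image is a {distortion_name}-degraded version of the "
        "second image, and its quality is poor. "
        "Given this, rate the quality of the second image."
    )


def aggregate_quality(scores, similarities):
    """Combine per-condition quality scores into one prediction.

    scores[k]       : quality of the query image inferred under the
                      condition that degraded version k is poor quality.
    similarities[k] : estimated probability that degraded version k keeps
                      the semantics of the query image (used as a weight).

    Assumption: the final score is the similarity-weighted mean. If all
    weights are zero, fall back to the plain mean.
    """
    if not scores or len(scores) != len(similarities):
        raise ValueError("scores and similarities must be non-empty and equal in length")
    total = sum(similarities)
    if total <= 0:
        return sum(scores) / len(scores)
    return sum(s * w for s, w in zip(scores, similarities)) / total


if __name__ == "__main__":
    distortions = ["zoom blur", "spatter", "saturation", "fog"]
    prompts = [build_conditional_prompt(d) for d in distortions]
    # Placeholder numbers standing in for LMM outputs.
    scores = [0.62, 0.58, 0.70, 0.66]
    similarities = [0.9, 0.8, 0.95, 0.6]
    print(prompts[0])
    print(aggregate_quality(scores, similarities))
```

Weighting by semantic similarity means that a degraded view whose content is no longer recognizable contributes less to the final score, since the prior condition it provides is less relevant to the query image.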

Paper Structure

This paper contains 13 sections, 11 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Illustration of perception bias in a large multimodal model (LMM) during quality assessment. Image quality ratings from the LMM (mPLUG-Owl3 [ye2024mplugo3]) were obtained using the Q-Bench testing framework [wu2023qbench]. The LMM consistently assigns higher quality ratings to images in the second row than to those in the first, even though both sets have similar quality distributions as measured by Mean Opinion Scores (MOSs). This discrepancy suggests that the LMM relies more on image semantics than on low-level image cues for quality assessment.
  • Figure 2: The framework of our perception bias mitigation scheme. It consists of two main components: 1) Bias Exposure: Specific distortions are imposed on the query image to significantly degrade its quality while preserving its semantics. The LMM's failure to consistently rate those distorted images as poor quality exposes its inherent perception bias. 2) Bias Mitigation: Dedicated prompts mitigate the bias by requiring that the quality of the query image be assessed under the condition that its degraded counterpart is rated as poor quality. The final quality is then estimated by a semantic-similarity-based aggregation.
  • Figure 3: Illustration of the four distortion types, which can degrade image quality significantly while largely preserving image semantics.
  • Figure 4: Visualization of image quality prediction results. In each subfigure, the top-left label shows numbers in green, blue and red, representing the MOS, the LMM prediction result with the prompt in Q-Bench and our result, respectively.