Revisiting MLLM Based Image Quality Assessment: Errors and Remedy

Zhenchen Tang, Songlin Yang, Bo Peng, Zichuan Wang, Jing Dong

TL;DR

This work analyzes MLLM-based image quality assessment (IQA) methods and identifies two fundamental problems: conversion errors from discretizing continuous MOS values and semantic confusion caused by using generic level tokens. It proves that token-based formulations incur a nonzero expected error $\mathbb{E}[\epsilon(x)^2] > 0$, whereas regression-based scoring can theoretically reduce this error to zero via universal approximation, motivating a shift toward continuous predictions. The authors propose Q-Scorer, which integrates IQA-specific score tokens with a lightweight MLP regressor to produce continuous MOS, achieving state-of-the-art results on multiple IQA benchmarks and strong transferability, even under mixed-dataset training. The method is LoRA-friendly and can be combined with existing techniques (e.g., KL loss, fidelity loss, norm-in-norm, hyper networks, ranking losses) to further boost performance, representing a practical pathway to embed high-fidelity IQA into MLLM-based systems.

Abstract

The rapid progress of multi-modal large language models (MLLMs) has boosted the task of image quality assessment (IQA). However, a key challenge arises from the inherent mismatch between the discrete token outputs of MLLMs and the continuous nature of quality scores required by IQA tasks. This discrepancy significantly hinders the performance of MLLM-based IQA methods. Previous approaches that convert discrete token predictions into continuous scores often suffer from conversion errors. Moreover, the semantic confusion introduced by level tokens (e.g., "good") further constrains the performance of MLLMs on IQA tasks and degrades their original capabilities for related tasks. To tackle these problems, we provide a theoretical analysis of the errors inherent in previous approaches and, motivated by this analysis, propose a simple yet effective framework, Q-Scorer. This framework incorporates a lightweight regression module and IQA-specific score tokens into the MLLM pipeline. Extensive experiments demonstrate that Q-Scorer achieves state-of-the-art performance across multiple IQA benchmarks, generalizes well to mixed datasets, and further improves when combined with other methods.
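The conversion error mentioned in the abstract can be illustrated with a small numeric sketch. It is not the paper's procedure: the MOS range of 1 to 5, the five equal-width levels, and restoration by bin center are all assumptions chosen for illustration, and the helper names are hypothetical. Even with a perfect level classifier, mapping a continuous MOS to a level and back leaves a nonzero squared error.

```python
def mos_to_level(mos, lo=1.0, hi=5.0, num_levels=5):
    """Map a continuous MOS to a discrete level index (0 = worst).

    Assumption: equal-width bins over [lo, hi].
    """
    width = (hi - lo) / num_levels
    idx = int((mos - lo) / width)
    return min(max(idx, 0), num_levels - 1)  # clamp the upper endpoint into the last bin


def level_to_score(level, lo=1.0, hi=5.0, num_levels=5):
    """Restore a score from a level index using the bin center (assumed rule)."""
    width = (hi - lo) / num_levels
    return lo + (level + 0.5) * width


def mean_squared_conversion_error(mos_values):
    """Average squared gap between each MOS and its discretize-then-restore value."""
    errors = [(level_to_score(mos_to_level(m)) - m) ** 2 for m in mos_values]
    return sum(errors) / len(errors)


# Synthetic MOS values spread evenly over [1, 5]; not real dataset scores.
mos_values = [1.0 + 4.0 * i / 1000 for i in range(1001)]
mse = mean_squared_conversion_error(mos_values)

# A single example: 3.37 falls in the middle level and is restored as its center.
example_level = mos_to_level(3.37)
example_restored = level_to_score(example_level)
```

For evenly spread scores the error is close to the variance of a uniform distribution over one bin (width squared divided by 12), which is strictly positive for any finite number of levels. A regressor that outputs a real number is not subject to this floor.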

Paper Structure

This paper contains 44 sections, 33 equations, 5 figures, 6 tables, 2 algorithms.

Figures (5)

  • Figure 1: Overview of MLLM-based IQA methods and error analysis. The figure shows how MLLMs are adapted for label conversion and score prediction, and highlights steps causing conversion errors and semantic confusion.
  • Figure 2: Visual illustration and detailed explanation of conversion errors. The figure provides examples detailing two main sources of conversion errors: label approximation (from discretizing MOS) and restoration error (from imperfect score restoration). See the section on semantic confusion and Fig. 3 for details.
  • Figure 3: Examples of post-tuning semantic confusion, showing how different token strategies affect T2I alignment assessment.
  • Figure 4: Overview of Q-Scorer. The model is trained with $\mathcal{L}_{\text{ce}}$ to output an interval-specific score token. The token's embedding is then passed to an MLP that regresses the continuous quality score, optimized with $\mathcal{L}_{\text{score}}$ so that the MOS is preserved without discretization loss.
  • Figure 5: Qualitative results across three types of IQA datasets. Predicted scores are shown in red, and ground-truth MOSs are shown in blue.
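The two-loss design described for Figure 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the number of score tokens, the layer widths, the random weights, the use of squared error for $\mathcal{L}_{\text{score}}$, and the `score_weight` balance term are all hypothetical. The "token's embedding" is modeled here as an image-dependent hidden vector at the score-token position, which is also an assumption.

```python
import math
import random

random.seed(0)

NUM_TOKENS = 5   # hypothetical number of interval-specific score tokens
HIDDEN_DIM = 8   # hypothetical width of the vector at the score-token position
MLP_DIM = 16     # hypothetical width of the regressor's hidden layer

# Randomly initialised regressor weights (stand-ins for trained parameters).
W1 = [[random.uniform(-0.5, 0.5) for _ in range(MLP_DIM)] for _ in range(HIDDEN_DIM)]
W2 = [random.uniform(-0.5, 0.5) for _ in range(MLP_DIM)]


def cross_entropy(logits, target_index):
    """Cross-entropy over score-token logits, computed via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target_index]


def mlp_regress(hidden_state):
    """One-hidden-layer ReLU MLP mapping the token vector to a scalar score."""
    hidden = [
        max(0.0, sum(h * W1[i][j] for i, h in enumerate(hidden_state)))
        for j in range(MLP_DIM)
    ]
    return sum(a * w for a, w in zip(hidden, W2))


def combined_loss(token_logits, hidden_state, target_token, target_mos, score_weight=1.0):
    """Token classification loss plus a regression loss on the continuous MOS."""
    l_ce = cross_entropy(token_logits, target_token)
    predicted_mos = mlp_regress(hidden_state)
    l_score = (predicted_mos - target_mos) ** 2  # assumed form of the score loss
    return l_ce + score_weight * l_score, predicted_mos


# Toy inputs: uniform logits and a random vector in place of real model outputs.
token_logits = [0.0] * NUM_TOKENS
hidden_state = [random.uniform(-1.0, 1.0) for _ in range(HIDDEN_DIM)]
loss, predicted_mos = combined_loss(token_logits, hidden_state, target_token=2, target_mos=3.37)
```

Because the regressor reads a continuous vector rather than a token index, its output is not limited to one value per score token, which is what lets the regression target be the original MOS instead of a discretized label.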