Guiding Perception-Reasoning Closer to Human in Blind Image Quality Assessment

Yuan Li; Yahan Yu; Youyuan Lin; Yong-Hao Yang; Chenhui Chu; Shin'ya Nishida

TL;DR

This work tackles blind image quality assessment (BIQA) by embedding a human-like perception-reasoning cascade into multimodal learning. It introduces the Q-Reasoning dataset, which captures eight perception and reasoning dimensions, and trains a large language model with human-guided reinforcement learning, augmented by a self-consistency objective that requires the model to predict quality from its own captions. The approach achieves competitive image-based quality predictions and substantially improves alignment with human reasoning as measured by ROUGE-1, demonstrating interpretable, human-centered BIQA under both image-conditioned and caption-conditioned settings. The work also proposes caption-based BIQA as a meaningful evaluation dimension, moving BIQA toward interpretable, human-aligned decision making.
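The self-consistency idea can be illustrated with a toy reward function. This is a hypothetical sketch, not the paper's reward: the binary tolerance threshold, the additive combination, the `consistency_weight` parameter, and the `score_from_caption` callable (a stand-in for the model re-scoring the image from its own caption alone) are all assumptions. It also rewards only the quality score and ignores the perception and reasoning annotations that the method uses as additional reward signals.

```python
from typing import Callable


def threshold_reward(predicted: float, target: float, tolerance: float = 0.5) -> float:
    # Hypothetical binary reward: 1.0 if the prediction lies within `tolerance` of the target.
    return 1.0 if abs(predicted - target) <= tolerance else 0.0


def combined_reward(
    image_score: float,
    caption: str,
    human_mos: float,
    score_from_caption: Callable[[str], float],
    consistency_weight: float = 1.0,
    tolerance: float = 0.5,
) -> float:
    # Term 1: the image-conditioned score should agree with the human rating.
    accuracy = threshold_reward(image_score, human_mos, tolerance)
    # Term 2 (self-consistency): re-score from the model's own caption only,
    # with no access to the image, and compare that score to the human rating too.
    caption_score = score_from_caption(caption)
    consistency = threshold_reward(caption_score, human_mos, tolerance)
    return accuracy + consistency_weight * consistency


def toy_score_from_caption(caption: str) -> float:
    # Hypothetical stand-in for a caption-conditioned scorer: blurry captions score low.
    return 2.0 if "blurry" in caption.lower() else 4.0


if __name__ == "__main__":
    # Caption supports the score: both terms are rewarded.
    print(combined_reward(3.8, "sharp, well-exposed photo", 4.0, toy_score_from_caption))
    # Caption contradicts the score: only the accuracy term is rewarded.
    print(combined_reward(3.8, "heavily blurry and noisy", 4.0, toy_score_from_caption))
```

The design point the sketch tries to capture is that a caption earns reward only if it carries enough quality evidence for a text-only reader to recover the human judgment, which pushes descriptions and scores to stay consistent.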

Abstract

Humans assess image quality through a perception-reasoning cascade, integrating sensory cues with implicit reasoning to form self-consistent judgments. In this work, we investigate how a model can acquire both human-like and self-consistent reasoning capability for blind image quality assessment (BIQA). We first collect human evaluation data that capture several aspects of the human perception-reasoning pipeline. Then, we adopt reinforcement learning, using human annotations as reward signals to guide the model toward human-like perception and reasoning. To enable the model to internalize self-consistent reasoning, we design a reward that drives the model to infer image quality purely from its self-generated descriptions. Empirically, our approach achieves score prediction performance comparable to state-of-the-art BIQA systems under standard metrics, including the Pearson and Spearman correlation coefficients. Beyond the rating score, we assess human-model alignment using ROUGE-1 to measure the similarity between model-generated and human perception-reasoning chains. On over 1,000 human-annotated samples, our model reaches a ROUGE-1 score of 0.512 (cf. 0.443 for the baseline), indicating substantial coverage of human explanations and marking a step toward human-like interpretable reasoning in BIQA.
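The abstract reports Pearson and Spearman correlations for score prediction and ROUGE-1 for reasoning alignment. The sketch below shows how these metrics are commonly computed; it is not the authors' evaluation code. Assumptions: Pearson is computed on raw scores (some BIQA protocols first fit a logistic mapping, which is omitted here); ROUGE-1 uses a simple lowercase alphanumeric tokenizer with no stemming; and because the abstract does not say which ROUGE-1 variant is reported, precision, recall, and F1 are all returned.

```python
import math
import re
from collections import Counter
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    # PLCC: covariance over the product of standard deviations (assumes non-constant inputs).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> list[float]:
    # Ranks 1..n; tied values share the mean of the ranks they occupy.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    # SRCC: Pearson correlation computed on the (tie-averaged) ranks.
    return pearson(average_ranks(x), average_ranks(y))


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(candidate: str, reference: str) -> dict[str, float]:
    # Unigram overlap with clipped counts (multiset intersection of tokens).
    cand, ref = tokenize(candidate), tokenize(reference)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    p = overlap / len(cand)
    r = overlap / len(ref)
    return {"precision": p, "recall": r, "f1": 2 * p * r / (p + r)}


if __name__ == "__main__":
    predicted = [3.1, 4.2, 2.0, 4.8]
    human_mos = [3.0, 4.0, 2.5, 4.9]
    print("PLCC", pearson(predicted, human_mos))
    print("SRCC", spearman(predicted, human_mos))  # same ordering, so close to 1.0
    print(rouge1("the image is blurry with strong noise", "the photo is blurry and noisy"))
```

Recall (overlap divided by reference length) is the quantity that most directly reflects "coverage of human explanations"; F1 additionally penalizes long model outputs that pad the overlap.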
