Q-Insight: Understanding Image Quality via Visual Reinforcement Learning

Weiqi Li, Xuanyu Zhang, Shijie Zhao, Yabin Zhang, Junlin Li, Li Zhang, Jian Zhang

TL;DR

Q-Insight introduces a reinforcement-learning-based, GRPO-driven framework for comprehensive image quality understanding that jointly optimizes score regression and degradation perception with limited labels. By employing task-specific rewards and a KL-regularized multi-task objective, it achieves strong cross-domain generalization and zero-shot image comparison reasoning, while maintaining interpretable, reasoning-style outputs. The approach outperforms state-of-the-art model-based IQA and SFT-driven LLMs on diverse datasets and demonstrates data-efficient learning, suggesting practical impact for quality assessment, restoration, and enhancement workflows. Limitations include a focus on natural images, with future work extending to AI-generated content and video domains.

Abstract

Image quality assessment (IQA) focuses on the perceptual visual quality of images, playing a crucial role in downstream tasks such as image reconstruction, compression, and generation. The rapid advancement of multi-modal large language models (MLLMs) has significantly broadened the scope of IQA, moving toward comprehensive image quality understanding that incorporates content analysis, degradation perception, and comparison reasoning beyond mere numerical scoring. Previous MLLM-based methods typically either generate numerical scores lacking interpretability or heavily rely on supervised fine-tuning (SFT) using large-scale annotated datasets to provide descriptive assessments, limiting their flexibility and applicability. In this paper, we propose Q-Insight, a reinforcement learning-based model built upon group relative policy optimization (GRPO), which demonstrates strong visual reasoning capability for image quality understanding while requiring only a limited amount of rating scores and degradation labels. By jointly optimizing score regression and degradation perception tasks with carefully designed reward functions, our approach effectively exploits their mutual benefits for enhanced performance. Extensive experiments demonstrate that Q-Insight substantially outperforms existing state-of-the-art methods in both score regression and degradation perception tasks, while exhibiting impressive zero-shot generalization to comparison reasoning tasks. Code will be available at https://github.com/lwq20020127/Q-Insight.
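
The group-relative step that gives GRPO its name can be illustrated with a minimal sketch. The abstract does not specify the reward functions, so `threshold_reward` and its `eps` tolerance are hypothetical stand-ins, and the KL regularization and policy update are left out. The sketch only shows how the rewards of one group of sampled responses might be normalized into advantages.

```python
from statistics import mean, pstdev


def threshold_reward(pred: float, target: float, eps: float = 0.5) -> float:
    """Hypothetical score reward: 1.0 if the prediction is within eps of the label."""
    return 1.0 if abs(pred - target) < eps else 0.0


def group_relative_advantages(rewards: list[float], tiny: float = 1e-8) -> list[float]:
    """Normalize each reward by the mean and standard deviation of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ here
    return [(r - mu) / (sigma + tiny) for r in rewards]


# One query, four sampled responses, each ending in a predicted quality score.
preds = [3.1, 4.2, 2.0, 3.4]
rewards = [threshold_reward(p, target=3.3) for p in preds]
advantages = group_relative_advantages(rewards)
```

Responses that beat their group's average reward receive a positive advantage and the rest a negative one, so no separate value model is needed. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.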

Paper Structure

This paper contains 22 sections, 8 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: PLCC comparison between Q-Insight and existing IQA metrics (left), and three example applications of Q-Insight (right). Q-Insight performs significantly better than existing methods such as DeQA-Score [you2025teaching], especially on out-of-domain datasets (e.g., CSIQ [larson2010most]). Q-Insight also supports quality score regression, image degradation perception, and zero-shot image comparison reasoning.
  • Figure 2: Overview of the proposed Q-Insight framework. The policy model receives queries from multiple tasks and generates corresponding groups of responses accompanied by explicit reasoning steps. Task-specific reward functions ($R_\text{scr}$, $R_\text{deg}$, and $R_\text{lev}$) are then applied, and the policy model is subsequently optimized jointly using the multi-task group relative policy optimization algorithm.
  • Figure 3: Score rating and explanation results of our Q-Insight. Q-Insight is capable of recognizing text, analyzing the lighting and shading conditions of an image, and understanding its composition.
  • Figure 4: Image comparison reasoning results of our Q-Insight and DepictQA [you2024DQA]. Q-Insight outperforms DepictQA in comprehensive content understanding and accurate degradation perception.
  • Figure A: Subjective ablation comparison between training with and without joint multi-task learning, shown on image scoring explanations. With joint training, our method better perceives degradation cues in images (such as a pixelated appearance), thereby improving the accuracy of quality assessment.
  • ...and 4 more figures
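
Figure 1 reports PLCC, the Pearson linear correlation coefficient between predicted scores and human ratings. A minimal sketch of that computation follows; the function name `plcc` and the sample values are illustrative assumptions. Some IQA evaluation protocols fit a nonlinear mapping to the predictions before computing the correlation, and that step is not shown here.

```python
import math


def plcc(predicted: list[float], mos: list[float]) -> float:
    """Pearson linear correlation between predicted scores and mean opinion scores."""
    if len(predicted) != len(mos) or len(predicted) < 2:
        raise ValueError("need two equally sized lists with at least two items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(mos) / n
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, mos))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in mos)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_p * var_m)


# Illustrative values only, not results from the paper.
perfect = plcc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
inverted = plcc([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A value near 1 means the predictions track human ratings linearly, while a value near -1 means they are linearly reversed.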