VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Tianhe Wu, Jian Zou, Jie Liang, Lei Zhang, Kede Ma

TL;DR

The paper addresses the generalization of no-reference image quality assessment (NR-IQA) across distortions by treating quality as a relative concept and training a reasoning-capable model with reinforcement learning. It introduces VisualQuality-R1, trained with reinforcement learning to rank (RL2R): group relative policy optimization (GRPO) generates multiple quality scores per image, the scores of an image pair are compared under the Thurstone model, and each score is rewarded with a continuous fidelity measure. The model outperforms discriminative deep learning-based NR-IQA models and a recent reasoning-induced quality regression method, and it produces human-aligned quality descriptions. Because it supports multi-dataset training without perceptual scale realignment, it is well suited to measuring progress in tasks such as super-resolution and image generation.

Abstract

DeepSeek-R1 has demonstrated remarkable effectiveness in incentivizing reasoning and generalization capabilities of large language models (LLMs) through reinforcement learning. Nevertheless, the potential of reasoning-induced computation has not been thoroughly explored in the context of image quality assessment (IQA), a task depending critically on visual reasoning. In this paper, we introduce VisualQuality-R1, a reasoning-induced no-reference IQA (NR-IQA) model, and we train it with reinforcement learning to rank, a learning algorithm tailored to the intrinsically relative nature of visual quality. Specifically, for a pair of images, we employ group relative policy optimization to generate multiple quality scores for each image. These estimates are used to compute comparative probabilities of one image having higher quality than the other under the Thurstone model. Rewards for each quality estimate are defined using continuous fidelity measures rather than discretized binary labels. Extensive experiments show that the proposed VisualQuality-R1 consistently outperforms discriminative deep learning-based NR-IQA models as well as a recent reasoning-induced quality regression method. Moreover, VisualQuality-R1 is capable of generating contextually rich, human-aligned quality descriptions, and supports multi-dataset training without requiring perceptual scale realignment. These features make VisualQuality-R1 especially well-suited for reliably measuring progress in a wide range of image processing tasks like super-resolution and image generation.
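
The scoring-and-reward step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the comparative probability is the standard normal CDF of the score difference divided by the square root of the summed sample variances (plus a small hypothetical stabilizer `eps`), and that the fidelity reward takes the common form sqrt(p q) + sqrt((1 - p)(1 - q)) between the human preference p and the predicted probability q. The function names and the example scores are made up for illustration.

```python
import math
from statistics import mean, variance


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def comparative_probs(scores_i, scores_j, eps=1e-6):
    """For each of the K scores of image i, estimate the probability that
    image i is perceived as better than image j (Thurstone-style comparison).

    Assumption: the difference between the k-th score of image i and the mean
    score of image j is standardized by the square root of the two sample
    variances plus `eps` (a hypothetical stabilizer, not from the source).
    """
    mu_j = mean(scores_j)
    denom = math.sqrt(variance(scores_i) + variance(scores_j) + eps)
    return [normal_cdf((q - mu_j) / denom) for q in scores_i]


def fidelity_reward(p_true: float, p_pred: float) -> float:
    """Continuous reward in [0, 1]; equals 1 when p_pred matches p_true.

    Assumption: this is the usual fidelity between two Bernoulli distributions.
    """
    return math.sqrt(p_true * p_pred) + math.sqrt((1.0 - p_true) * (1.0 - p_pred))


# Hypothetical example: K = 4 sampled scores per image, and human raters
# prefer image i, so the ground-truth preference is p_true = 1.0.
scores_i = [4.2, 4.0, 4.5, 3.9]
scores_j = [2.8, 3.1, 3.0, 2.7]
p_true = 1.0

probs = comparative_probs(scores_i, scores_j)
rewards = [fidelity_reward(p_true, p) for p in probs]
for q, p, r in zip(scores_i, probs, rewards):
    print(f"score={q:.1f}  P(i better than j)={p:.4f}  reward={r:.4f}")
```

Because the reward varies smoothly with the predicted probability, each of the K responses receives a graded signal rather than a binary right-or-wrong label, which is the property the abstract highlights.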

Paper Structure

This paper contains 25 sections, 7 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: VisualQuality-R1 excels at image quality scoring, while generating contextually rich, human-aligned quality descriptions.
  • Figure 2: System diagram of the proposed VisualQuality-R1 trained via RL2R. Given an image pair $(x_i, x_j)$ with a shared text prompt $c$, VisualQuality-R1 generates $K$ responses. Following GRPO [shao2024deepseekmath], each response includes a detailed reasoning process and a predicted quality score. To assess relative visual quality, we calculate the asymmetric comparative probability that image $x_i$ is perceived as better than $x_j$ under the Thurstone model [thurstone1927law]. This involves subtracting the mean predicted score of $x_j$ from the $k$-th score of $x_i$, standardized by their sample variances. A fidelity reward is derived from human preference, providing continuous supervisory signals for policy optimization.
  • Figure 3: Prediction variability decreases during GRPO. We randomly select $20$ images from each of CLIVE [ghadiyaram2015massive], KonIQ-10k [hosu2020koniq], SRIQA-Bench [chen2025toward], and AGIQA-3K [li2023agiqa]. At successive training steps, we generate multiple responses per image, compute the standard deviation (std) of the predicted quality scores, and plot the average std across images. The uniformly downward trend confirms that VisualQuality-R1 becomes steadily more stable in assessing image quality as training progresses.
  • Figure 4: Evolution of the reasoning capabilities of VisualQuality-R1 on an image super-resolved by SwinIR [liang2021swinir]. Initially, VisualQuality-R1 overlooks artifacts and overestimates quality; at later stages, it progressively detects stylization, blur, and filtering effects, yielding more accurate quality scores and human-aligned textual justifications. Zoom in for improved visibility.
  • Figure 5: gMAD competition results between VisualQuality-R1 and Q-Insight [li2025qinsight]. (a) Q-Insight fixed at the low-quality level. (b) Q-Insight fixed at the high-quality level. (c) VisualQuality-R1 fixed at the low-quality level. (d) VisualQuality-R1 fixed at the high-quality level.
  • ...and 2 more figures
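
The variability measure plotted in Figure 3 (per-image std of the predicted scores, averaged across images) can be sketched as follows. This is an illustrative reading of the caption, not the authors' evaluation code. The function name and the example scores are hypothetical, and the choice of population rather than sample standard deviation is an assumption, since the caption does not specify which one is used.

```python
from statistics import mean, pstdev


def average_score_std(scores_per_image):
    """Average, over images, of the std of the scores sampled for each image.

    `scores_per_image` is a list with one inner list of predicted quality
    scores per image. Population std is an assumption, not from the source.
    """
    return mean([pstdev(scores) for scores in scores_per_image])


# Hypothetical scores for three images at an early and a later training step.
early_step = [[3.0, 4.5, 2.5, 4.0], [1.5, 3.0, 2.0, 3.5], [4.0, 2.5, 3.5, 4.8]]
later_step = [[3.6, 3.8, 3.7, 3.5], [2.4, 2.6, 2.5, 2.5], [4.1, 4.0, 4.2, 4.1]]

print(f"early: {average_score_std(early_step):.3f}")
print(f"later: {average_score_std(later_step):.3f}")
```

A lower value at the later step corresponds to the downward trend the caption describes: repeated responses for the same image agree more closely on the score.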