AgenticIQA: An Agentic Framework for Adaptive and Interpretable Image Quality Assessment

Hanwei Zhu, Yu Tian, Keyan Ding, Baoliang Chen, Bolin Chen, Shiqi Wang, Weisi Lin

TL;DR

AgenticIQA introduces a modular, agent-based framework for image quality assessment that jointly optimizes scoring accuracy and interpretability. By decomposing IQA into distortion detection, distortion analysis, tool selection, and tool execution under a plan–execute–summarize loop, it achieves adaptive, query-aware evaluations that combine traditional IQA tools with vision-language model reasoning. A large, structured AgenticIQA-200K dataset and the AgenticIQA-Eval benchmark support training and evaluation of planning, execution, and summarization capabilities. Experimental results across diverse IQA benchmarks demonstrate superior scoring precision and explanation quality compared with both score-based and VLM-only baselines, highlighting the practical impact of agentic reasoning for robust perceptual quality assessment.

Abstract

Image quality assessment (IQA) is inherently complex, as it reflects both the quantification and interpretation of perceptual quality rooted in the human visual system. Conventional approaches typically rely on fixed models to output scalar scores, limiting their adaptability to diverse distortions, user-specific queries, and interpretability needs. Furthermore, scoring and interpretation are often treated as independent processes, despite their interdependence: interpretation identifies perceptual degradations, while scoring abstracts them into a compact metric. To address these limitations, we propose AgenticIQA, a modular agentic framework that integrates vision-language models (VLMs) with traditional IQA tools in a dynamic, query-aware manner. AgenticIQA decomposes IQA into four subtasks -- distortion detection, distortion analysis, tool selection, and tool execution -- coordinated by a planner, executor, and summarizer. The planner formulates task-specific strategies, the executor collects perceptual evidence via tool invocation, and the summarizer integrates this evidence to produce accurate scores with human-aligned explanations. To support training and evaluation, we introduce AgenticIQA-200K, a large-scale instruction dataset tailored for IQA agents, and AgenticIQA-Eval, the first benchmark for assessing the planning, execution, and summarization capabilities of VLM-based IQA agents. Extensive experiments across diverse IQA datasets demonstrate that AgenticIQA consistently surpasses strong baselines in both scoring accuracy and explanatory alignment.
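
The planner, executor, and summarizer roles described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy image representation, the `TOOLS` registry, the keyword-based `planner`, and the uniform averaging in `summarizer` are all hypothetical stand-ins, chosen only to show how evidence flows between the three modules. In the framework itself, a VLM does the planning and summarizing, and the tools are real IQA models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Image = List[float]  # toy image: flat list of pixel intensities in [0, 1]

# Hypothetical tool registry; real IQA tools would be established quality metrics.
TOOLS: Dict[str, Callable[[Image], float]] = {
    "contrast": lambda img: max(img) - min(img),
    "exposure": lambda img: 1.0 - abs(sum(img) / len(img) - 0.5) * 2.0,
}


@dataclass
class Plan:
    query: str
    distortions: List[str]  # aspects the planner wants examined
    tools: List[str]        # tool names selected for those aspects


def planner(query: str) -> Plan:
    """Stub planner: keyword matching stands in for VLM reasoning."""
    selected = [name for name in TOOLS if name in query.lower()]
    if not selected:  # no specific aspect requested: examine everything
        selected = list(TOOLS)
    return Plan(query=query, distortions=selected, tools=selected)


def executor(plan: Plan, image: Image) -> Dict[str, float]:
    """Run each selected tool and collect its output as evidence."""
    return {name: TOOLS[name](image) for name in plan.tools}


def summarizer(plan: Plan, evidence: Dict[str, float]) -> Dict[str, object]:
    """Fuse evidence into a score plus a short explanation.

    Uniform averaging is an assumption made for brevity here.
    """
    score = sum(evidence.values()) / len(evidence)
    weakest = min(evidence, key=lambda name: evidence[name])
    explanation = f"Lowest-rated aspect: {weakest} ({evidence[weakest]:.2f})."
    return {"score": score, "explanation": explanation}


def assess(query: str, image: Image) -> Dict[str, object]:
    """Plan, execute, then summarize for one query and one image."""
    plan = planner(query)
    evidence = executor(plan, image)
    return summarizer(plan, evidence)


if __name__ == "__main__":
    print(assess("How is the contrast?", [0.2, 0.4, 0.6, 0.8]))
```

Because the plan is derived from the query, a question about contrast invokes only the contrast tool, while a generic quality question invokes every registered tool. That is the query-aware behavior the abstract describes, reduced to its simplest form.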

Paper Structure

This paper contains 44 sections, 6 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Illustrations of the motivation behind our work. (a) Traditional IQA frameworks often rely on a single tool, either a score-based model with accurate but non-explainable outputs or a VLM-based model with interpretable but coarse ratings. Moreover, their static workflows limit adaptability to diverse IQA tasks. (b) Our AgenticIQA introduces a dynamic agent system that plans and executes IQA sub-tasks based on the user query and image content. It adaptively integrates multi-source quality cues generated during task execution and produces informative, query-aware answers through a refinement process.
  • Figure 2: Overview of the AgenticIQA framework illustrating the workflow across planner, executor, and summarizer modules.
  • Figure 3: Illustration of average running time per sample on different datasets.
  • Figure 4: Comparison of the tool-augmented score prediction scheme with uniform averaging.
  • Figure 5: Illustrative examples from the AgenticIQA-Eval benchmark. Each subfigure corresponds to one evaluation component: (Left to Right) planner reasoning, distortion severity assessment, tool appropriateness, and summarization over multimodal evidence.
  • ...and 4 more figures