Towards Explainable Partial-AIGC Image Quality Assessment

Jiaying Qian, Ziheng Jia, Zicheng Zhang, Zeyu Zhang, Guangtao Zhai, Xiongkuo Min

TL;DR

This work addresses the lack of perceptual image quality assessment for partial-AIGC images (PAIs) by introducing the EPAIQA-15K dataset and a three-stage training pipeline for large multimodal models that grounds editing regions, predicts quantitative quality scores, and provides explainable feedback. The EPAIQA series models leverage chain-of-thought reasoning and cross-model validation to deliver interpretable quality explanations in addition to numerical scores, demonstrating strong improvements over existing IQA baselines on harmony, local naturalness, and overall quality. The dataset combines source and edited images, editing prompts, and precise region coordinates across 12 editing tools, with extensive human ratings across four dimensions, enabling robust training and evaluation of explainable PAIQA. While the results show clear advances, the authors acknowledge remaining gaps relative to NSI-IQA and T2I-AGIQA, and point to dataset expansion and task-specific model refinements as future work to narrow the gap between quality assessment for PAIs and for fully AI-generated content.

Abstract

The rapid advancement of AI-driven visual generation technologies has catalyzed significant breakthroughs in image manipulation, particularly in achieving photorealistic localized editing effects on natural scene images (NSIs). Despite extensive research on image quality assessment (IQA) for AI-generated images (AGIs), most studies focus on fully AI-generated outputs (e.g., text-to-image generation), leaving the quality assessment of partial-AIGC images (PAIs), that is, images with localized AI-driven edits, an almost unexplored field. Motivated by this gap, we construct EPAIQA-15K, the first large-scale PAI dataset for explainable partial-AIGC image quality assessment (EPAIQA), which includes 15K images with localized AI manipulation in different regions and over 300K multi-dimensional human ratings. Based on this dataset, we leverage large multi-modal models (LMMs) and propose a three-stage model training paradigm. This paradigm progressively trains the LMM for editing region grounding, quantitative quality scoring, and quality explanation. Finally, we develop the EPAIQA series models, which possess explainable quality feedback capabilities. Our work represents a pioneering effort in the perceptual IQA field for comprehensive PAI quality assessment.

Paper Structure

This paper contains 35 sections, 15 figures, 8 tables.

Figures (15)

  • Figure 1: Data construction pipeline of the EPAIQA-15K dataset.
  • Figure 2: Examples of edited images with divergent harmony and local naturalness quality levels.
  • Figure 3: Human ratings distribution across four evaluation dimensions.
  • Figure 4: More information on data distribution.
  • Figure 5: 3D scatter plot of data across three dimensions.
  • ...and 10 more figures