Large Multi-modality Model Assisted AI-Generated Image Quality Assessment

Puyi Wang; Wei Sun; Zicheng Zhang; Jun Jia; Yanwei Jiang; Zhichao Zhang; Xiongkuo Min; Guangtao Zhai

TL;DR

This work tackles the poor performance of traditional IQA models on AI-generated images (AGIs) by infusing semantic understanding into quality assessment. It introduces MA-AGIQA, which combines a MANIQA-based quality feature extractor with a parameter-fixed LMM (mPLUG-Owl2) that provides fine-grained semantic features extracted via carefully designed prompts; the two are fused through a mixture-of-experts module. The approach achieves state-of-the-art results on AGIQA-3k and AIGCQA-20k and demonstrates strong cross-dataset generalization, with ablations confirming the value of both semantic cues and adaptive fusion. The work offers a practical path to more reliable quality evaluation of AGIs and suggests broad potential for LMM-assisted content quality assessment.

Abstract

Traditional deep neural network (DNN)-based image quality assessment (IQA) models leverage convolutional neural networks (CNNs) or Transformers to learn quality-aware feature representations, achieving commendable performance on natural scene images. However, when applied to AI-generated images (AGIs), these DNN-based IQA models exhibit subpar performance. This is largely due to the semantic inaccuracies inherent in certain AGIs, which are caused by the uncontrollable nature of the generation process. Thus, the capability to discern semantic content becomes crucial for assessing the quality of AGIs. Traditional DNN-based IQA models, constrained by limited parameter complexity and training data, struggle to capture complex fine-grained semantic features, which makes it difficult for them to grasp the existence and coherence of semantic content across the entire image. To address this shortfall in the semantic content perception of current IQA models, we introduce a large Multi-modality model Assisted AI-Generated Image Quality Assessment (MA-AGIQA) model, which utilizes semantically informed guidance to sense semantic information and extract semantic vectors through carefully designed text prompts. Moreover, it employs a mixture of experts (MoE) structure to dynamically integrate the semantic information with the quality-aware features extracted by traditional DNN-based IQA models. Comprehensive experiments conducted on two AI-generated content datasets, AIGCQA-20k and AGIQA-3k, show that MA-AGIQA achieves state-of-the-art performance and demonstrates superior generalization capabilities in assessing the quality of AGIs. Code is available at https://github.com/wangpuyi/MA-AGIQA.
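
The dynamic integration described in the abstract can be illustrated with a minimal gated-fusion sketch. This is not the authors' implementation: the function names (`softmax`, `linear`, `moe_fuse`), the gate layout, and the choice of three input vectors (one quality-aware feature plus two semantic features) are assumptions made for illustration only. For the actual module, see the linked repository.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def moe_fuse(features, gate_w, gate_b):
    """Fuse equal-length feature vectors with input-dependent gate weights.

    features: list of vectors, one per feature source (hypothetical layout).
    gate_w, gate_b: parameters of a hypothetical gating layer that maps the
        concatenated features to one logit per feature source.
    Returns the fused vector and the gate weights (which sum to 1).
    """
    gate_input = [v for f in features for v in f]  # concatenate all sources
    gate = softmax(linear(gate_w, gate_b, gate_input))
    dim = len(features[0])
    fused = [sum(g * f[i] for g, f in zip(gate, features)) for i in range(dim)]
    return fused, gate


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    # Assumed inputs: one quality-aware vector and two semantic vectors.
    quality = [rng.uniform(-1, 1) for _ in range(dim)]
    semantic_a = [rng.uniform(-1, 1) for _ in range(dim)]
    semantic_b = [rng.uniform(-1, 1) for _ in range(dim)]
    feats = [quality, semantic_a, semantic_b]
    # Randomly initialised gate; in practice these would be learned.
    gate_w = [[rng.gauss(0, 0.1) for _ in range(dim * len(feats))]
              for _ in feats]
    gate_b = [0.0] * len(feats)
    fused, gate = moe_fuse(feats, gate_w, gate_b)
    print("gate weights:", gate)
    print("fused feature:", fused)
```

Because the gate weights are computed from the features themselves, each image can receive a different mix of quality-aware and semantic information, which is the general idea behind an MoE-style fusion.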

Paper Structure (15 sections, 4 equations, 5 figures, 5 tables)

Introduction
Related Work
Method
Quality-aware Feature Extraction
Fine-grained Semantic Feature Extraction
Adaptive Fusion Module
Experiments
Dataset and Evaluation Metrics
Implementation Details
Comparison with SOTA methods
Ablation Study
Computational Costs
Visualization
Conclusion
Acknowledgement

Figures (5)

Figure 1: Overview of our proposed MA-AGIQA framework. First, MANIQA is repurposed as the foundational training backbone, whose structure is modified to generate quality-aware features. Second, a parameter-fixed LMM, mPLUG-Owl2, serves as a fine-grained semantic feature extractor. This module utilizes carefully crafted prompts to capture the desired semantic information. Finally, the AFM acts as an organic feature integrator, dynamically combining these features for enhanced performance.
Figure 2: Four types of images whose quality correlates strongly with their semantics. The ground truth and the model prediction are presented below each image. The significant difference between the model prediction and the ground truth indicates that the model's understanding of semantics is insufficient.
Figure 3: Presentation of mPLUG-Owl2's answers to two prompts.
Figure 4: Comparative Density Distributions of Absolute Differences for MANIQA and MA-AGIQA on AGIQA-3k and AIGCQA-20k Datasets
Figure 5: Comparative Analysis of Image Quality Assessment Models: Evaluating MANIQA versus MA-AGIQA Against Ground Truth Scores
