AI-generated Image Quality Assessment in Visual Communication

Yu Tian, Yixuan Li, Baoliang Chen, Hanwei Zhu, Shiqi Wang, Sam Kwong

TL;DR

The paper identifies a gap in assessing AI-generated images for visual communication and proposes AIGI-VC, a dataset of 2,500 images across 14 ad topics and 8 emotions with coarse- and fine-grained human preferences and GPT-4o-assisted explanations. It benchmarks a broad set of IQA metrics and large multimodal models on communicability, revealing that state-of-the-art methods struggle to capture information clarity and emotional interaction in practical advertising contexts. The work demonstrates GPT-4o's strong performance in prediction, interpretation, and reasoning, while highlighting limitations of open LMMs and CLIP-based metrics for AIGI quality assessment. By providing both dataset resources and a comprehensive evaluation framework, the study paves the way for developing IQA metrics better aligned with real-world visual communication goals.

Abstract

Assessing the quality of artificial intelligence-generated images (AIGIs) plays a crucial role in their application in real-world scenarios. However, traditional image quality assessment (IQA) algorithms primarily focus on low-level visual perception, while existing IQA works on AIGIs overemphasize the generated content itself, neglecting its effectiveness in real-world applications. To bridge this gap, we propose AIGI-VC, a quality assessment database for AI-Generated Images in Visual Communication, which studies the communicability of AIGIs in the advertising field from the perspectives of information clarity and emotional interaction. The dataset consists of 2,500 images spanning 14 advertisement topics and 8 emotion types. It provides coarse-grained human preference annotations and fine-grained preference descriptions, benchmarking the abilities of IQA methods in preference prediction, interpretation, and reasoning. We conduct an empirical study of existing representative IQA methods and large multi-modal models on the AIGI-VC dataset, uncovering their strengths and weaknesses.

Paper Structure

This paper contains 17 sections, 4 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Outline of the AIGI-VC dataset.
  • Figure 2: Sample images from the AIGI-VC database, where the first to fifth columns show images generated by DALL·E 3, Stable Diffusion XL, Stable Diffusion 3.0, Stable Diffusion 2.0, and Dreamlike Photoreal 2.0.
  • Figure 3: Accuracy of preference choices via MAP estimation in $M$ rounds.
  • Figure 4: Distribution of preference probabilities for image pairs in the AIGI-VC dataset.
  • Figure 5: The process of description generation. Given two images with preference choices collected from human users, GPT produces the initial descriptions according to visual cues influencing human preference judgments. Human experts then verify and supplement GPT-generated descriptions to produce golden descriptions.