AI-Generated Image Quality Assessment Based on Task-Specific Prompt and Multi-Granularity Similarity

Jili Xia, Lihuo He, Fei Gao, Kaifan Zhang, Leida Li, Xinbo Gao

TL;DR

TSP-MGS is a quality assessment method for AI-generated images (AIGIs) that designs task-specific prompts and measures multi-granularity similarity between the AIGIs and those prompts. Experiments on the commonly used AGIQA-1K and AGIQA-3K benchmarks demonstrate its superiority.

Abstract

Recently, AI-generated images (AIGIs) created from given prompts (initial prompts) have garnered widespread attention. Nevertheless, because generative techniques remain immature, AIGIs often suffer from poor perception quality and text-to-image misalignment. Assessing the perception quality and alignment quality of AIGIs is therefore crucial to improving the performance of generative models. Existing assessment methods rely too heavily on the initial prompts when designing task prompts and use the same prompts to guide both perception and alignment quality evaluation, overlooking the distinctions between the two tasks. To address this limitation, we propose a novel quality assessment method for AIGIs named TSP-MGS, which designs task-specific prompts and measures multi-granularity similarity between AIGIs and the prompts. Specifically, task-specific prompts are first constructed to describe perception and alignment quality degrees separately, and the initial prompt is introduced for detailed quality perception. Then, the coarse-grained similarity between AIGIs and task-specific prompts is calculated, which facilitates holistic quality awareness. In addition, to improve the understanding of AIGI details, the fine-grained similarity between the image and the initial prompt is measured. Finally, a precise quality prediction is obtained by integrating the multi-granularity similarities. Experiments on the commonly used AGIQA-1K and AGIQA-3K benchmarks demonstrate the superiority of the proposed TSP-MGS.
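The coarse-grained step described in the abstract can be illustrated with a minimal sketch. It assumes that each task-specific prompt describes one quality degree, that the image and the prompts are already embedded as plain vectors (for example by a CLIP-style encoder), and that the score is a softmax-weighted average of the degree values. The function names, the temperature value, and the weighting rule are illustrative assumptions, not the formulation used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(values, temperature):
    """Numerically stable softmax with a temperature (assumed hyperparameter)."""
    peak = max(values)
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_quality_score(image_feat, prompt_feats, levels, temperature=0.01):
    """Hypothetical coarse-grained score.

    image_feat   : embedding of the full image
    prompt_feats : one embedding per quality-degree prompt
    levels       : numeric value attached to each degree (e.g. 1..5)
    """
    sims = [cosine(image_feat, p) for p in prompt_feats]
    weights = softmax(sims, temperature)
    # Expected quality degree under the similarity-derived distribution.
    return sum(w * level for w, level in zip(weights, levels))
```

Under these assumptions, the same function could be called twice, once with perception-oriented prompts and once with alignment-oriented prompts, which mirrors the separation of the two tasks described above.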

Paper Structure

This paper contains 26 sections, 12 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Illustration of quality degradations of AIGIs. (a) AIGIs do not align with the initial prompts. (b) AIGIs suffer from poor perception quality. (c) AIGIs present high alignment quality and perception quality.
  • Figure 2: The pipeline of the proposed method. First, we construct task-specific prompts to describe the degree of perception and alignment quality, while introducing the initial prompt for image content understanding. Using the CLIP model, we then extract image features from both the full image and cropped patches, along with text features from the task-specific prompts and the initial prompt. For a comprehensive quality perception and content understanding, we calculate multi-granularity similarities, i.e., coarse-grained similarity and fine-grained similarity, between the image features and text features. Finally, we integrate these similarities for a precise quality prediction.
  • Figure 3: Coarse-grained similarity measurement.
  • Figure 4: Fine-grained similarity measurement.
  • Figure 5: The impact of the initial prompt on the (a) perception quality evaluation and (b) alignment quality evaluation, where $w/o$ means without and $w/$ represents with.
  • ...and 3 more figures
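
The Figure 2 caption mentions features extracted from cropped patches and a fine-grained similarity with the initial prompt. One common way to compute such a similarity is to match every text token to its best image patch and average the results. The sketch below uses that max-then-mean rule purely as an assumption. The paper's actual fine-grained measure is not given in this summary, and `fine_grained_similarity` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fine_grained_similarity(patch_feats, token_feats):
    """Hypothetical fine-grained score between image patches and prompt tokens.

    patch_feats : one embedding per cropped image patch
    token_feats : one embedding per token of the initial prompt
    """
    # For each token, keep the similarity of its best-matching patch.
    best_per_token = [
        max(cosine(token, patch) for patch in patch_feats)
        for token in token_feats
    ]
    # Average over tokens so prompts of different lengths are comparable.
    return sum(best_per_token) / len(best_per_token)
```

A final prediction would then combine this value with the coarse-grained similarities. How the paper integrates them is not specified here, so any combination rule (for instance a learned weighted sum) would be a further assumption.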