AAPMT: AGI Assessment Through Prompt and Metric Transformer

Benhao Huang

TL;DR

This work tackles the challenge of evaluating AI-generated images in a way that aligns with human perception across perceptual quality, authenticity, and text–image alignment. It builds on a BLIP-based backbone and the Image Reward framework, using prompt design and content isolation to study quality assessment, and introduces a novel Metric Transformer that can score multiple metrics in one pass. Empirical results show that prompt design significantly affects quality judgments and that the Metric Transformer can match or surpass per‑metric Image Reward performance on AIGCIQA2023, with robustness across seeds. The approach offers a scalable, resource-efficient pathway for unified AGI evaluation, with practical implications for QA, design, and safety assessments of text‑to‑image systems.

Abstract

The emergence of text-to-image models marks a significant milestone in the evolution of AI-generated images (AGIs), expanding their use in diverse domains such as design and entertainment. Despite these breakthroughs, the quality of AGIs often remains suboptimal, highlighting the need for effective evaluation methods. These methods are crucial for assessing the quality of images relative to their textual descriptions, and they must accurately mirror human perception. Substantial progress has been achieved in this domain, with innovative techniques such as BLIP and DBCNN contributing significantly. However, recent studies, including AGIQA-3K, reveal a notable discrepancy between current methods and state-of-the-art (SOTA) standards. This gap emphasizes the necessity for a more sophisticated and precise evaluation metric. In response, our objective is to develop a model that rates metrics such as perceptual quality, authenticity, and text-image correspondence in a way that more closely aligns with human perception. In this paper, we introduce a range of effective methods, including prompt designs and the Metric Transformer. The Metric Transformer is a novel structure inspired by the complex interrelationships among various AGI quality metrics. The code is available at https://github.com/huskydoge/CS3324-Digital-Image-Processing/tree/main/Assignment1

Paper Structure

This paper contains 16 sections, 4 equations, 6 figures, 6 tables, 2 algorithms.

Figures (6)

  • Figure 1: Assessing text-image correspondence. '3K' signifies that the task is evaluated on the AGIQA-3K dataset, while '2023' indicates evaluation on the AIGCIQA2023 dataset. 'Raw' refers to the Image Reward model without training on the specific dataset; 'trained' denotes models that have been trained on these datasets. We also present the best scores reported in the AGIQA-3K and AIGCIQA2023 references for their respective datasets.
  • Figure 2: Assessing image quality through prompt design. The notation is the same as in Figure 1.
  • Figure 3: Venn diagram of the optimal parameter space. The notation here has the same meaning as in Table \ref{second}.
  • Figure 4: Diagram of the Metric Transformer. The primary distinction between our model and Image Reward lies in the design of the final layer. We employ a model similar to the transformer (which typically has only one set of $W_{K,Q,V}$). This design allows the model to evaluate multiple image metrics concurrently, delivering strong performance, as shown in Table \ref{mtres}.
  • Figure 5: Training loss of the Metric Transformer over 50 epochs.
  • ...and 1 more figure
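
The Figure 4 caption describes a final layer that scores several metrics concurrently using a single set of $W_{K,Q,V}$. One way to picture this is sketched below. It is only an illustration under stated assumptions, not the paper's implementation: the class name `MetricHead`, the per-metric embeddings, the shared readout vector, and the random initialization are all hypothetical, and the input `feature` stands in for whatever fused image-text vector the backbone produces.

```python
import math
import random

# Metric names taken from the abstract; everything else here is an assumption.
METRICS = ("quality", "authenticity", "correspondence")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    # Multiply a (rows x cols) matrix, stored as a list of rows, by a vector.
    return [dot(row, x) for row in W]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class MetricHead:
    """Hypothetical multi-metric head: one shared W_q, W_k, W_v for all metrics."""

    def __init__(self, dim, metrics=METRICS, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.metrics = tuple(metrics)
        self.W_q = rand_matrix(dim, dim, rng)
        self.W_k = rand_matrix(dim, dim, rng)
        self.W_v = rand_matrix(dim, dim, rng)
        # Assumed: one embedding per metric turns the feature into metric tokens.
        self.metric_emb = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in self.metrics]
        # Assumed: a single readout vector maps each attended token to a scalar.
        self.readout = [rng.uniform(-1, 1) for _ in range(dim)]

    def __call__(self, feature):
        # Build one token per metric from the shared backbone feature.
        tokens = [[f + e for f, e in zip(feature, emb)] for emb in self.metric_emb]
        Q = [matvec(self.W_q, t) for t in tokens]
        K = [matvec(self.W_k, t) for t in tokens]
        V = [matvec(self.W_v, t) for t in tokens]
        scores = {}
        for name, q in zip(self.metrics, Q):
            # Scaled dot-product attention of this metric token over all metric tokens.
            weights = softmax([dot(q, k) / math.sqrt(self.dim) for k in K])
            mixed = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(self.dim)]
            scores[name] = dot(self.readout, mixed)
        return scores


head = MetricHead(dim=8)
feature = [0.1 * i for i in range(8)]  # placeholder for a backbone feature
scores = head(feature)                 # all metric scores from a single pass
```

Because every metric token attends to the others, each score can depend on the other metrics' representations, which is one plausible reading of the "interrelationships among metrics" motivation. A trained model would learn these weights rather than sample them.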