
PKU-AIGIQA-4K: A Perceptual Quality Assessment Database for Both Text-to-Image and Image-to-Image AI-Generated Images

Jiquan Yuan, Fanyi Yang, Jihe Li, Xinyan Cao, Jinming Che, Jinlong Lin, Xixin Cao

TL;DR

This work addresses the need for comprehensive perceptual quality assessment of AI-generated images (AIGIs) in both text-to-image and image-to-image settings. It introduces PKU-AIGIQA-4K, a database of roughly 4,000 AIGIs with human mean opinion score (MOS) labels along three evaluation dimensions: quality, authenticity, and text-image correspondence. It proposes three IQA approaches built on pre-trained models: NR-AIGCIQA (no-reference), FR-AIGCIQA (full-reference, using image prompts as references), and PR-AIGCIQA (partial-reference, using image prompts where they are available), including a novel padding-based solution for variable-length prompts. Extensive benchmarking shows that ViT backbones deliver the strongest performance and that the FR and PR methods effectively exploit prompt information to improve assessment accuracy. The dataset and methods support robust benchmarking and offer practical tools for evaluating and comparing AIGI generation models, with implications for model development and quality control in AI-generated imagery.
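To make the NR/FR/PR distinction concrete, here is a minimal sketch of how the input to a score-regression head could differ between the three settings. It is not the authors' architecture: the function names `build_head_input` and `linear_head` are hypothetical, the features are assumed to come from some pre-trained backbone, and zero-padding the reference slot when no image prompt exists (as for text-to-image AIGIs) is our stand-in for whatever padding scheme the paper actually uses.

```python
from typing import List, Optional


def build_head_input(aigi_feat: List[float],
                     prompt_feat: Optional[List[float]],
                     mode: str) -> List[float]:
    """Assemble the regression-head input for a hypothetical NR/FR/PR setup.

    aigi_feat:   backbone features of the generated image
    prompt_feat: backbone features of the image prompt, or None if there is none
    mode:        "NR", "FR", or "PR"
    """
    dim = len(aigi_feat)
    if mode == "NR":
        # No-reference: only the generated image is used.
        return list(aigi_feat)
    if mode == "FR":
        # Full-reference: every sample must come with an image prompt.
        if prompt_feat is None:
            raise ValueError("FR mode needs an image prompt for every AIGI")
        return list(aigi_feat) + list(prompt_feat)
    if mode == "PR":
        # Partial-reference: use the image prompt when it exists; otherwise
        # zero-pad (assumption) so every input vector has the same length.
        ref = list(prompt_feat) if prompt_feat is not None else [0.0] * dim
        return list(aigi_feat) + ref
    raise ValueError(f"unknown mode: {mode}")


def linear_head(x: List[float], weights: List[float], bias: float) -> float:
    """Toy linear regressor mapping a feature vector to a predicted score."""
    return sum(w * v for w, v in zip(weights, x)) + bias


img = [0.2, 0.4, 0.1]
ref = [0.3, 0.1, 0.5]
print(len(build_head_input(img, None, "NR")))  # 3
print(len(build_head_input(img, ref, "FR")))   # 6
print(build_head_input(img, None, "PR"))       # [0.2, 0.4, 0.1, 0.0, 0.0, 0.0]
```

The fixed-length output in PR mode is the point of the padding: one head can then be trained on a mix of image-to-image samples (with a reference) and text-to-image samples (without one).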

Abstract

In recent years, image generation technology has advanced rapidly, producing a vast array of AI-generated images (AIGIs). However, the quality of these AIGIs is highly inconsistent, and low-quality AIGIs severely impair the visual experience of users. Given the widespread application of AIGIs, AI-generated image quality assessment (AIGIQA), which aims to evaluate the quality of AIGIs from the perspective of human perception, has garnered increasing interest among researchers. Nonetheless, this field has not yet been fully explored. We have observed that existing databases are limited to images generated in a single scenario. Databases such as AGIQA-1K, AGIQA-3K, and AIGCIQA2023, for example, only include images generated by text-to-image generative models. This oversight highlights a critical gap in the current research landscape, underscoring the need for dedicated databases catering to image-to-image scenarios, as well as more comprehensive databases that cover a broader range of AIGI scenarios. To address these issues, we establish a large-scale perceptual quality assessment database for both text-to-image and image-to-image AIGIs, named PKU-AIGIQA-4K. We then conduct a well-organized subjective experiment to collect quality labels for the AIGIs and perform a comprehensive analysis of the PKU-AIGIQA-4K database. Depending on how image prompts are used during training, we propose three image quality assessment (IQA) methods based on pre-trained models: a no-reference method, NR-AIGCIQA; a full-reference method, FR-AIGCIQA; and a partial-reference method, PR-AIGCIQA. Finally, using the PKU-AIGIQA-4K database, we conduct extensive benchmark experiments comparing the proposed methods with existing IQA methods.

Paper Structure

This paper contains 23 sections, 12 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Illustration of the text-to-image and image-to-image generation in the PKU-AIGIQA-4K database.
  • Figure 2: Various scenes and styles of AIGIs sampled from the PKU-AIGIQA-4K database generated by Midjourney, Stable Diffusion V1.5, and DALLE3, including text-to-image and image-to-image AIGIs.
  • Figure 3: An example of the subjective evaluation interface. Evaluators can evaluate the quality of AIGIs by comparing the reference image on the left with the to-be-evaluated AIGIs on the right. They can use the sliders below to record the text-image correspondence score, authenticity score, and quality score.
  • Figure 4: Illustration of AIGIs from three evaluation perspectives. (a) The top AIGI has better quality, authenticity, and correspondence. (b) The bottom AIGI has worse quality, authenticity, and correspondence.
  • Figure 5: Illustration of the MOS distributions. (a), (b), and (c) show the MOS distributions of quality, authenticity, and correspondence, respectively, for all AIGIs in the PKU-AIGIQA-4K database. (d), (e), and (f) show the same for the image-to-image AIGIs; (g), (h), and (i) for the text-to-image AIGIs; and (j), (k), and (l) for each of the three generative models.
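Figures 3 and 5 describe evaluators recording slider scores on three dimensions and the resulting MOS distributions. This page does not spell out how raw ratings become MOS, so the sketch below assumes a common approach: average each image's ratings over evaluators, optionally after z-scoring every evaluator's scores to remove individual bias in how they use the slider. It handles one dimension at a time; the names `Ratings`, `zscore_per_subject`, and `compute_mos` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List

# ratings[subject][image] = raw slider score for ONE dimension (e.g. quality).
Ratings = Dict[str, Dict[str, float]]


def zscore_per_subject(ratings: Ratings) -> Ratings:
    """Normalize each subject's scores to zero mean and unit variance."""
    out: Ratings = {}
    for subj, scores in ratings.items():
        vals = list(scores.values())
        mu, sd = mean(vals), pstdev(vals)
        # A subject who gave every image the same score carries no ranking
        # information, so map all of their scores to 0.
        out[subj] = {img: (s - mu) / sd if sd > 0 else 0.0
                     for img, s in scores.items()}
    return out


def compute_mos(ratings: Ratings, normalize: bool = False) -> Dict[str, float]:
    """Mean opinion score per image, averaged over the subjects who rated it."""
    data = zscore_per_subject(ratings) if normalize else ratings
    per_image: Dict[str, List[float]] = {}
    for scores in data.values():
        for img, s in scores.items():
            per_image.setdefault(img, []).append(s)
    return {img: mean(vals) for img, vals in per_image.items()}


quality = {"s1": {"a": 70, "b": 50}, "s2": {"a": 90, "b": 60}}
print(compute_mos(quality))                  # {'a': 80, 'b': 55}
print(compute_mos(quality, normalize=True))  # {'a': 1.0, 'b': -1.0}
```

Running the same function separately on the quality, authenticity, and correspondence ratings would yield the three per-image MOS labels whose distributions Figure 5 plots.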