Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images

Memoona Aziz, Umair Rehman, Muhammad Umair Danish, Katarina Grolinger

TL;DR

GLIPS introduces a photorealistic image quality metric that couples local patch-level similarity derived from Vision Transformer attention with global distributional similarity via Maximum Mean Discrepancy, balanced by a tunable parameter. To ensure interpretable comparisons with human judgments, it also presents the Interpolative Binning Scale (IBS), which maps metric outputs into Likert-like bins with linear interpolation for precision. Empirical evaluation against human judgments across several models shows GLIPS achieves superior correlation and lower error (MAPE) than traditional metrics such as FID, SSIM, MS-SSIM, LPIPS, and KID, with ablation confirming the benefit of combining local and global components. The work provides a practical tool for developers and researchers to assess and guide improvements in photorealistic image generation, with potential applications in media, medical imaging, and AI governance.
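As a rough illustration of the IBS idea described above (bins plus linear interpolation), the following sketch maps a raw metric value onto a Likert-like scale. The bin edges, the five-point scale, and the function name `interpolative_binning` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def interpolative_binning(score, edges, levels):
    """Map a raw metric score onto a Likert-like scale.

    edges  : increasing raw-score values marking bin boundaries (assumed).
    levels : scale value assigned to each boundary (assumed).
    Scores between two boundaries are linearly interpolated; scores
    outside the range are clamped to the end levels by np.interp.
    """
    return float(np.interp(score, edges, levels))

# Hypothetical setup: raw scores in [0, 1], equal-width bins, 1-5 scale.
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
levels = [1.0, 2.0, 3.0, 4.0, 5.0]

print(interpolative_binning(0.625, edges, levels))  # halfway between levels 3 and 4
```

The interpolation step is what distinguishes this from plain binning: two scores that fall in the same bin still receive different scaled values.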

Abstract

This paper introduces the Global-Local Image Perceptual Score (GLIPS), an image metric designed to assess the photorealistic quality of AI-generated images with a high degree of alignment to human visual perception. Traditional metrics such as FID and KID do not align closely with human evaluations. The proposed metric uses transformer-based attention mechanisms to assess local similarity and Maximum Mean Discrepancy (MMD) to evaluate global distributional similarity. To evaluate the performance of GLIPS, we conducted a human study on photorealistic image quality. Comprehensive tests across various generative models demonstrate that GLIPS consistently outperforms existing metrics such as FID, SSIM, and MS-SSIM in terms of correlation with human scores. Additionally, we introduce the Interpolative Binning Scale (IBS), a refined scaling method that enhances the interpretability of metric scores by aligning them more closely with human evaluative standards. The proposed metric and scaling approach not only provide more reliable assessments of AI-generated images but also suggest pathways for future enhancements in image generation technologies.
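For readers unfamiliar with MMD, the sketch below computes a squared MMD estimate between two sets of feature vectors. It is a generic illustration, not the paper's implementation: the Gaussian (RBF) kernel, the bandwidth `sigma`, the biased estimator, and the function names are all assumptions chosen for this example.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between rows of a (n, d) and b (m, d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dist / (2.0 * sigma**2))

def mmd_squared(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, sigma).mean()
        + rbf_kernel(y, y, sigma).mean()
        - 2.0 * rbf_kernel(x, y, sigma).mean()
    )

# Hypothetical features: in practice these would come from an image encoder.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(50, 4))
fake_feats = real_feats + 3.0  # a shifted distribution

print(mmd_squared(real_feats, real_feats))  # identical samples
print(mmd_squared(real_feats, fake_feats))  # shifted samples give a larger value
```

A value near zero indicates that the two feature distributions are hard to tell apart under the chosen kernel; larger values indicate a larger distributional gap.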

Paper Structure (25 sections, 20 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Original image and images generated by models such as DALLE-2, DALLE-3, and Stable Diffusion. The caption given to each model when generating images is "A woman touching her skis going down a ski hill." The original image belongs to the MS-COCO dataset with ID: 000000080671.
  • Figure 2: Comparative Analysis of Photorealistic Image Quality Across Different AI Models
  • Figure 3: Comparison of GLIPS with other approaches for each model.