Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability
Jaehui Hwang, Junghyuk Lee, Jong-Seok Lee
TL;DR
The paper addresses the problem of evaluating generated images with metrics that go beyond feature distances, which align poorly with human perception. It introduces two metrics, the anomaly score (AS) for generative models and the anomaly score for individual images (AS-i), both grounded in two properties of the representation space: complexity (C) and vulnerability (V). AS is computed as a 2D Kolmogorov–Smirnov distance between the joint distributions of (C, V) for real and generated data. Empirical results show that AS and AS-i correlate more strongly with human judgments than prior metrics such as FID across multiple datasets and feature models, and that AS-i reflects the naturalness of individual images. The work provides a practical, human-aligned framework for evaluating both the overall output of a generative model and individual generated images, with implications for model development and benchmarking in synthetic image generation.
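To make the 2D Kolmogorov–Smirnov comparison concrete, the following is a minimal sketch of a two-sample 2D KS statistic over (C, V) pairs. It assumes a quadrant-based formulation in the style of Fasano and Franceschini; the exact variant used in the paper is not stated here, and the names `ks_2d`, `real_cv`, and `gen_cv` are hypothetical. Computing C and V themselves is out of scope for this sketch, so the inputs are simply arrays of shape (n, 2).

```python
import numpy as np


def _max_quadrant_gap(origins, a, b):
    """Largest gap between the fractions of `a` and `b` that fall in any
    open quadrant centred on any of the given origin points."""
    best = 0.0
    for ox, oy in origins:
        for x_sign in (1, -1):
            for y_sign in (1, -1):
                # Boolean masks selecting the points inside this quadrant.
                in_a = (x_sign * (a[:, 0] - ox) > 0) & (y_sign * (a[:, 1] - oy) > 0)
                in_b = (x_sign * (b[:, 0] - ox) > 0) & (y_sign * (b[:, 1] - oy) > 0)
                best = max(best, abs(in_a.mean() - in_b.mean()))
    return best


def ks_2d(real_cv, gen_cv):
    """Two-sample 2D KS statistic between two sets of (C, V) pairs.

    Assumption: quadrant-based statistic, averaged over using each
    sample's points as quadrant origins. Returns a value in [0, 1];
    larger means the two joint distributions differ more.
    """
    real_cv = np.asarray(real_cv, dtype=float)
    gen_cv = np.asarray(gen_cv, dtype=float)
    d_real = _max_quadrant_gap(real_cv, real_cv, gen_cv)
    d_gen = _max_quadrant_gap(gen_cv, real_cv, gen_cv)
    return 0.5 * (d_real + d_gen)
```

This brute-force version costs on the order of n^2 comparisons, which is acceptable for a few thousand points; it is intended only to illustrate how a single distance can be obtained from two joint (C, V) distributions.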
Abstract
With the advancement of generative models, the assessment of generated images becomes increasingly important. Previous methods measure distances between features of reference and generated images extracted from trained vision models. In this paper, we conduct an extensive investigation into the relationship between the representation space and input space around generated images. We first propose two measures related to the presence of unnatural elements within images: complexity, which indicates how non-linear the representation space is, and vulnerability, which is related to how easily the extracted feature changes under adversarial input changes. Based on these, we introduce a new metric for evaluating image-generative models called anomaly score (AS). Moreover, we propose AS-i (anomaly score for individual images), which can effectively evaluate generated images individually. Experimental results demonstrate the validity of the proposed approach.
