Detecting Generated Images by Fitting Natural Image Distributions

Yonggang Zhang, Jun Nie, Xinmei Tian, Mingming Gong, Kun Zhang, Bo Han

TL;DR

The paper addresses the challenge of detecting AI-generated images without relying on large, model-specific generated-image datasets. It introduces Consistency Verification (ConV), a training-free framework that uses two functions with outputs identical on the natural image manifold but gradients in orthogonal subspaces to detect deviations caused by generation; detection relies on a self-supervised model's loss under perturbations along the manifold. To boost robustness when natural-manifold gaps shrink, it adds Flow-Based Manifold Extrusion (F-ConV) by training a normalizing flow (RealNVP) atop self-supervised features to explicitly extrude generated images away from the natural manifold, combining shaping and consistency losses. Across ImageNet, LSUN, GenImage, DRCT-2M, and unseen generators such as Sora/OpenSora, the approach yields competitive or superior performance to training-based detectors while reducing data requirements, highlighting strong cross-model generalization and practical applicability.

Abstract

The increasing realism of generated images has raised significant concerns about their potential misuse, necessitating robust detection methods. Current approaches mainly rely on training binary classifiers, which depend heavily on the quantity and quality of available generated images. In this work, we propose a novel framework that exploits geometric differences between the data manifolds of natural and generated images. To exploit this difference, we employ a pair of functions engineered to yield consistent outputs for natural images but divergent outputs for generated ones, leveraging the property that their gradients reside in mutually orthogonal subspaces. This design enables a simple yet effective detection method: an image is identified as generated if a transformation along its data manifold induces a significant change in the loss value of a self-supervised model pre-trained on natural images. Furthermore, to address diminishing manifold disparities in advanced generative models, we leverage normalizing flows to amplify detectable differences by extruding generated images away from the natural image manifold. Extensive experiments demonstrate the efficacy of this method. Code is available at https://github.com/tmlr-group/ConV.
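
The decision rule stated in the abstract (flag an image as generated when a transformation along the data manifold changes the loss significantly) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: the names `consistency_score`, `is_generated`, `toy_loss`, and `toy_transform` are hypothetical, the "natural manifold" is the line y = 0 in the plane, and the toy loss stands in for the loss of a self-supervised model.

```python
import numpy as np


def consistency_score(x, loss_fn, transform):
    """Absolute change in loss when x is moved by `transform`."""
    return abs(float(loss_fn(transform(x))) - float(loss_fn(x)))


def is_generated(x, loss_fn, transform, threshold):
    """Flag x as generated if the loss change exceeds `threshold`."""
    return consistency_score(x, loss_fn, transform) > threshold


# Toy setup (hypothetical): the natural manifold is the line y = 0 in R^2.
def toy_loss(p):
    # Zero everywhere on the manifold; off the manifold it depends on x,
    # so it is not invariant to shifts along the manifold direction.
    x, y = p
    return (y ** 2) * (1.0 + x ** 2)


def toy_transform(p):
    # A shift along the manifold direction (the x-axis).
    return p + np.array([0.5, 0.0])


natural = np.array([1.0, 0.0])    # lies on the manifold
generated = np.array([1.0, 0.3])  # slightly off the manifold

natural_score = consistency_score(natural, toy_loss, toy_transform)
generated_score = consistency_score(generated, toy_loss, toy_transform)
print(natural_score, generated_score)
print(is_generated(natural, toy_loss, toy_transform, threshold=0.05))
print(is_generated(generated, toy_loss, toy_transform, threshold=0.05))
```

In this toy setting the on-manifold point keeps the same loss after the shift, while the off-manifold point does not, so a threshold on the loss change separates them. The threshold value here is arbitrary; in practice it would have to be chosen on held-out data.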

Paper Structure

This paper contains 29 sections, 18 equations, 9 figures, 12 tables.

Figures (9)

  • Figure 1: Comparison of (a) the existing framework and (b) our proposed ConV. The binary classifier in (a) is trained on natural images $\mathbf{x}_{\mathcal{M}}$ and generated images $\mathbf{x}_g$, so its efficacy depends on both the natural and the generated data distributions. In contrast, the two functions in (b) are trained only on the natural data distribution, which gives ConV its advantage: it identifies generated images by fitting the distribution of natural images rather than that of generated images.
  • Figure 2: Generated images deviate from the natural image manifold, but the deviation decreases as generative models evolve. Red dots denote the feature representations of natural images, while purple dots represent those of generated images.
  • Figure 3: Illustration of projecting a generated image $\mathbf{x}_g$ onto the data manifold $\mathcal{M}$.
  • Figure 4: Framework of consistency verification.
  • Figure 5: ConV with multiple forward passes.
  • ...and 4 more figures