Detecting Generated Images by Fitting Natural Image Distributions
Yonggang Zhang, Jun Nie, Xinmei Tian, Mingming Gong, Kun Zhang, Bo Han
TL;DR
The paper addresses the detection of AI-generated images without relying on large, model-specific datasets of generated images. It introduces Consistency Verification (ConV), a training-free framework built on two functions whose outputs agree on the natural image manifold but whose gradients lie in mutually orthogonal subspaces, so that deviations caused by generation show up as disagreement between them; in practice, detection measures how much a self-supervised model's loss changes when an image is transformed along the data manifold. To stay robust as the gap between the natural and generated manifolds shrinks, the paper adds Flow-Based Manifold Extrusion (F-ConV), which trains a normalizing flow (RealNVP) on top of self-supervised features to explicitly extrude generated images away from the natural manifold, combining shaping and consistency losses. Across ImageNet, LSUN, GenImage, DRCT-2M, and unseen generators such as Sora/OpenSora, the approach performs competitively with or better than training-based detectors while reducing data requirements, indicating strong cross-model generalization and practical applicability.
Abstract
The increasing realism of generated images has raised significant concerns about their potential misuse, necessitating robust detection methods. Current approaches mainly rely on training binary classifiers, which depend heavily on the quantity and quality of available generated images. In this work, we propose a novel framework that exploits geometric differences between the data manifolds of natural and generated images. Specifically, we employ a pair of functions engineered to yield consistent outputs for natural images but divergent outputs for generated ones, leveraging the property that their gradients reside in mutually orthogonal subspaces. This design enables a simple yet effective detection method: an image is identified as generated if a transformation along its data manifold induces a significant change in the loss value of a self-supervised model pre-trained on natural images. Furthermore, to address diminishing manifold disparities in advanced generative models, we leverage normalizing flows to amplify detectable differences by extruding generated images away from the natural image manifold. Extensive experiments demonstrate the efficacy of this method. Code is available at https://github.com/tmlr-group/ConV.
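The detection rule described above can be illustrated with a toy sketch. It is not the released implementation: `toy_encoder`, `horizontal_flip`, `consistency_score`, `is_generated`, and the threshold value are hypothetical names and choices introduced only for illustration. The sketch assumes cosine distance between features as a stand-in for the change in a self-supervised loss, column-mean intensities as a stand-in for a pre-trained encoder, and left-right symmetric arrays as a stand-in for the natural image manifold.

```python
import numpy as np


def toy_encoder(image: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for a self-supervised encoder pre-trained on
    # natural images: the feature vector is the mean intensity of each column.
    return image.mean(axis=0)


def horizontal_flip(image: np.ndarray) -> np.ndarray:
    # Assumed transformation that keeps toy "natural" (symmetric) images
    # on their manifold.
    return image[:, ::-1]


def consistency_score(image, encoder=toy_encoder, transform=horizontal_flip) -> float:
    # Cosine distance between the features of the image and of its
    # transformed copy; used here as a proxy for the change in loss.
    z = encoder(image)
    z_t = encoder(transform(image))
    denom = np.linalg.norm(z) * np.linalg.norm(z_t) + 1e-12
    return 1.0 - float(np.dot(z, z_t) / denom)


def is_generated(image, threshold=0.05) -> bool:
    # Flag the image when the transformation changes the score noticeably.
    # The threshold is an arbitrary illustrative value.
    return consistency_score(image) > threshold


# Toy "natural" image: left-right symmetric, so the flip changes nothing.
symmetric = np.array([[1.0, 2.0, 2.0, 1.0],
                      [3.0, 4.0, 4.0, 3.0]])
# Toy "generated" image: asymmetric, so the flip changes the features.
skewed = np.array([[1.0, 0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0, 0.0]])

print(is_generated(symmetric), is_generated(skewed))
```

In this toy setting the symmetric array scores close to zero and the asymmetric one scores high, which mirrors the idea that inputs on the manifold remain consistent under the transformation while off-manifold inputs do not.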
