Towards Universal Fake Image Detectors that Generalize Across Generative Models

Utkarsh Ojha, Yuheng Li, Yong Jae Lee

TL;DR

This work addresses the challenge of detecting fake images from generative models not seen during training. It shows that conventional end-to-end CNN detectors overfit to the fingerprints of the GANs they were trained on, so the real class becomes a sink for anything lacking those fingerprints, including fake images from unseen models. The proposed fix is simple: perform real-vs-fake classification in a fixed, general-purpose feature space taken from a large pretrained model (CLIP:ViT-L/14), keeping the backbone frozen and applying nearest neighbor or linear probing on top. The results show substantial generalization gains on unseen diffusion and autoregressive models, surpassing state-of-the-art detectors and indicating the advantage of not learning task-specific features. The findings suggest that broad, richly trained feature spaces provide a robust, scalable baseline for universal fake image detection across generative paradigms, with practical implications for media integrity and security.

Abstract

With generative models proliferating at a rapid rate, there is a growing need for general purpose fake image detectors. In this work, we first show that the existing paradigm, which consists of training a deep network for real-vs-fake classification, fails to detect fake images from newer breeds of generative models when trained to detect GAN fake images. Upon analysis, we find that the resulting classifier is asymmetrically tuned to detect patterns that make an image fake. The real class becomes a sink class holding anything that is not fake, including generated images from models not accessible during training. Building upon this discovery, we propose to perform real-vs-fake classification without learning; i.e., using a feature space not explicitly trained to distinguish real from fake images. We use nearest neighbor and linear probing as instantiations of this idea. When given access to the feature space of a large pretrained vision-language model, the very simple baseline of nearest neighbor classification has surprisingly good generalization ability in detecting fake images from a wide variety of generative models; e.g., it improves upon the SoTA by +15.07 mAP and +25.90% acc when tested on unseen diffusion and autoregressive models.

Paper Structure

This paper contains 32 sections, 2 equations, 15 figures, 6 tables.

Figures (15)

  • Figure 1: Using images from just one generative model, can we detect images from a different type of generative model as fake?
  • Figure 2: t-SNE visualization of real and fake images associated with two types of generative models. The feature space used is of a classifier trained to distinguish Fake (GAN) from Real (GAN).
  • Figure 3: Average frequency spectra of each domain. The first four correspond to fake images from GANs and diffusion models. The last one represents real images from the LAION dataset.
  • Figure 4: Nearest neighbors for real-vs-fake classification. We first map the real and fake images to their corresponding feature representations using a pre-trained CLIP:ViT network not trained for this task. A test image is mapped into the same feature space, and cosine distance is used to find the closest member in the feature bank. The label of that member is the predicted class.
  • Figure 5: Ablation on the network architecture and pre-training dataset. A network trained on the task of CLIP is better equipped at separating fake images from real, compared to networks trained on ImageNet classification. The red dotted line depicts chance performance.
  • ...and 10 more figures
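
The nearest neighbor procedure described in the Figure 4 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the features have already been extracted by a frozen encoder, and the tiny 3-dimensional vectors below are made-up stand-ins for CLIP:ViT features. The function names cosine_distance and nearest_neighbor_label are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbor_label(feature_bank, bank_labels, query):
    """Predict the label of the bank member closest to the query."""
    distances = [cosine_distance(query, member) for member in feature_bank]
    nearest_index = min(range(len(distances)), key=distances.__getitem__)
    return bank_labels[nearest_index]


# Toy feature bank (assumption: stand-ins for features of real and fake
# training images produced by a frozen, pretrained encoder).
feature_bank = [
    [0.9, 0.1, 0.0],
    [1.0, 0.2, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 1.0, 0.2],
]
bank_labels = ["real", "real", "fake", "fake"]

# Test images mapped into the same feature space (also made up).
queries = [
    [3.0, 0.3, 0.0],
    [0.05, 0.5, 0.05],
]

predictions = [
    nearest_neighbor_label(feature_bank, bank_labels, q) for q in queries
]
print(predictions)
```

Because cosine distance ignores vector length, the first query is matched to the real cluster even though its magnitude differs from every bank member. No parameters are trained here; the only "model" is the feature bank itself.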