Who Made This? Fake Detection and Source Attribution with Diffusion Features

Simone Bonechi, Paolo Andreini, Barbara Toniella Corradini

TL;DR

FRIDA is a lightweight framework that uses internal activations of a pre-trained Stable Diffusion model to detect synthetic images and attribute them to their source generators. A $k$-NN classifier on these diffusion features handles cross-generator fake detection without fine-tuning the diffusion model, while a compact MLP trained on the same features handles attribution. On GenImage, FRIDA achieves state-of-the-art detection performance with strong generalization to unseen generators, and SHAP explanations reveal generator-specific patterns in the diffusion features. The approach is data-efficient and fast at inference, and it positions diffusion representations as a general, interpretable basis for synthetic image forensics. Overall, FRIDA offers a scalable alternative to retraining detectors as generators evolve, bridging diffusion modeling and authenticity analysis.

Abstract

The rapid progress of generative diffusion models has enabled the creation of synthetic images that are increasingly difficult to distinguish from real ones, raising concerns about authenticity, copyright, and misinformation. Existing supervised detectors often struggle to generalize across unseen generators, requiring extensive labeled data and frequent retraining. We introduce FRIDA (Fake-image Recognition and source Identification via Diffusion-features Analysis), a lightweight framework that leverages internal activations from a pre-trained diffusion model for deepfake detection and source generator attribution. A k-nearest-neighbor classifier applied to diffusion features achieves state-of-the-art cross-generator performance without fine-tuning, while a compact neural model enables accurate source attribution. These results show that diffusion representations inherently encode generator-specific patterns, providing a simple and interpretable foundation for synthetic image forensics.
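
To make the detection step concrete, the following is a minimal sketch of a majority-vote k-nearest-neighbor classifier over fixed feature vectors. It is not the authors' implementation: the Euclidean distance, the value `k=5`, the 16-dimensional toy features, and the function name `knn_predict` are all assumptions made for illustration, and the random vectors merely stand in for features that would be extracted from the pre-trained diffusion model.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=5):
    """Majority-vote k-NN for binary labels (0 = real, 1 = fake).

    Assumption: Euclidean distance and an odd k; the abstract does not
    specify either choice.
    """
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k closest reference vectors for each query.
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    # Labels of those neighbors, shape (n_query, k).
    nn_labels = train_y[nn_idx]
    # Majority vote; an odd k rules out ties for binary labels.
    return (nn_labels.mean(axis=1) > 0.5).astype(int)

# Hypothetical stand-ins for diffusion features: two separated Gaussian blobs.
rng = np.random.default_rng(0)
dim = 16
train_x = np.vstack([
    rng.normal(0.0, 1.0, size=(100, dim)),  # "real" reference features
    rng.normal(3.0, 1.0, size=(100, dim)),  # "fake" reference features
])
train_y = np.array([0] * 100 + [1] * 100)

query_x = np.vstack([
    rng.normal(0.0, 1.0, size=(20, dim)),
    rng.normal(3.0, 1.0, size=(20, dim)),
])
query_y = np.array([0] * 20 + [1] * 20)

pred = knn_predict(train_x, train_y, query_x, k=5)
accuracy = (pred == query_y).mean()
```

Because the classifier only stores reference vectors and compares distances, adding examples from a new generator amounts to appending rows to `train_x` and `train_y`; no gradient-based training is involved in this sketch.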

Paper Structure

This paper contains 25 sections, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Prototype extraction from Stable Diffusion U-Net. In this example, we extract and average the features from the first decoder layer at $16\times16$ resolution.
  • Figure 2: U-Net Layer Selection by Linear Probing. Average cross-generator validation accuracy obtained using the prototypes extracted from different U-Net layers (labelled as Encoder, Bottleneck or Decoder, followed by the spatial resolution and the intra-stage index). The best accuracy is achieved by features from the first layer of the decoder at $16\times16$ resolution.
  • Figure 3: Confusion Matrix for source image attribution. The MLP-640 is evaluated on the GenImage test set.
  • Figure 4: Shared informative features across generators. We use SHAP with the Gradient Explainer to interpret the decisions of the MLP-640 classifier, employing 200 background and 1000 test samples. The plot reports, for each pair of generators, the percentage of overlap among their top 10 most informative features.
  • Figure 5: Impact of the top-10 features on model decisions by SHAP analysis. Visualization of the ten most influential features identified by the SHAP analysis for each of the eight diffusion generators and the real image class. The figure highlights how different feature subsets contribute to the network’s decision process across generators, revealing both shared and model-specific attribution patterns.
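
Two of the operations named in these captions can be sketched in a few lines: averaging a U-Net activation map into a prototype vector (Figure 1) and measuring the overlap between the top-10 most informative features of two generators (Figure 4). The sketch below is illustrative only. It assumes that "averaging" means a mean over spatial positions of a `(C, H, W)` activation map, and that per-feature importance is a single non-negative score per feature (for example a mean absolute SHAP value); neither detail is confirmed by the captions, and the function names and toy inputs are hypothetical.

```python
import numpy as np

def spatial_average(feature_map):
    """Collapse a (C, H, W) activation map into a C-dimensional prototype.

    Assumption: the prototype is the mean over all spatial positions.
    """
    return feature_map.mean(axis=(1, 2))

def top_k_overlap(importance_a, importance_b, k=10):
    """Percentage of indices shared by the top-k features of two score vectors."""
    top_a = set(np.argsort(-importance_a)[:k].tolist())
    top_b = set(np.argsort(-importance_b)[:k].tolist())
    return 100.0 * len(top_a & top_b) / k

# Toy activation map with 2 channels at 16x16 resolution (hypothetical values).
feature_map = np.arange(2 * 16 * 16, dtype=float).reshape(2, 16, 16)
prototype = spatial_average(feature_map)

# Toy importance scores for two generators over 20 features.
scores_a = np.arange(20, dtype=float)   # top-10 features are indices 10..19
scores_b = scores_a.copy()
scores_b[:5] += 100.0                   # indices 0..4 now rank highest for b
overlap = top_k_overlap(scores_a, scores_b, k=10)
```

Computing `top_k_overlap` for every pair of classes would fill a symmetric matrix of the kind Figure 4 reports, with 100% on the diagonal.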