A Bias-Free Training Paradigm for More General AI-generated Image Detection

Fabrizio Guillaro, Giada Zingarini, Ben Usman, Avneesh Sud, Davide Cozzolino, Luisa Verdoliva

TL;DR

The paper addresses the generalization gap in AI-generated image detection caused by dataset biases. It introduces B-Free, a bias-free training paradigm that generates semantically aligned fake images using self-conditioned reconstructions and content augmentations via Stable Diffusion 2.1, and trains a ViT-based detector end-to-end on large, non-resized crops. By assembling a bias-controlled dataset (51k real, 309k fake) and evaluating across 27 generators with metrics including AUC and calibration (ECE, NLL), the authors demonstrate improved generalization to unseen generators and better calibration. The key finding is that careful dataset design and content-aligned augmentation can outperform more complex algorithms, highlighting the importance of reducing biases to achieve robust forensic detection in real-world settings.
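The calibration metrics named above (ECE and NLL) can be illustrated with a minimal sketch. It assumes the detector outputs a probability that an image is fake, and it uses the common equal-width binning form of ECE; the function names, the 0.5 decision threshold, and the 10-bin default are illustrative assumptions, not the authors' exact evaluation protocol.

```python
import math


def binary_nll(probs, labels, eps=1e-12):
    """Mean negative log-likelihood for P(fake) scores and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins (assumed variant).

    Confidence is the probability assigned to the predicted class,
    so it lies in [0.5, 1] for a binary detector.
    """
    # Each bin stores [sum of confidences, number of correct predictions].
    bins = [[0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += conf
        bins[idx][1] += float(pred == y)
    n = len(probs)
    # Weighted gap between accuracy and mean confidence, summed over bins.
    return sum(abs(correct - conf_sum) for conf_sum, correct in bins) / n
```

A detector that reports 0.9 on four images but is right on only three of them is overconfident by 0.15, which is what the ECE function returns for `probs=[0.9] * 4` and `labels=[1, 1, 1, 0]`.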

Abstract

Successful forensic detectors can produce excellent results in supervised learning benchmarks but struggle to transfer to real-world applications. We believe this limitation is largely due to inadequate training data quality. While most research focuses on developing new algorithms, less attention is given to training data selection, despite evidence that performance can be strongly impacted by spurious correlations such as content, format, or resolution. A well-designed forensic detector should detect generator-specific artifacts rather than reflect data biases. To this end, we propose B-Free, a bias-free training paradigm in which fake images are generated from real ones using the conditioning procedure of Stable Diffusion models. This ensures semantic alignment between real and fake images, so that any differences stem solely from the subtle artifacts introduced by AI generation. Through content-based augmentation, we show significant improvements in both generalization and robustness over state-of-the-art detectors, as well as better-calibrated results, across 27 different generative models, including recent releases such as FLUX and Stable Diffusion 3.5. Our findings emphasize the importance of careful dataset design, highlighting the need for further research on this topic. Code and data are publicly available at https://grip-unina.github.io/B-Free/.

Paper Structure

This paper contains 19 sections, 2 equations, 11 figures, and 11 tables.

Figures (11)

  • Figure 1: We introduce a new training paradigm for AI-generated image detection. To avoid semantic biases, we generate synthetic images from self-conditioned reconstructions of real images and include augmentation in the form of inpainted versions. As a consequence, we obtain better generalization to unseen models and better calibration than SoTA methods.
  • Figure 2: Forensic detectors can exhibit opposite behaviors depending on their training dataset. The four plots show the prediction distributions for three ViT-based detectors, UnivFD [Ojha2023towards], FatFormer [Liu2024forgery] and RINE [Koutlis2024leveraging], and the proposed one. The fake images (SD-XL or DALL-E 3) are generated from images of a single dataset (RAISE on top, COCO on the bottom) and tested only against real images of the same dataset (Synthbuster [Bammey2023synthbuster] and the test dataset from [Cozzolino2024raising]). We observe that for the same detector (e.g., RINE) and the same fake-image generator (e.g., DALL-E 3) the score distributions can vary significantly depending on the dataset used, going from real (left of the dotted line) to fake (right of the dotted line) or vice versa. This is likely due to the presence of biases in the training set that heavily impact the detector prediction. Our detector, on the other hand, shows consistent and correct results.
  • Figure 3: Overview of existing ($a$, $b$, $c$) and proposed ($d$) strategies for building an aligned training dataset. Some methods try to match synthetic images to the corresponding real images by using class-based generation ($a$) or text-to-image generation from descriptions of the real images ($b$). In ($c$), real images are fed to an autoencoder to generate a reconstructed fake with the same content. Unlike ($c$), our approach ($d$) generates a self-conditioned fake using diffusion steps and also adds a content augmentation step.
  • Figure 4: Content augmentation process. Starting with a real image, we use its generated variants (first row) and their locally manipulated versions (last row), created by replacing the original background. When inpainting with a different category, we use a bounding box instead of an object mask to allow space for new objects of varying shapes and sizes.
  • Figure 5: Power spectra of the differences, averaged over 2000 images, between: (a) real and reconstructed images, (b) real and self-conditioned images, and (c) reconstructed and self-conditioned images. Compared with reconstruction, self-conditioned generation embeds forensic artifacts even at lower frequencies, which makes these inconsistencies easier to exploit for distinguishing real images from fakes.
  • ...and 6 more figures
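
The averaged power spectra described in the Figure 5 caption can be sketched as follows. This is one possible reading of the caption, assuming grayscale images of identical size stacked into arrays of shape (N, H, W) and assuming that the per-image power spectra of the differences are averaged; the function name is hypothetical and the authors' exact preprocessing is not given here.

```python
import numpy as np


def mean_residual_power_spectrum(images_a, images_b):
    """Average power spectrum of the pairwise differences a - b.

    images_a, images_b: array-likes of shape (N, H, W), aligned pairwise.
    Returns an (H, W) array with the zero frequency shifted to the center.
    """
    a = np.asarray(images_a, dtype=np.float64)
    b = np.asarray(images_b, dtype=np.float64)
    residual = a - b
    # 2D FFT over the spatial axes of every residual image.
    spectrum = np.fft.fft2(residual, axes=(-2, -1))
    spectrum = np.fft.fftshift(spectrum, axes=(-2, -1))
    # Squared magnitude, then mean over the N image pairs.
    return (np.abs(spectrum) ** 2).mean(axis=0)
```

Calling the function on (real, reconstructed), (real, self-conditioned), and (reconstructed, self-conditioned) pairs would give three maps analogous to panels (a), (b), and (c); energy near the center of a map corresponds to low spatial frequencies.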