When Synthetic Traces Hide Real Content: Analysis of Stable Diffusion Image Laundering

Sara Mandelli, Paolo Bestagini, Stefano Tubaro

TL;DR

The paper investigates how Stable Diffusion image laundering transforms real content into highly realistic laundered copies that confound forensic analysis. It introduces a two-stage detection pipeline that first separates real vs synthetic content and then discriminates fully synthetic from laundered images, achieving near-perfect accuracy and robustness to common post-processing on diverse datasets and SD releases. The study also uses Fourier residual analysis to reveal distinctive patterns in laundered content and shows that laundering erodes camera model fingerprints, undermining attribution. Overall, the work highlights a practical forensic risk and provides a concrete detection framework, with implications for web content authenticity and policy, while pointing to directions for strengthening forensic resilience against laundering attacks.

Abstract

In recent years, methods for producing highly realistic synthetic images have significantly advanced, allowing the creation of high-quality images from text prompts that describe the desired content. Even more impressively, Stable Diffusion (SD) models now provide users with the option of creating synthetic images in an image-to-image translation fashion, modifying images in the latent space of advanced autoencoders. This striking evolution, however, brings an alarming consequence: it is possible to pass an image through SD autoencoders to reproduce a synthetic copy of the image with high realism and almost no visual artifacts. This process, known as SD image laundering, can transform real images into lookalike synthetic ones and risks complicating forensic analysis for content authenticity verification. Our paper investigates the forensic implications of image laundering, revealing a serious potential to obscure traces of real content, including sensitive and harmful materials that could be mistakenly classified as synthetic, thereby undermining the protection of individuals depicted. To address this issue, we propose a two-stage detection pipeline that effectively differentiates between pristine, laundered, and fully synthetic images (those generated from text prompts), showing robustness across various conditions. Finally, we highlight another alarming property of image laundering, which appears to mask the unique artifacts exploited by forensic detectors to solve the camera model identification task, strongly undermining their performance. Our experimental code is available at https://github.com/polimi-ispl/synthetic-image-detection.
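
The two-stage decision described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `classify_two_stage`, the scorer callables, the thresholds `thr1` and `thr2`, and the score conventions (higher stage-1 score means "more likely synthetic", higher stage-2 score means "more likely laundered") are all assumptions made for the example.

```python
from typing import Any, Callable

REAL, SYNTHETIC, LAUNDERED = "real", "fully synthetic", "laundered"


def classify_two_stage(
    image: Any,
    stage1_score: Callable[[Any], float],  # hypothetical: higher = more likely synthetic
    stage2_score: Callable[[Any], float],  # hypothetical: higher = more likely laundered
    thr1: float = 0.0,  # assumed decision threshold for stage 1
    thr2: float = 0.0,  # assumed decision threshold for stage 2
) -> str:
    """Return one of REAL, SYNTHETIC, LAUNDERED for the given image."""
    # Stage 1: real vs. synthetic. Laundered images are expected to fall
    # on the synthetic side here, since they carry SD autoencoder traces.
    if stage1_score(image) <= thr1:
        return REAL
    # Stage 2: among images flagged as synthetic, separate laundered
    # copies of real content from images generated from text prompts.
    if stage2_score(image) > thr2:
        return LAUNDERED
    return SYNTHETIC


# Toy usage: the "image" is a dict that already carries both scores.
toy = {"s1": 2.5, "s2": -1.0}
label = classify_two_stage(toy, lambda im: im["s1"], lambda im: im["s2"])
```

The point of the cascade is that the second detector only ever sees images the first one has flagged as synthetic, so it can specialize in the laundered versus fully synthetic distinction.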

Paper Structure

This paper contains 13 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Pristine images and their laundered copies, obtained by passing pristine samples through the SD autoencoder. Laundered samples look extremely realistic, with an almost total absence of notable generation artifacts, even for uncommon patterns that could be harder to reproduce.
  • Figure 2: Scheme of the proposed backbone detector, based on the random extraction of $N$ square patches from the input image.
  • Figure 3: Distributions of the detection scores of real and laundered images; the detector is a synthetic-versus-real detector not trained on laundered images.
  • Figure 4: Fourier transform analysis of real, laundered and fully synthetic images. All spectra are centered at the spatial frequency $(0, 0)$.
  • Figure 5: Sketch of our proposed two-stage methodology to classify an analyzed image as real, fully synthetic or laundered. At step 1, we tell real images apart from synthetic ones; step 2 further discriminates between fully synthetic and laundered samples.
  • ...and 1 more figure
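
Two operations mentioned in the captions above, the random extraction of $N$ square patches (Figure 2) and a spectrum centered at $(0, 0)$ (Figure 4), can be illustrated with NumPy. This is only a sketch under stated assumptions: the input is a single-channel (grayscale) array, patches are sampled uniformly with replacement, and the log-magnitude display is a common convention rather than something the captions specify. The helper names `random_square_patches` and `centered_log_spectrum` are hypothetical.

```python
import numpy as np


def random_square_patches(image: np.ndarray, n: int, size: int,
                          rng: np.random.Generator) -> np.ndarray:
    """Extract n random size x size patches from a 2-D (grayscale) image."""
    h, w = image.shape[:2]
    if size > h or size > w:
        raise ValueError("patch size exceeds image dimensions")
    # Top-left corners sampled uniformly; patches may overlap (assumption).
    ys = rng.integers(0, h - size + 1, size=n)
    xs = rng.integers(0, w - size + 1, size=n)
    return np.stack([image[y:y + size, x:x + size] for y, x in zip(ys, xs)])


def centered_log_spectrum(patch: np.ndarray) -> np.ndarray:
    """Log-magnitude 2-D spectrum with frequency (0, 0) moved to the center."""
    # fftshift moves the zero-frequency term from the corner to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(patch))
    return np.log1p(np.abs(spectrum))


# Toy usage on a constant image: all energy sits at frequency (0, 0).
rng = np.random.default_rng(0)
image = np.ones((64, 64))
patches = random_square_patches(image, n=4, size=16, rng=rng)
spec = centered_log_spectrum(patches[0])
peak = np.unravel_index(np.argmax(spec), spec.shape)
```

For an even patch size such as 16, `fftshift` places the zero-frequency bin at index 8 along each axis, so the peak of a constant patch lands at the center of the displayed spectrum.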