Detect Fake with Fake: Leveraging Synthetic Data-driven Representation for Synthetic Image Detection

Hina Otake, Yoshihiro Fukuhara, Yoshiki Kubotani, Shigeo Morishima

TL;DR

The paper examines whether general-purpose representations learned solely from synthetic data can support synthetic image detection (SID) across unseen generative models. It evaluates vision transformer (ViT) backbones pre-trained with the synthetic-data methods StableRep and SynCLR within a UnivFD-style SID framework, comparing them to real-data backbones such as CLIP. SynCLR yields a substantial improvement over CLIP on unseen GANs (+10.32 mAP, +4.73% accuracy), while fakes generated by diffusion models (DMs) remain challenging; ensemble fusion of synthetic- and real-data backbones provides further gains (+7.53 mAP). These results indicate that synthetic-data-driven representations can complement real-data foundation models to make SID more robust and improve its generalization, with practical implications for detector design and data-efficient learning.

Abstract

Are general-purpose visual representations acquired solely from synthetic data useful for detecting fake images? In this work, we show the effectiveness of synthetic data-driven representations for synthetic image detection. Upon analysis, we find that vision transformers trained by the latest visual representation learners with synthetic data can effectively distinguish fake from real images without seeing any real images during pre-training. Notably, using SynCLR as the backbone in a state-of-the-art detection method demonstrates a performance improvement of +10.32 mAP and +4.73% accuracy over the widely used CLIP, when tested on previously unseen GAN models. Code is available at https://github.com/cvpaperchallenge/detect-fake-with-fake.
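
The evaluation described above can be sketched as a linear probe trained on frozen backbone embeddings and scored with average precision (AP) and accuracy. This is a minimal sketch, not the authors' implementation: it assumes that a UnivFD-style setup means a frozen backbone with a small trainable linear head, and it substitutes random toy vectors for real ViT embeddings. All names (`toy_embeddings`, `train_linear_probe`, `predict`, `average_precision`, `accuracy`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def toy_embeddings(n_per_class, dim, seed=0):
    # Stand-in for frozen backbone features (an assumption for illustration):
    # real images (label 0) cluster near -1, fake images (label 1) near +1.
    rng = random.Random(seed)
    features, labels = [], []
    for label, mean in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            features.append([rng.gauss(mean, 0.5) for _ in range(dim)])
            labels.append(label)
    return features, labels


def predict(w, b, x):
    # Probability that embedding x belongs to a fake image (label 1).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    # Logistic regression fitted by batch gradient descent.
    # Only the linear head (w, b) is trained; the features stay fixed.
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = predict(w, b, x) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def average_precision(scores, labels):
    # Mean of the precision values at each rank where a fake image appears,
    # with samples sorted by descending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def accuracy(scores, labels, threshold=0.5):
    # Fraction of samples whose thresholded score matches the label.
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)


if __name__ == "__main__":
    feats, labs = toy_embeddings(50, 4)
    w, b = train_linear_probe(feats, labs)
    scores = [predict(w, b, x) for x in feats]
    print("AP:", average_precision(scores, labs))
    print("accuracy:", accuracy(scores, labs))
```

Under this reading, mAP would be the mean of `average_precision` over the test sets of the individual generative models, so comparing backbones amounts to swapping the source of the embeddings while keeping the probe and the metrics fixed.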

Paper Structure (14 sections, 1 equation, 3 figures, 5 tables)

This paper contains 14 sections, 1 equation, 3 figures, 5 tables.

Figures (3)

  • Figure 1: UMAP visualization of real images (blue) and fake images generated by ProGAN (yellow) in the backbone embedding space. SynCLR’s embedding space best separates the real features from fake.
  • Figure 2: UMAP visualization of real images (blue), fake images generated by GANs (green), and fake images generated by DMs (yellow) using different backbone embedding spaces. The GAN data points include images generated by ProGAN, CycleGAN, BigGAN, StarGAN, and StyleGAN2. The DM data points include images generated by Guided, LDM, and Glide.
  • Figure 3: Attention maps visualizing the areas of focus for each model during SID. The maps show the first, intermediate, and last layers for real images and images generated by GANs (ProGAN, CycleGAN, BigGAN, StyleGAN2) and DMs (Guided, LDM, Glide), averaged across all heads.