Detect Fake with Fake: Leveraging Synthetic Data-driven Representation for Synthetic Image Detection
Hina Otake, Yoshihiro Fukuhara, Yoshiki Kubotani, Shigeo Morishima
TL;DR
The paper examines whether general-purpose representations learned solely from synthetic data can support synthetic image detection (SID) on images from unseen generative models. It evaluates ViT backbones pre-trained with the synthetic-data methods StableRep and SynCLR within a UnivFD-style SID framework and compares them with real-data backbones such as CLIP. SynCLR improves substantially over CLIP on unseen GANs (+10.32 mAP, +4.73% accuracy), while fakes generated by diffusion models (DMs) remain challenging; an ensemble that fuses synthetic- and real-data backbones provides further gains (+7.53 mAP). These results indicate that synthetic-data-driven representations can complement real-data backbones to improve SID robustness and generalization, with practical implications for detector design and data-efficient learning.
Abstract
Are general-purpose visual representations acquired solely from synthetic data useful for detecting fake images? In this work, we show the effectiveness of synthetic data-driven representations for synthetic image detection. Our analysis finds that vision transformers pre-trained on synthetic data with the latest visual representation learners can effectively distinguish fake from real images without seeing any real images during pre-training. Notably, using SynCLR as the backbone in a state-of-the-art detection method improves performance by +10.32 mAP and +4.73% accuracy over the widely used CLIP when tested on previously unseen GAN models. Code is available at https://github.com/cvpaperchallenge/detect-fake-with-fake.
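To make the UnivFD-style setup and the backbone fusion mentioned in the TL;DR more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the detector is a linear (logistic-regression) probe trained on features from a frozen backbone, and that fusion is done by concatenating the features of two backbones; the text above does not specify either detail. The names `train_linear_probe`, `predict_fake_prob`, `fuse`, and `toy_features` are hypothetical, and the "features" are random toy vectors standing in for real backbone embeddings.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(features, labels, lr=0.1, epochs=200):
    """Fit logistic regression with full-batch gradient descent.

    `features` are precomputed embeddings; the backbone itself is
    never updated, which is what "frozen" means in this sketch.
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_fake_prob(w, b, x):
    # Probability that feature vector x comes from a fake image (label 1).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fuse(feats_a, feats_b):
    # ASSUMPTION: fusion by per-image feature concatenation.
    return [a + b for a, b in zip(feats_a, feats_b)]


def toy_features(rng, n, dim, shift=1.0):
    # Toy stand-in for backbone embeddings: real (0) and fake (1)
    # images are drawn from Gaussians centered at -shift and +shift.
    feats, labels = [], []
    for i in range(n):
        y = i % 2
        center = shift if y else -shift
        feats.append([rng.gauss(center, 1.0) for _ in range(dim)])
        labels.append(y)
    return feats, labels


rng = random.Random(0)
feats_synthetic, labels = toy_features(rng, 200, 4)  # e.g. a synthetic-data backbone
feats_real, _ = toy_features(rng, 200, 4)            # e.g. a real-data backbone
fused = fuse(feats_synthetic, feats_real)

w, b = train_linear_probe(fused, labels)
accuracy = sum(
    (predict_fake_prob(w, b, x) > 0.5) == bool(y) for x, y in zip(fused, labels)
) / len(labels)
```

In a real pipeline, the toy vectors would be replaced by embeddings extracted once from each frozen backbone, and accuracy and mAP would be measured on images from generators not seen during probe training.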
