Deep Image Fingerprint: Towards Low Budget Synthetic Image Detection and Model Lineage Analysis
Sergey Sinitsa, Ohad Fried
TL;DR
This work tackles the detection of synthetic images and the inference of model lineage when training data is limited. It introduces Deep Image Fingerprint (DIF), a CNN-based approach that uses the network architecture itself as an image prior to extract generator fingerprints from a small set of generated and real images. Detection combines residual correlation with a threshold-free decision rule. DIF achieves high detection accuracy on images from large text-to-image models (LTIMs) and GANs, rivaling state-of-the-art pretrained detectors while using far fewer training samples. It also enables cross-model lineage analysis that links related models, for example identifying MidJourney (MJ) as a fine-tuned version of Stable Diffusion 1.4 (SD1.4). The study further demonstrates robustness to some distortions and discusses limitations for certain fingerprints and under compression, outlining future work on fingerprint strength and compression resilience.
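To make the idea of residual correlation with a threshold-free rule concrete, the following is a minimal sketch and not the authors' implementation. It makes several assumptions that do not come from the paper: the fingerprint is simply the mean residual of the generated images (DIF instead obtains it from a CNN used as an image prior), the residual is computed with a 3x3 box blur as a stand-in denoiser, and "threshold-free" is interpreted as assigning an image to whichever reference set has the closer mean correlation. All function names and the toy data are hypothetical.

```python
import numpy as np

def residual(img):
    """High-pass residual: image minus a 3x3 box-blurred copy (assumed stand-in denoiser)."""
    padded = np.pad(img, 1, mode="reflect")
    h, w = img.shape
    blurred = sum(
        padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)
    ) / 9.0
    return img - blurred

def correlation(a, b):
    """Zero-mean normalized correlation between two same-shaped arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def fit(real_imgs, fake_imgs):
    """Estimate a fingerprint and the mean correlation of each reference set.

    Assumption: the fingerprint is the average residual of the generated images,
    a simplification that replaces the CNN-based extraction described in the text.
    """
    fingerprint = np.mean([residual(x) for x in fake_imgs], axis=0)
    mu_real = np.mean([correlation(residual(x), fingerprint) for x in real_imgs])
    mu_fake = np.mean([correlation(residual(x), fingerprint) for x in fake_imgs])
    return fingerprint, mu_real, mu_fake

def is_generated(img, fingerprint, mu_real, mu_fake):
    """Assumed decision rule: pick the class whose mean correlation is closer."""
    c = correlation(residual(img), fingerprint)
    return bool(abs(c - mu_fake) < abs(c - mu_real))

# Toy data: "generated" images share a fixed additive pattern, "real" ones do not.
rng = np.random.default_rng(0)
pattern = rng.normal(size=(32, 32))
real_imgs = [rng.normal(size=(32, 32)) for _ in range(20)]
fake_imgs = [rng.normal(size=(32, 32)) + pattern for _ in range(20)]

fingerprint, mu_real, mu_fake = fit(real_imgs, fake_imgs)
```

In this sketch no threshold is tuned by hand: the decision boundary follows from the two reference means, which is one plausible reading of a threshold-free rule under the assumptions stated above.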
Abstract
The generation of high-quality images has become widely accessible and is rapidly evolving. As a result, anyone can generate images that are indistinguishable from real ones. This enables a wide range of applications, including malicious use with deceptive intent. Despite advances in detection techniques for generated images, a robust detection method still eludes us. Furthermore, model personalization techniques might affect the detection capabilities of existing methods. In this work, we utilize the architectural properties of convolutional neural networks (CNNs) to develop a new detection method. Our method can detect images from a known generative model and enables us to establish relationships between fine-tuned generative models. We tested the method on images produced by both Generative Adversarial Networks (GANs) and recent large text-to-image models (LTIMs) that rely on Diffusion Models. Our approach outperforms others trained under identical conditions and achieves performance comparable to state-of-the-art pre-trained detection methods on images generated by Stable Diffusion and MidJourney, while requiring significantly fewer training samples.
