Deep Image Fingerprint: Towards Low Budget Synthetic Image Detection and Model Lineage Analysis

Sergey Sinitsa, Ohad Fried

TL;DR

This work tackles the challenge of detecting synthetic images and inferring model lineage with limited training data. It introduces Deep Image Fingerprint (DIF), a CNN-based approach that treats the network architecture as an image prior and extracts generator fingerprints via residual correlation and a threshold-free detection rule, using a small set of generated and real images. DIF achieves high detection accuracy across large text-to-image models (LTIMs) and GANs, rivaling state-of-the-art pretrained detectors with far fewer training samples, and enables cross-model lineage analysis to link related models (e.g., identifying MidJourney (MJ) as a fine-tuned version of SD 1.4). The study also demonstrates the method's robustness to some distortions and discusses its limitations for certain fingerprints and compression scenarios, outlining avenues for improving fingerprint strength and compression resilience.

Abstract

The generation of high-quality images has become widely accessible and is a rapidly evolving process. As a result, anyone can generate images that are indistinguishable from real ones. This leads to a wide range of applications, including malicious usage with deceptive intentions. Despite advances in detection techniques for generated images, a robust detection method still eludes us. Furthermore, model personalization techniques might affect the detection capabilities of existing methods. In this work, we utilize the architectural properties of convolutional neural networks (CNNs) to develop a new detection method. Our method can detect images from a known generative model and enables us to establish relationships between fine-tuned generative models. We tested the method on images produced by both Generative Adversarial Networks (GANs) and recent large text-to-image models (LTIMs) that rely on Diffusion Models. Our approach outperforms others trained under identical conditions and achieves performance comparable to state-of-the-art pre-trained detection methods on images generated by Stable Diffusion and MidJourney, while requiring significantly fewer training samples.

Paper Structure (28 sections, 8 equations, 15 figures, 7 tables)

Figures (15)

  • Figure 1: Reconstructed grayscale images (top) and their FFT log-magnitudes (bottom). Images are normalized for display, and the mean value is subtracted before applying the FFT. U-Net produces a mix of boundary artifacts (lines) and up-sampling artifacts (checkerboard). C-Net produces only boundary artifacts, while Up-Net yields only up-sampling artifacts. In the Up-Net model, the input noise is up-sampled after a 1x1 convolution, resulting in blocks with varying gray levels.
  • Figure 2: Fingerprints of SD 1.4 and GLIDE. For each model, two types of fingerprint are shown: one obtained by residual averaging ($\mathcal{F}_A$) and one extracted by DIF ($\mathcal{F}_E$). In $\mathcal{F}_A$, SD 1.4 shows a strong grid-like pattern, whereas GLIDE shows none; that is, GLIDE has a "weak" fingerprint. In contrast, $\mathcal{F}_E$ reveals clear patterns for both SD 1.4 and GLIDE.
  • Figure 3: Cross-detection accuracy (%). Each grid is characterized by model and epoch. Note the clusters at epochs 40, 52, and 70.
  • Figure 4: Model lineage analysis of SD models with different detection methods. For DIF and Grag21 we show cross-detection (%), and for Marra18 the cross-correlation of fingerprints. Notably, only DIF exhibits clusters of SD 1.x and SD 2.x with high and symmetric cross-detection.
  • Figure 5: Model lineage analysis of LTIMs with DIF by cross-detection (%). DM and D2 denote DALL$\cdot$E-Mini and DALL$\cdot$E-2, respectively. The relation between SD 1.4 and MJ is similar to the relation among SD 1.x models.
  • ...and 10 more figures
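
The spectrum preview described in the Figure 1 caption and the averaged fingerprint $\mathcal{F}_A$ from Figure 2 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, the residual is assumed to be the image minus a denoised copy, and a 3x3 box blur stands in for whatever denoiser is actually used.

```python
import numpy as np


def fft_log_magnitude(gray, eps=1e-8):
    """Centered FFT log-magnitude of a 2-D grayscale image (mean removed first)."""
    gray = np.asarray(gray, dtype=np.float64)
    centered = gray - gray.mean()  # drop the DC component, as in the Figure 1 caption
    spectrum = np.fft.fftshift(np.fft.fft2(centered))  # zero frequency moves to the center
    return np.log(np.abs(spectrum) + eps)  # eps avoids log(0)


def normalize_for_preview(x):
    """Rescale an array to [0, 1] for display; constant input maps to zeros."""
    x = np.asarray(x, dtype=np.float64)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)


def box_blur(img, k=3):
    """k x k mean filter with reflect padding (ASSUMPTION: stand-in denoiser)."""
    img = np.asarray(img, dtype=np.float64)
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]  # sum the k*k shifted copies
    return out / (k * k)


def averaged_residual_fingerprint(images, denoise=box_blur):
    """Mean of per-image residuals (ASSUMPTION: residual = image - denoise(image))."""
    residuals = [np.asarray(im, dtype=np.float64) - denoise(im) for im in images]
    return np.mean(residuals, axis=0)


# Usage on synthetic data: a shared period-8 pattern plus independent noise per image.
rng = np.random.default_rng(0)
x = np.arange(64)
pattern = np.tile(np.cos(2 * np.pi * x / 8), (64, 1))
images = [pattern + rng.normal(scale=1.0, size=(64, 64)) for _ in range(50)]
fingerprint = averaged_residual_fingerprint(images)
preview = normalize_for_preview(fft_log_magnitude(fingerprint))
```

On this synthetic input, averaging suppresses the independent noise while the shared periodic pattern survives, so the spectrum of `fingerprint` peaks at the pattern's frequency. A generator with a weak fingerprint would, by analogy, leave little structure in such an average.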