Local Statistics for Generative Image Detection

Yung Jer Wong, Teck Khim Ng

TL;DR

The paper addresses the problem of distinguishing real digital camera images from diffusion-model generated images. It introduces three localized feature sets that exploit Bayer pattern traces and spatial non-stationarity, avoiding deep learning in favor of interpretable, low-cost features. The approach demonstrates strong robustness to image resizing and JPEG compression and generalizes well to unseen diffusion models, outperforming the DIRE detector in cross-dataset tests. This yields a practical forensic tool for reliable media authenticity assessment with limited training data.

Abstract

Diffusion models (DMs) are generative models that learn to synthesize images from Gaussian noise. DMs can be trained to do a variety of tasks such as image generation and image super-resolution. Researchers have made significant improvements in the capability of synthesizing photorealistic images in the past few years. These successes also hasten the need to address the potential misuse of synthesized images. In this paper, we highlighted the effectiveness of the Bayer pattern and local statistics in distinguishing digital camera images from DM-generated images. We further hypothesized that local statistics should be used to address the spatial non-stationarity of images. We showed that our approach produced promising results for distinguishing real images from synthesized images. This approach is also robust to various perturbations such as image resizing and JPEG compression.

Paper Structure

This paper contains 6 sections, 5 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: RGGB Bayer pattern of the color filter array. Green is sampled at twice as many pixel sites as red or blue. Other possible Bayer patterns include BGGR, GBRG and GRBG.
  • Figure 2: Flow diagram of the pipeline used to extract Features (2) and Features (3). $I'$: Block-reduced image. Each $10 \times 10$ block is reduced to one pixel representing the sum of the diagonal and anti-diagonal variances. $(x,y)$: Current coordinates using a 0-based index. $Corr_{A, B}$: Mean of the local Pearson correlation coefficients between images A and B. $M, N$: Number of pixels along the horizontal and vertical directions of $I'$, respectively.
  • Figure 3: Visualization of Features (1) and Features (3). Features (2) serve as a complement to Features (1) and Features (3). Top: A randomly sampled camera image (plot 1), the frequency analyses of the diagonal gradients (plot 2) and anti-diagonal gradients (plot 3) in the green channel, and a visualization of Features (3) for the camera image (plot 4). Bottom: The same four plots for a randomly sampled DM-generated image.
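
The captions for Figures 1 and 2 describe two operations that are easy to picture in code: the 2x2 Bayer tiling and the reduction of each 10x10 block to a single value. The following is a minimal sketch, not the authors' implementation. The function names `bayer_mask` and `block_reduce` are hypothetical. The reading of "sum of diagonal and anti-diagonal variances" as the variance of the pixel values on each block's main diagonal plus the variance of those on its anti-diagonal is also an assumption; the paper may compute the variances over gradients or differently shaped neighbourhoods.

```python
import numpy as np

VALID_PATTERNS = {"RGGB", "BGGR", "GBRG", "GRBG"}


def bayer_mask(height, width, pattern="RGGB"):
    """Return a (height, width) array of 'R'/'G'/'B' labels for a Bayer tiling.

    Hypothetical helper: the four letters of `pattern` fill one 2x2 tile
    in row-major order, and the tile is repeated over the image.
    """
    if pattern not in VALID_PATTERNS:
        raise ValueError(f"unknown Bayer pattern: {pattern!r}")
    tile = np.array(list(pattern)).reshape(2, 2)
    reps = ((height + 1) // 2, (width + 1) // 2)
    return np.tile(tile, reps)[:height, :width]


def block_reduce(channel, block=10):
    """Reduce each block x block tile of a 2-D channel to one value.

    ASSUMPTION: the value is var(main diagonal) + var(anti-diagonal) of the
    raw pixel values in the tile. Rows and columns that do not fill a whole
    tile are cropped.
    """
    channel = np.asarray(channel, dtype=np.float64)
    rows, cols = channel.shape[0] // block, channel.shape[1] // block
    cropped = channel[: rows * block, : cols * block]
    # Rearrange to (tile_row, tile_col, y_in_tile, x_in_tile).
    tiles = cropped.reshape(rows, block, cols, block).swapaxes(1, 2)
    idx = np.arange(block)
    diag = tiles[:, :, idx, idx]              # main-diagonal pixels of each tile
    anti = tiles[:, :, idx, block - 1 - idx]  # anti-diagonal pixels of each tile
    return diag.var(axis=-1) + anti.var(axis=-1)
```

Counting the labels returned by `bayer_mask` makes the 2:1:1 green-to-red-to-blue sampling ratio from Figure 1 concrete, and the output of `block_reduce` plays the role of the block-reduced image $I'$ in Figure 2 under the stated assumption.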