A Difference-in-Difference Approach to Detecting AI-Generated Images

Xinyi Qi; Kai Ye; Chengchun Shi; Ying Yang; Hongyi Zhou; Jin Zhu

TL;DR

This work proposes a novel difference-in-difference method that computes the difference in reconstruction error -- a second-order difference -- to reduce variance and improve the detection accuracy of AI-generated images.

Abstract

Diffusion models can produce images that are almost indistinguishable from real ones. This raises concerns about potential misuse and makes such images substantially harder to detect. Many existing detectors rely on reconstruction error -- the difference between the input image and its reconstructed version -- as the basis for distinguishing real from fake images. However, these detectors become less effective as modern AI-generated images become increasingly similar to real ones. To address this challenge, we propose a novel difference-in-difference method. Instead of directly using the reconstruction error (a first-order difference), we compute the difference in reconstruction error -- a second-order difference -- to reduce variance and improve detection accuracy. Extensive experiments demonstrate that our method achieves strong generalization performance, enabling reliable detection of AI-generated images.
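The distinction between the first-order and second-order differences can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `reconstruct` is a hypothetical stand-in (a 3x3 mean filter) for the diffusion-based reconstruction, images are 2-D float arrays, and the differences are taken as signed pixel-wise differences.

```python
import numpy as np


def reconstruct(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a diffusion-model reconstruction.

    A 3x3 mean filter is used only so the sketch runs without a trained model.
    """
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros_like(image, dtype=np.float64)
    h, w = image.shape
    # Sum the nine shifted copies of the padded image, then average.
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def first_order_error(image: np.ndarray, recon_fn=reconstruct) -> np.ndarray:
    """First-order difference: the input minus its reconstruction."""
    return image - recon_fn(image)


def second_order_error(image: np.ndarray, recon_fn=reconstruct) -> np.ndarray:
    """Second-order difference: error of the input minus error of its reconstruction."""
    x1 = recon_fn(image)   # first reconstruction
    x2 = recon_fn(x1)      # reconstruction of the reconstruction
    delta_x = image - x1   # reconstruction error of the input
    delta_x1 = x1 - x2     # reconstruction error of the reconstructed image
    return delta_x - delta_x1


# Usage: a random array stands in for an input image.
x = np.random.default_rng(0).random((8, 8))
d2 = second_order_error(x)
```

In a real detector the resulting map would be fed to a classifier; that step, and the choice of reconstruction model, are outside this sketch.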

Paper Structure (39 sections, 13 equations, 11 figures, 9 tables)

This paper contains 39 sections, 13 equations, 11 figures, 9 tables.

Introduction
Related Works
Diffusion Models
AI-Generated Image Detection
Difference-in-Differences
Methodology
Background on Diffusion Models
Intuition and Limitations in Reconstruction-Based Detection
Difference-in-Differences
Our Proposal
Experiments
Experimental Setup
Comparison against Existing Baselines
Performance with a Larger Training Dataset
Performance with a Smaller Training Dataset
...and 24 more sections

Figures (11)

Figure 1: Left: The real image $x$ and its distribution on the manifold $\mathcal{M}$ are shown in blue, whereas the synthetic image $y$ and its distribution are shown in orange. Right: Reconstructions of the real and synthetic images. $\Delta_{\textrm{fake}}$ and $\Delta_{\textrm{real}}$ denote the reconstruction errors of synthetic and real images, respectively, and $\Pi_{\mathcal{M}}$ represents the projection of the real image onto $\mathcal{M}$. Top: The generator is weak, resulting in a large discrepancy between the real and synthetic image distributions. Consequently, the reconstruction error of the synthetic image, $\Delta_{\textrm{fake}}(y)$, is much larger than that of the real image, $\Delta_{\textrm{real}}(x)$, making detection easier. Bottom: The generator is strong, so the real and synthetic image distributions are close. Here, $\Delta_{\textrm{fake}}(y)$ is similar to $\Delta_{\textrm{real}}(x)$, making AI-generated images difficult to detect.
Figure 2: Visualizations of our DID detector. $\Delta$ denotes the differencing operator that computes the pixel-wise difference between two images.
Figure 3: Visualizations of (a) input image $x$; (b) its first-order reconstruction error $\Delta(x)$; (c) the reconstruction error of the reconstructed image $x'$; and (d) the second-order reconstruction error $\Delta^2(x)$. Compared to the first-order error $\Delta(x)$, the second-order error $\Delta^{2}(x)$ better distinguishes real from fake images: real images produce more stable and brighter responses, while fake images exhibit noticeably weaker signals, resulting in a larger gap between the two categories.
Figure 4: Comparison of first-order and second-order residuals across real and generated images. The DIRE framework (Wang et al., 2023) shows that real images typically produce sharper and more structured first-order residuals $\Delta x$, whereas fake images tend to yield weaker patterns. However, in broader evaluation across diverse and strong generative models, first-order residuals are not always stable when the real and synthetic distributions are close. In more challenging cases, fake images can even produce stronger residual signals than real images, leading to potential misclassification. In contrast, our proposed second-order residual module $\Delta^2 x$ provides more consistent separability between real and fake samples by taking the difference between the reconstruction error of the input image and that of its reconstructed version.
Figure 5: Visualization of real images across the DID pipeline. We show the original image $x$, the first reconstruction $x'$, the second reconstruction $x''$, and their corresponding residual maps $\Delta x$, $\Delta x'$, and $\Delta^2 x$.
...and 6 more figures
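
The notation used in the captions of Figures 3 to 5 can be collected in one place. This is a sketch inferred from the captions rather than the paper's formal definition: $R$ is a hypothetical symbol for the reconstruction map, and the captions do not state whether $\Delta$ is a signed difference or a magnitude, so the signed form is assumed here.

```latex
\begin{align*}
x' &= R(x), & x'' &= R(x'), \\
\Delta(x) &= x - x', & \Delta(x') &= x' - x'', \\
\Delta^{2}(x) &= \Delta(x) - \Delta(x').
\end{align*}
```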
