LaRE$^2$: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection

Yunpeng Luo, Junlong Du, Ke Yan, Shouhong Ding

TL;DR

This work tackles the growing challenge of distinguishing diffusion-generated images from real ones. It introduces LaRE^2, which combines the Latent Reconstruction Error (LaRE), computed by single-step denoising in the latent space, with an Error-Guided Feature Refinement module (EGRE) that aligns the image feature with LaRE and then refines it through spatial and channel attention. On the GenImage benchmark, which covers eight generators, LaRE^2 achieves state-of-the-art accuracy (ACC) and average precision (AP), improving on the best prior method by up to 11.9% average ACC and 12.1% average AP, while extracting features about 8× faster than the reconstruction-based detector DIRE. Lower feature-extraction cost and better generalization to unseen generators make diffusion-generated image detection more practical at scale. Code is available.
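To make the single-step idea concrete, here is a minimal Python sketch of a LaRE-style error under stated assumptions. The latent `z0` is assumed to come from an image encoder (omitted here), the noise schedule is an assumed DDPM-style linear beta schedule, the error is taken as the squared difference between the injected and the predicted noise, and `denoiser` is a hypothetical callable standing in for a pretrained noise-prediction network. Averaging over several noise draws through a hypothetical `ensemble` parameter is an illustrative choice, not a detail confirmed by this summary, and none of these names come from the authors' code.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical denoiser interface: (noisy latent, timestep) -> predicted noise.
Denoiser = Callable[[Vector, int], Vector]


def alpha_bar(t: int, num_steps: int = 1000,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_s) for s = 0..t under a linear beta
    schedule (an assumed DDPM-style schedule, not necessarily the paper's)."""
    prod = 1.0
    for s in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def single_step_error(z0: Sequence[float], t: int, denoiser: Denoiser,
                      ensemble: int = 4, seed: int = 0) -> Vector:
    """Element-wise squared error between injected and predicted noise,
    averaged over `ensemble` noise draws (hypothetical reading of LaRE)."""
    rng = random.Random(seed)
    a = alpha_bar(t)
    err = [0.0] * len(z0)
    for _ in range(ensemble):
        eps = [rng.gauss(0.0, 1.0) for _ in z0]
        # Closed-form forward noising: jump straight to step t, no iterative loop.
        zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
        eps_hat = denoiser(zt, t)  # exactly one denoiser call per noise draw
        for i, (e, e_hat) in enumerate(zip(eps, eps_hat)):
            err[i] += (e - e_hat) ** 2 / ensemble
    return err
```

The point of the sketch is the cost profile: the noisy latent at step t is computed in closed form, so each noise draw costs a single denoiser call instead of the dozens of sequential steps that a full noise-then-denoise reconstruction requires.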

Abstract

The evolution of diffusion models has dramatically improved image generation quality, making it increasingly difficult to differentiate between real and generated images. This development, while impressive, also raises significant privacy and security concerns. In response, we propose a novel Latent REconstruction error guided feature REfinement method (LaRE^2) for detecting diffusion-generated images. We introduce the Latent Reconstruction Error (LaRE), the first reconstruction-error-based feature in the latent space for generated-image detection. LaRE is more efficient to extract than existing features while preserving the cues needed to distinguish real images from fake ones. To exploit LaRE, we propose an Error-Guided feature REfinement module (EGRE), which refines the image feature under the guidance of LaRE to make it more discriminative. EGRE uses an align-then-refine mechanism that refines the image feature for generated-image detection from both spatial and channel perspectives. Extensive experiments on the large-scale GenImage benchmark demonstrate the superiority of LaRE^2, which surpasses the best state-of-the-art (SoTA) method by up to 11.9%/12.1% average ACC/AP across 8 different image generators. LaRE also reduces feature extraction cost, delivering an 8× speedup. Code is available.
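The align-then-refine idea behind EGRE can be illustrated with a toy sketch. Everything below is an assumption rather than the paper's architecture: features are plain C x N lists (channels by flattened spatial positions), the "align" step is a hypothetical linear projection `proj` that maps the error map into the feature's channel space, and both refinements are sigmoid gates computed from the aligned error (mean over channels for the spatial gate, mean over positions for the channel gate).

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = channels, columns = flattened positions


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def align(error: Matrix, proj: Matrix) -> Matrix:
    """Align step: project the error map (C_e x N) into the feature channel
    space (C x N) with a hypothetical learned matrix proj (C x C_e)."""
    n = len(error[0])
    return [[sum(p[k] * error[k][j] for k in range(len(error))) for j in range(n)]
            for p in proj]


def spatial_refine(feat: Matrix, err: Matrix) -> Matrix:
    """Scale every spatial position by sigmoid(mean aligned error over channels)."""
    n = len(feat[0])
    gate = [sigmoid(sum(row[j] for row in err) / len(err)) for j in range(n)]
    return [[f * g for f, g in zip(row, gate)] for row in feat]


def channel_refine(feat: Matrix, err: Matrix) -> Matrix:
    """Scale every channel by sigmoid(mean aligned error over positions)."""
    gate = [sigmoid(sum(row) / len(row)) for row in err]
    return [[f * g for f in row] for row, g in zip(feat, gate)]


def egre_sketch(feat: Matrix, error: Matrix, proj: Matrix) -> Matrix:
    """Align the error map first, then refine spatially and channel-wise."""
    aligned = align(error, proj)
    return channel_refine(spatial_refine(feat, aligned), aligned)
```

With an all-zero error map every gate equals 0.5, so the output is the feature scaled by 0.25; positions and channels with larger positive aligned error receive gates closer to 1 and are suppressed less.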

Paper Structure (23 sections, 12 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: (a) Comparison of reconstruction-based feature extraction. An existing method [wang2023dire] fully reconstructs an image by first gradually adding noise to it and then denoising it, which requires dozens of sampling steps. Our method computes the noisy image directly and denoises it in a single sampling step. (b) Statistical analysis of the relationship between the single-step reconstruction loss (computed over 1,000 images) and the time step. The clear gap between the two curves indicates that single-step reconstruction also reflects the differences between real and generated images. (c) Comparison of per-image feature extraction cost. Our method is 8× faster than DIRE [wang2023dire].
  • Figure 2: Visualization of the reconstruction loss on raw images (i.e., Image + Loss). Although randomly sampled noise is added to the whole image, the loss in high-frequency regions tends to be greater than in low-frequency regions.
  • Figure 3: Overview of our method. In the first stage, we extract LaRE in the latent space through single-step reconstruction. In the second stage, to exploit LaRE, we propose the Error-guided Feature Refinement module, which consists of an Error-guided Spatial Refinement module and an Error-guided Channel Refinement module. LaRE is used to make the image feature more discriminative for generated-image detection from both the spatial and the channel perspective.
  • Figure 4: Cross-validation results on different training and testing subsets. For each generator, we train a model and test it on all 8 generators. Accuracy (ACC) and average precision (AP) are reported for both DIRE [wang2023dire] and our method.
  • Figure 5: Trade-off between detection performance and feature extraction cost. When e=4, the model achieves the best trade-off.
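Figure 4 and the abstract report ACC and AP. As a reference for how these two metrics are commonly computed for a binary real-vs-generated detector, here is a small sketch. The 0.5 threshold, the convention that label 1 means "generated", and the tie handling are assumptions; library implementations such as scikit-learn's `average_precision_score` may treat tied scores differently.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of images whose thresholded score matches the label
    (assumed convention: 1 = generated, 0 = real)."""
    correct = sum(int((s >= threshold) == bool(y)) for y, s in zip(labels, scores))
    return correct / len(labels)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP: mean precision at the rank of each positive,
    ranking by descending score (stable sort keeps input order for ties).
    Returns 0.0 when there are no positives."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)
```

For labels [1, 0, 1, 0] and scores [0.9, 0.8, 0.7, 0.1], accuracy is 0.75 and AP is (1 + 2/3) / 2 ≈ 0.833. AP is threshold-free, which is why it complements ACC when a detector's scores are well ranked but poorly calibrated on an unseen generator.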