HRR: Hierarchical Retrospection Refinement for Generated Image Detection

Peipei Yuan, Zijing Xie, Shuo Ye, Hong Chen, Yulong Wang

TL;DR

This paper tackles the challenge of reliably detecting generated images across diverse generative models and image scales. It introduces HRR, a diffusion-model-based framework with two core components: Multi-scale Style Retrospection (MSR) to produce scale-aware, style-robust features, and Additive Feature Refinement (AFR) to sparsely refine features via a correntropy-based loss. By combining KL divergence regularization and additive sparse optimization, HRR improves cross-generator generalization and scale-invariance, achieving state-of-the-art results on DIRE, ForenSynths, and cocoFake, with ablations confirming the complementary benefits of MSR and AFR. The approach offers a practical pathway to robust detection in real-world settings where manipulated imagery spans multiple generators and varying resolutions, and it provides insights into multi-scale, style-agnostic representations for forensic tasks.

Abstract

Generative artificial intelligence holds significant potential for abuse, and generative image detection has become a key focus of research. However, existing methods primarily focus on detecting a specific generative model and emphasize the localization of synthetic regions, while neglecting the interference that image size and style cause in model learning. Our goal is to reach a fundamental conclusion: is the image real or generated? To this end, we propose a diffusion model-based generative image detection framework termed Hierarchical Retrospection Refinement (HRR). It includes a multi-scale style retrospection module that encourages the model to generate detailed and realistic multi-scale representations while alleviating the learning biases introduced by dataset styles and generative models. Additionally, based on the principle of the correntropy sparse additive machine, a feature refinement module is designed to reduce the impact of redundant features on learning and to capture the intrinsic structure and patterns of the data, thereby improving the model's generalization ability. Extensive experiments demonstrate that the HRR framework consistently delivers significant performance improvements, outperforming state-of-the-art methods on the generated image detection task.
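
The feature refinement module is described as building on the correntropy sparse additive machine. As a rough illustration of the two ideas that name combines, and not a reproduction of the paper's AFR module, the sketch below shows one common form of the correntropy-induced loss, which saturates for large residuals, alongside the soft-thresholding operator often used to obtain L1-style sparsity. The function names, the kernel width `sigma`, and the threshold `lam` are assumptions introduced here for illustration only.

```python
import math


def correntropy_loss(residual: float, sigma: float = 1.0) -> float:
    """One common correntropy-induced loss: sigma^2 * (1 - exp(-r^2 / sigma^2)).

    Assumed form for illustration. It behaves like r^2 near zero but is
    bounded above by sigma^2, so outliers contribute a capped penalty.
    """
    return sigma ** 2 * (1.0 - math.exp(-(residual ** 2) / sigma ** 2))


def soft_threshold(weight: float, lam: float) -> float:
    """Proximal operator of lam * |w|: shrinks a weight toward zero.

    Weights with magnitude at most lam become exactly zero, which is how
    L1-style regularization discards redundant features.
    """
    return math.copysign(max(abs(weight) - lam, 0.0), weight)


if __name__ == "__main__":
    # Compare the bounded loss with the squared loss on growing residuals.
    for r in (0.1, 1.0, 10.0):
        print(f"r={r}: squared={r * r:.4f}, correntropy={correntropy_loss(r):.4f}")
    # Small weights vanish; large weights are only shrunk.
    print([soft_threshold(w, 0.5) for w in (0.3, -0.4, 2.0, -2.0)])
```

The bounded loss is what makes this family robust: a single badly mismatched sample cannot dominate the objective the way it would under a squared loss.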

Paper Structure

This paper contains 21 sections, 28 equations, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: The generalization ability of generative image detection models is poor in multi-scale and cross-generator scenarios. In the figure, the left side represents the training dataset, where ProGAN is used as the generative model, while the right side shows the test dataset, along with the confidence scores for real or fake classifications.
  • Figure 2: Overview of HRR, which consists of two core modules: Multi-scale Style Retrospection (MSR) and Additive Feature Refinement (AFR).
  • Figure 3: Evaluation of the hyperparameter $\gamma$ of the KL loss and the regularization parameter $\lambda$. The horizontal axis represents the value of $\gamma$ or $\lambda$, while the vertical axis indicates model accuracy.
  • Figure 4: CAM visualization. In each subfigure, the first row shows the original image, and the second and third rows show the results of the baseline and our method, respectively.