Detecting AI-Generated Forgeries via Iterative Manifold Deviation Amplification

Jiangling Zhang, Shuxuan Gao, Bofan Liu, Siqiang Feng, Jirui Huang, Yaxiong Chen, Ziyu Chen

TL;DR

The Iterative Forgery Amplifier Network (IFA-Net) shifts from learning "what is fake" to modeling "what is real". On four diffusion-based inpainting benchmarks, it achieves an average improvement of 6.5% in IoU and 8.1% in F1-score over the second-best method, while also generalizing strongly to traditional manipulation types.

Abstract

The proliferation of highly realistic AI-generated images poses critical challenges for digital forensics, demanding precise pixel-level localization of manipulated regions. Existing methods predominantly learn discriminative patterns of specific forgeries and often struggle with novel manipulations as editing techniques continue to evolve. We propose the Iterative Forgery Amplifier Network (IFA-Net), which shifts from learning "what is fake" to modeling "what is real". Grounded in the principle that all manipulations deviate from the natural image manifold, IFA-Net leverages a frozen Masked Autoencoder (MAE) pretrained on real images as a universal realness prior. Our framework operates through a two-stage closed-loop process: an initial Dual-Stream Segmentation Network (DSSN) fuses the original image with MAE reconstruction residuals for coarse localization, followed by a Task-Adaptive Prior Injection (TAPI) module that converts this coarse prediction into guiding prompts to steer the MAE decoder and amplify reconstruction failures in suspicious regions for precise refinement. Extensive experiments on four diffusion-based inpainting benchmarks show that IFA-Net achieves an average improvement of 6.5% in IoU and 8.1% in F1-score over the second-best method, while demonstrating strong generalization to traditional manipulation types.
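
The IoU and F1-score gains quoted above are pixel-level localization metrics. As a minimal sketch of how such scores are commonly computed for a predicted binary mask against a ground-truth mask, the snippet below uses the standard definitions; the exact thresholding and averaging protocol of the paper is not given in this section, and the function name `iou_and_f1` and the empty-mask convention are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2-D binary mask, 1 = manipulated pixel


def iou_and_f1(pred: Mask, target: Mask) -> Tuple[float, float]:
    """Pixel-level IoU and F1 for two binary masks of the same shape."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # forged pixel correctly flagged
            elif p and not t:
                fp += 1      # real pixel wrongly flagged
            elif t and not p:
                fn += 1      # forged pixel missed
    union = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


# Toy 2x3 example (hypothetical data, not from the paper).
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou, f1 = iou_and_f1(pred, target)
print(f"IoU={iou:.3f}, F1={f1:.3f}")  # IoU=0.500, F1=0.667
```

In the toy example two pixels overlap, one is a false positive, and one is missed, so IoU is 2/4 and F1 is 4/6. F1 is never lower than IoU for the same prediction, which is why the two metrics are usually reported together rather than interchangeably.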

Paper Structure (20 sections, 5 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Conceptual comparison of our IFA-Net against existing methods. Existing methods often leverage Diffusion Models for reconstruction in a single-pass, open-loop process that lacks task-adaptive guidance. Our method is grounded in a different prior: a Masked Autoencoder (MAE) pretrained on real images. We introduce a two-stage, closed-loop framework where the coarse prediction from Stage 1 guides a second, targeted reconstruction in Stage 2 for significant anomaly amplification.
  • Figure 2: Overall architecture of IFA-Net. The framework operates in two stages while sharing the same Dual-Stream Segmentation Network (DSSN). Stage 1 (blue arrows): The input image $x$ is processed by a fully frozen MAE (Encoder & Decoder) to produce an initial residual map $x_{\text{rec}}$. The DSSN then fuses $x$ and $x_{\text{rec}}$ to predict a coarse mask $M_{\text{crs}}$, supervised by $\mathcal{L}_{\text{crs}}$. Stage 2 (red arrows): The coarse mask $M_{\text{crs}}$ is encoded into Task-Adaptive Prompts ($T_p$), which modulate features from the frozen MAE Encoder via a FiLM layer. The modulated features are passed to a trainable MAE Decoder to generate an amplified residual map, which is then used by the shared DSSN to predict the final refined mask $M_{\text{ref}}$, supervised by $\mathcal{L}_{\text{ref}}$.
  • Figure 3: Visualization of anomaly amplification. We compare reconstructed images and residual maps from Stage 1 and Stage 2. In Stage 1, the frozen MAE can already produce coarse forgery cues—although weak and noisy, they indicate that the MAE indeed encodes a realness prior. In Stage 2, prompt-guided reconstruction forces the model to fail more strongly on forged regions, further amplifying these forgery cues and yielding a much cleaner and more salient residual map that better supports the final segmentation.
  • Figure 4: Qualitative comparison on OpenSDID. Examples are selected from five diffusion generators (SD1.5, SD2.1, SD3, SDXL, and Flux.1). IFA-Net produces cleaner and more accurate localization masks that align closely with the ground truth, whereas other methods often yield incomplete or fragmented detections.
  • Figure 5: Robustness analysis under Gaussian blur and JPEG compression.
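
The Figure 2 caption says that the Task-Adaptive Prompts $T_p$ modulate frozen MAE encoder features via a FiLM layer. A FiLM layer applies a feature-wise scale and shift predicted from a conditioning signal. The NumPy sketch below illustrates only that general mechanism. How IFA-Net encodes the coarse mask into prompts and how it parameterizes the layer are not described in this section, so the function `film_modulate`, the linear projections `w_gamma` and `w_beta`, the `(1 + gamma)` form, and all shapes are assumptions.

```python
import numpy as np


def film_modulate(features, prompts, w_gamma, w_beta):
    """Apply FiLM: scale and shift each token's features using its prompt.

    features: (num_tokens, dim) stand-in for frozen MAE encoder features
    prompts:  (num_tokens, prompt_dim) stand-in for the prompts T_p
    w_gamma, w_beta: (prompt_dim, dim) hypothetical linear projections
    """
    gamma = prompts @ w_gamma  # per-token, per-channel scale offset
    beta = prompts @ w_beta    # per-token, per-channel shift
    # Assumed convention: (1 + gamma) makes an all-zero prompt an identity map.
    return (1.0 + gamma) * features + beta


rng = np.random.default_rng(0)
num_tokens, dim, prompt_dim = 4, 8, 3
features = rng.normal(size=(num_tokens, dim))
w_gamma = rng.normal(size=(prompt_dim, dim))
w_beta = rng.normal(size=(prompt_dim, dim))

# Toy coarse mask: tokens 0-1 are flagged as suspicious, tokens 2-3 are not.
prompts = np.zeros((num_tokens, prompt_dim))
prompts[:2] = rng.normal(size=(2, prompt_dim))

modulated = film_modulate(features, prompts, w_gamma, w_beta)
print(modulated.shape)  # (4, 8)
```

With this parameterization, tokens whose prompt is zero pass through unchanged, while flagged tokens are rescaled and shifted before reaching the decoder. That mirrors the intent stated in the caption, steering reconstruction only where the coarse mask points, without claiming to reproduce the paper's actual layer.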