
Transferable Dual-Domain Feature Importance Attack against AI-Generated Image Detector

Weiheng Zhu, Gang Cao, Jing Liu, Lifang Yu, Shaowei Weng

TL;DR

AI-generated image detectors are vulnerable to adversarial manipulation, so evaluating how well attacks transfer across models is essential for assessing their security. DuFIA introduces a dual-domain feature importance attack that jointly exploits spatial and frequency perturbations to guide a mid-layer feature loss, improving cross-detector transferability without excessive perceptual distortion. In extensive experiments on a wide range of detectors and generators, DuFIA achieves superior transferability and robustness to common post-processing, outperforming state-of-the-art attacks. The work provides a practical framework for antiforensics evaluation and offers code for reproducibility.
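
A mid-layer feature loss guided by dual-domain importance can be written compactly. The formalization below is one plausible reading, assuming an FIA-style aggregated-gradient formulation. The notation is ours, not the paper's: $z$ is the source detector's "fake" logit, $f_l$ its mid-layer features, $N$ the number of interpolation steps from a black baseline, $M$ the number of frequency-perturbed copies with Gaussian noise $\xi_m$ and random spectral mask $\mathbf{M}_m$, and $\mathcal{F}$ a spectral transform such as the FFT or DCT.

```latex
\begin{aligned}
\Delta_s &= \frac{1}{N}\sum_{n=1}^{N} \frac{\partial z\!\left(\tfrac{n}{N}\,x\right)}{\partial f_l},
\qquad
\Delta_f = \frac{1}{M}\sum_{m=1}^{M} \frac{\partial z\!\left(\mathcal{F}^{-1}\!\left[\mathcal{F}(x+\xi_m)\odot \mathbf{M}_m\right]\right)}{\partial f_l},\\
\Delta &= \frac{\Delta_s + \Delta_f}{\lVert \Delta_s + \Delta_f \rVert_2},
\qquad
\mathcal{L}(x^{\mathrm{adv}}) = \sum \Delta \odot f_l(x^{\mathrm{adv}}),\\
x^{t+1} &= \mathrm{Clip}_{x,\epsilon}\!\left(x^{t} - \alpha\,\mathrm{sign}\!\left(\nabla_{x^{t}}\mathcal{L}(x^{t})\right)\right).
\end{aligned}
```

Minimizing $\mathcal{L}$ suppresses the mid-layer activations that most support the "fake" decision. The intuition behind feature-level attacks of this kind is that such mid-layer evidence is more likely to be shared across detectors than the final logit.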

Abstract

Recent AI-generated image (AIGI) detectors achieve impressive accuracy under clean conditions. From an antiforensics perspective, it is important to develop advanced adversarial attacks for evaluating the security of such detectors, a problem that remains insufficiently explored. This letter proposes a Dual-domain Feature Importance Attack (DuFIA) scheme that can, to some extent, invalidate AIGI detectors. Forensically important features are captured by the spatially interpolated gradient and frequency-aware perturbation. Adversarial transferability is enhanced by jointly modeling spatial- and frequency-domain feature importances, which are fused to guide optimization-based adversarial example generation. Extensive experiments across various AIGI detectors verify the cross-model transferability, transparency, and robustness of DuFIA.
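
The pipeline outlined in the abstract has four steps: importance from spatially interpolated inputs, importance from frequency-perturbed copies, fusion of the two, and use of the fused importance to weight a mid-layer feature loss. It can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The two-layer `tanh` detector, the straight-line interpolation from a black baseline, the random rescaling of `np.fft.rfft` coefficients, fusion by plain addition with L2 normalization, and all hyperparameters (`eps`, `alpha`, `n_iter`, `sigma`, `rho`) are hypothetical choices made for illustration. The importance is computed once from the original image and then held fixed, as in FIA-style attacks.

```python
import numpy as np

# Hypothetical toy "source detector" (not the paper's model):
#   mid-layer features f_l(x) = tanh(W1 @ x), "fake" logit z = v @ tanh(W2 @ f_l)
rng = np.random.default_rng(0)
D, K, J = 64, 16, 8                       # input size, mid-layer width, head width
W1 = rng.normal(size=(K, D)) / np.sqrt(D)
W2 = rng.normal(size=(J, K)) / np.sqrt(K)
v = rng.normal(size=J)


def mid_features(x):
    return np.tanh(W1 @ x)


def logit_grad_wrt_features(h):
    # dz/dh for z = v @ tanh(W2 @ h), i.e. backprop from the logit to the mid layer
    return W2.T @ (v * (1.0 - np.tanh(W2 @ h) ** 2))


def spatial_importance(x, n_steps=8):
    # Assumed "spatially interpolated gradient": average mid-layer gradients
    # over inputs scaled along a straight path from a black image to x
    grads = [logit_grad_wrt_features(mid_features(k / n_steps * x))
             for k in range(1, n_steps + 1)]
    return np.mean(grads, axis=0)


def frequency_importance(x, n_copies=8, sigma=0.05, rho=0.5):
    # Assumed "frequency-aware perturbation": add Gaussian noise, randomly
    # rescale the real-FFT coefficients, transform back, average the gradients
    grads = []
    for _ in range(n_copies):
        spec = np.fft.rfft(x + rng.normal(0.0, sigma, x.shape))
        mask = rng.uniform(1.0 - rho, 1.0 + rho, spec.shape)
        x_freq = np.fft.irfft(spec * mask, n=x.size)
        grads.append(logit_grad_wrt_features(mid_features(x_freq)))
    return np.mean(grads, axis=0)


def dual_domain_attack(x, eps=8 / 255, alpha=2 / 255, n_iter=10):
    # Fuse the two importances by addition and L2-normalize; computed once from x
    importance = spatial_importance(x) + frequency_importance(x)
    importance = importance / (np.linalg.norm(importance) + 1e-12)
    x_adv, losses = x.copy(), []
    for _ in range(n_iter):
        h = mid_features(x_adv)
        losses.append(float(importance @ h))             # sum(importance * f_l(x_adv))
        grad = W1.T @ (importance * (1.0 - h ** 2))      # gradient of that loss w.r.t. x_adv
        x_adv = x_adv - alpha * np.sign(grad)            # signed descent step
        x_adv = np.clip(x_adv, x - eps, x + eps).clip(0.0, 1.0)  # L-inf ball, valid range
    losses.append(float(importance @ mid_features(x_adv)))
    return x_adv, importance, losses


# Usage: attack a flattened toy "image" with values in [0, 1]
x = np.random.default_rng(1).uniform(0.0, 1.0, D)
x_adv, importance, losses = dual_domain_attack(x)
```

Descending the importance-weighted feature sum pushes down the mid-layer activations marked as evidence for "fake". Whether that flips a real detector's decision depends on the detector and is not guaranteed by this toy setup.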

Paper Structure

This paper contains 19 sections, 11 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Detailed illustration of the proposed DuFIA. An original image, after undergoing spatial- and frequency-domain perturbations, is fed into a source detector, and the dual feature importance at an intermediate layer is obtained via backpropagation. The feature importance is then multiplied element-wise with the adversarial example's feature map at the same intermediate layer, producing a loss that guides the generation of the adversarial example in the next iteration. $\odot$ and $\oplus$ denote element-wise product and addition, respectively.
  • Figure 2: Spectrum saliency maps (averaged over all images from the CycleGAN dataset ojha2023towards) for different AIGI detectors. (a-d) Results on raw images for four different detectors. (e) Result on images with frequency perturbation (FP) for UFD ojha2023towards.
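
Spectrum saliency maps like those in Figure 2 can be approximated as below. The sketch assumes that saliency is the magnitude of the gradient of the detector output with respect to the orthonormal 2D DCT coefficients of the input, averaged over a set of images. The toy detector `z(x) = sum(w * tanh(x))`, the `frequency_perturb` DCT-rescaling stand-in for FP, square grayscale inputs, and all sizes are hypothetical. The paper's exact saliency definition and perturbation may differ.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C: spectrum X = C @ x @ C.T, image x = C.T @ X @ C."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    C[0, :] = np.sqrt(1.0 / n)
    return C


def spectrum_saliency(images, grad_fn):
    """Mean |dz/dX| over a stack of square grayscale images, shape (N, n, n).

    grad_fn(x) returns dz/dx for one image; the chain rule through
    x = C.T @ X @ C gives dz/dX = C @ (dz/dx) @ C.T.
    """
    C = dct_matrix(images.shape[-1])
    maps = [np.abs(C @ grad_fn(x) @ C.T) for x in images]
    return np.mean(maps, axis=0)


def frequency_perturb(x, rng, rho=0.5):
    """Hypothetical FP stand-in: randomly rescale each DCT coefficient of x."""
    C = dct_matrix(x.shape[-1])
    X = C @ x @ C.T
    return C.T @ (X * rng.uniform(1.0 - rho, 1.0 + rho, X.shape)) @ C


# Hypothetical toy detector z(x) = sum(w * tanh(x)), so dz/dx = w * (1 - tanh(x)**2)
rng = np.random.default_rng(0)
w = rng.normal(size=(32, 32))
grad_fn = lambda x: w * (1.0 - np.tanh(x) ** 2)

images = rng.uniform(0.0, 1.0, size=(16, 32, 32))
raw_map = spectrum_saliency(images, grad_fn)                          # cf. panels (a-d)
fp_images = np.stack([frequency_perturb(x, rng) for x in images])
fp_map = spectrum_saliency(fp_images, grad_fn)                        # cf. panel (e)
```

Because the inverse DCT is linear, one backward pass to the pixels plus two matrix products per image is enough to obtain each spectral gradient map.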