
Towards Robust GAN-generated Image Detection: a Multi-view Completion Representation

Chi Liu, Tianqing Zhu, Sheng Shen, Wanlei Zhou

TL;DR

GAN-generated image detection struggles with generalization due to overreliance on unstable frequency artifacts. The authors introduce MCCL, a framework that combines multi-view image completion with cross-view classification to learn frequency-irrelevant, robust features from real images. It uses three incomplete views (Masked Image Modeling, Gray-to-RGB, Edge-to-RGB) and combines intra-view enhancements (multi-scale features and low-pass residual attention) with adaptive inter-view fusion to detect fakes. Extensive experiments across resolutions, GAN types, and perturbations demonstrate improved generalization and robustness, highlighting spectral alignment and diverse view information as key for practical deepfake detection.

Abstract

GAN-generated image detection has become the first line of defense against malicious uses of machine-synthesized image manipulations such as deepfakes. Although some existing detectors work well on clean samples from known GANs, their success is largely attributable to overfitting unstable features such as frequency artifacts, which causes failures when facing unknown GANs or perturbation attacks. To overcome this issue, we propose a robust detection framework based on a novel multi-view image completion representation. The framework first learns various view-to-image tasks to model the diverse distributions of genuine images. Frequency-irrelevant features can be represented from the distributional discrepancies characterized by the completion models, and these features are stable, generalized, and robust for detecting unknown fake patterns. Then, a multi-view classification is devised with elaborate intra- and inter-view learning strategies to enhance view-specific feature representation and cross-view feature aggregation, respectively. We evaluated the generalization ability of our framework across six popular GANs at different resolutions and its robustness against a broad range of perturbation attacks. The results confirm our method's improved effectiveness, generalization, and robustness over various baselines.

Paper Structure (31 sections, 10 equations, 5 figures, 5 tables)

This paper contains 31 sections, 10 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Instead of learning GAN-specific features directly from fake images which may lead to overfitting, our framework incorporates multi-view completion and classification to model diverse distributional discrepancies between real and fake images, which can generalize to unknown fake patterns. $\mathcal{R}$: Restorer; $\mathcal{C}$: Classifier.
  • Figure 2: The overview of our framework (white box). Several restorers first learn different distributions of real images via multi-view completion learning. Then for each view, a classifier captures the view-specific distributional discrepancy between real and fake images via intra-view learning. The low-pass residual-guided attention and multi-scale feature concatenation modules are devised to strengthen intra-view learning (orange box). All base classifiers are finally fused to perform inter-view learning for robust detection (yellow box).
  • Figure 3: Three incomplete views selected for completion learning.
  • Figure 4: The visualization of real and different GAN-generated and perturbed fake samples (the 1st row) and the average FFT spectra before (the 2nd row) and after (the 3rd row) the Edge-to-RGB completion.
  • Figure 5: The spectral distributions of real images and fake images generated by different GANs before and after completion.
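
To make the three incomplete views (Figure 3) and the FFT spectra (Figures 4 and 5) more concrete, the sketch below shows one plausible way to build such inputs with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function names, the patch size, the masking ratio, the BT.601 luminance weights, and the gradient-threshold edge detector are all hypothetical choices that this summary does not specify.

```python
import numpy as np

def masked_view(img, patch=16, ratio=0.5, seed=0):
    """Zero out a random subset of non-overlapping patches (MIM-style input).

    Assumption: square patches and a fixed masking ratio; the paper's
    actual masking scheme is not given here.
    """
    rng = np.random.default_rng(seed)
    h, w, _ = img.shape
    gh, gw = h // patch, w // patch
    keep = rng.random((gh, gw)) >= ratio          # True = patch stays visible
    visible = np.ones((h, w), dtype=bool)         # leftover border stays visible
    visible[:gh * patch, :gw * patch] = np.repeat(
        np.repeat(keep, patch, axis=0), patch, axis=1
    )
    return img * visible[..., None]

def gray_view(img):
    """Collapse RGB to one luminance channel (input for Gray-to-RGB)."""
    return img @ np.array([0.299, 0.587, 0.114])  # assumed BT.601 weights

def edge_view(img, thresh=0.1):
    """Binary edge map from gradient magnitude (input for Edge-to-RGB).

    Assumption: a simple finite-difference detector stands in for
    whatever edge extractor the authors used.
    """
    gy, gx = np.gradient(gray_view(img))
    return (np.hypot(gx, gy) > thresh).astype(np.float32)

def log_spectrum(img):
    """Centered log-amplitude FFT spectrum of the luminance channel."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray_view(img)))
    return np.log1p(np.abs(spectrum))

def average_spectrum(images):
    """Mean log-amplitude spectrum over a batch of same-sized images."""
    return np.mean([log_spectrum(im) for im in images], axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    batch = [rng.random((64, 64, 3)) for _ in range(4)]  # stand-in images
    print(masked_view(batch[0]).shape, gray_view(batch[0]).shape)
    print(edge_view(batch[0]).shape, average_spectrum(batch).shape)
```

In the framework described above, each incomplete view would be passed to a restorer trained on real images only, and spectra such as those in Figures 4 and 5 would be computed on images before and after completion. The restorers and classifiers themselves are outside the scope of this sketch.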