Towards Robust GAN-generated Image Detection: a Multi-view Completion Representation
Chi Liu, Tianqing Zhu, Sheng Shen, Wanlei Zhou
TL;DR
Detectors of GAN-generated images generalize poorly because they over-rely on unstable frequency artifacts. The authors introduce MCCL, a framework that combines multi-view image completion with cross-view classification to learn robust, frequency-irrelevant features from real images. It trains three completion tasks on incomplete views (masked image modeling, gray-to-RGB, and edge-to-RGB) and detects fakes by combining intra-view enhancements (multi-scale features and low-pass residual attention) with adaptive inter-view fusion. Extensive experiments across resolutions, GAN types, and perturbations show improved generalization and robustness, and identify spectral alignment and diverse view information as key factors for practical deepfake detection.
Abstract
GAN-generated image detection has become the first line of defense against malicious uses of machine-synthesized image manipulations such as deepfakes. Although some existing detectors work well on clean samples from known GANs, their success is largely attributable to overfitting to unstable features such as frequency artifacts, which causes them to fail when facing unknown GANs or perturbation attacks. To overcome this issue, we propose a robust detection framework based on a novel multi-view image completion representation. The framework first learns various view-to-image completion tasks to model the diverse distributions of genuine images. The distributional discrepancies captured by these completion models then yield frequency-irrelevant features that are stable, generalizable, and robust for detecting unknown fake patterns. Next, a multi-view classifier is built with elaborate intra- and inter-view learning strategies that enhance view-specific feature representation and cross-view feature aggregation, respectively. We evaluate the generalization ability of our framework across six popular GANs at different resolutions, as well as its robustness against a broad range of perturbation attacks. The results confirm that our method improves effectiveness, generalization, and robustness over various baselines.
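The abstract does not spell out how cross-view feature aggregation works. As one hedged illustration of adaptive inter-view fusion, the sketch below scores each view's feature vector, normalizes the scores with a softmax over views, and returns the weighted sum. The function `fuse_views`, the linear scoring vector `w_score`, and the softmax weighting are assumptions, not the paper's confirmed design.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_views(feats: np.ndarray, w_score: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical adaptive inter-view fusion.

    feats:   (batch, n_views, dim) view-specific features, e.g. one per completion branch
    w_score: (dim,) scoring vector; in a trained model this would be learned
    Returns fused features (batch, dim) and per-sample view weights (batch, n_views).
    """
    scores = feats @ w_score            # (batch, n_views): one relevance score per view
    alpha = softmax(scores, axis=1)     # weights sum to 1 over views for each sample
    fused = np.einsum("bv,bvd->bd", alpha, feats)  # weighted sum of view features
    return fused, alpha
```

With a zero scoring vector every view receives weight 1/3 and the fusion reduces to plain averaging, so in this sketch the adaptive weights act as a per-sample generalization of mean pooling over views.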
