Learning to Discover Forgery Cues for Face Forgery Detection

Jiahe Tian, Peng Chen, Cai Yu, Xiaomeng Fu, Xi Wang, Jiao Dai, Jizhong Han

TL;DR

This work tackles the challenge of interpretable, pixel-level forgery cue localization in face forgery detection without relying on paired real-forged data. It introduces Forgery Cue Discovery (FoCus), which uses a Classification Attentive Regions Proposal (CARP) module and a Complementary Learning (CL) framework to generate robust, exploitable manipulation maps from unpaired faces by fusing RGB and Sobel edge cues. Empirically, FoCus improves multi-task detection models across five datasets and demonstrates strong in-dataset and cross-dataset generalization, along with improved interpretability and robustness of the cues. The approach broadens training data scalability for forgery detection and offers a practical path toward more explainable and generalizable detectors.

Abstract

Locating manipulation maps, i.e., pixel-level annotations of forgery cues, is crucial for providing interpretable detection results in face forgery detection. Related learning objectives have also been widely adopted as auxiliary tasks to improve the classification performance of detectors, but they require comparisons between paired real and forged faces to obtain manipulation maps as supervision. This requirement restricts their applicability to unpaired faces and contradicts real-world scenarios. Moreover, the comparison methods used annotate all changed pixels, including noise introduced by compression and upsampling. Using such maps as supervision hinders the learning of exploitable cues and makes models prone to overfitting. To address these issues, we introduce a weakly supervised model, named Forgery Cue Discovery (FoCus), to locate forgery cues in unpaired faces. Unlike some detectors that claim to locate forged regions in attention maps, FoCus is designed to avoid their shortcoming of capturing only partial and inaccurate forgery cues. Specifically, we propose a classification attentive regions proposal module to locate forgery cues during classification and a complementary learning module to facilitate the learning of richer cues. The produced manipulation maps can serve as better supervision to enhance face forgery detectors. Visualizations of the manipulation maps produced by FoCus exhibit superior interpretability and robustness compared to existing methods. Experiments on five datasets and four multi-task models demonstrate the effectiveness of FoCus in both in-dataset and cross-dataset evaluations.
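
The abstract's point about paired comparison can be illustrated with a small sketch. The abstract does not specify how existing methods compare paired faces, so the thresholded absolute difference used here, the function name `paired_difference_map`, the toy pixel values, and the threshold are all assumptions chosen for illustration. The sketch shows how a mild global disturbance, standing in for compression noise, causes every pixel to be annotated as manipulated.

```python
def paired_difference_map(real, fake, threshold):
    """Mark a pixel as manipulated when |fake - real| exceeds the threshold.

    Hypothetical stand-in for a paired comparison method; not the procedure
    of any specific published detector.
    """
    return [
        [1 if abs(f - r) > threshold else 0 for r, f in zip(real_row, fake_row)]
        for real_row, fake_row in zip(real, fake)
    ]


# Toy 3x4 grayscale "real" face (assumed values).
real = [[100, 100, 100, 100],
        [100, 100, 100, 100],
        [100, 100, 100, 100]]

# Forged copy: only a 2x2 patch is actually manipulated.
fake_clean = [[100, 100, 100, 100],
              [100, 160, 160, 100],
              [100, 160, 160, 100]]

# The same forgery after a small global disturbance of +/-3 per pixel.
noise = [[3, -3, 3, -3],
         [-3, 3, -3, 3],
         [3, -3, 3, -3]]
fake_noisy = [[f + n for f, n in zip(fake_row, noise_row)]
              for fake_row, noise_row in zip(fake_clean, noise)]

clean_map = paired_difference_map(real, fake_clean, threshold=2)
noisy_map = paired_difference_map(real, fake_noisy, threshold=2)

# Count annotated pixels in each map.
clean_count = sum(map(sum, clean_map))
noisy_count = sum(map(sum, noisy_map))
print(clean_count, noisy_count)
```

In this toy setting the clean comparison marks only the 2x2 patch, while the disturbed comparison marks all twelve pixels, which is the kind of noisy supervision the abstract argues against.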

Paper Structure (20 sections, 9 equations, 11 figures, 8 tables, 1 algorithm)

Figures (11)

  • Figure 1: Existing manipulation map generation methods rely on comparing paired faces. The generated maps are often noisy when the images are globally disturbed, leading to poor interpretability. We propose FoCus to generate manipulation maps in a weakly supervised manner.
  • Figure 2: The pipeline of the proposed FoCus. We use ViT as the backbone to encode RGB and Sobel inputs into $\mathbf{z}_{\rm RGB}$ and $\mathbf{z}_{\rm Sobel}$. The Classification Attentive Regions Proposal module is devised to locate forgery cues in both modalities, producing $\boldsymbol{a}_{\rm RGB}$ and $\boldsymbol{a}_{\rm Sobel}$. The Complementary Learning module is devised to mine the complementary nature of $\mathbf{z}_{\rm RGB}$ and $\mathbf{z}_{\rm Sobel}$ and to output a complementary mask $\mathbf{M}$, which fuses $\boldsymbol{a}_{\rm RGB}$ and $\boldsymbol{a}_{\rm Sobel}$ into $\boldsymbol{a}_{\rm fus}$ according to the fusion equation. $\boldsymbol{a}_{\rm fus}$ can serve as pixel-wise annotation for exploitable forgery cues.
  • Figure 3: Diagram of the Complementary Learning block. The argmax operation is implemented as a matrix product between hard Gumbel-Softmax logits and concatenated tokens. Best viewed in color.
  • Figure 4: The multi-task model for evaluating manipulation maps. Different manipulation maps are used as supervision for the dense head. We use the classification performance of the evaluation model to assess the exploitability of different manipulation maps.
  • Figure 5: The generated maps for fake faces in FF++(HQ). We provide maps upsampled with both bilinear and nearest-neighbor interpolation for better visualization.
  • ...and 6 more figures
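
The Figure 3 caption describes an argmax implemented as a matrix product between hard Gumbel-Softmax outputs and concatenated tokens. The following is a minimal, forward-pass-only sketch of that idea in plain Python. The token values, the logits, the temperature `tau`, and the helper names `gumbel_noise`, `hard_gumbel_softmax`, and `select_token` are hypothetical; the caption does not give the shapes or the exact inputs used in the paper, and a trainable version would additionally need a straight-through gradient estimator from a deep learning framework.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one Gumbel(0, 1) sample by inverse transform sampling."""
    # Clamp away from 0 and 1 so both logarithms are finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def hard_gumbel_softmax(logits, tau, rng):
    """Return a one-hot list chosen by the Gumbel-max trick (forward only)."""
    perturbed = [(logit + gumbel_noise(rng)) / tau for logit in logits]
    winner = max(range(len(perturbed)), key=perturbed.__getitem__)
    return [1.0 if i == winner else 0.0 for i in range(len(logits))]


def select_token(one_hot, stacked_tokens):
    """Select a token as the product one_hot (1 x K) @ stacked_tokens (K x D)."""
    dim = len(stacked_tokens[0])
    return [
        sum(one_hot[k] * stacked_tokens[k][d] for k in range(len(one_hot)))
        for d in range(dim)
    ]


rng = random.Random(0)

# Hypothetical tokens for one spatial position in each modality.
z_rgb = [1.0, 2.0, 3.0]
z_sobel = [-1.0, -2.0, -3.0]
stacked = [z_rgb, z_sobel]  # "concatenated tokens", shape K x D with K = 2

# Hypothetical per-modality scores for this position.
logits = [2.0, 0.5]

one_hot = hard_gumbel_softmax(logits, tau=1.0, rng=rng)
selected = select_token(one_hot, stacked)
print(one_hot, selected)
```

Because the one-hot vector has a single nonzero entry, the matrix product returns exactly one of the stacked tokens, which is what makes it a differentiable-friendly replacement for indexing with an argmax.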