COMICS: End-to-end Bi-grained Contrastive Learning for Multi-face Forgery Detection

Cong Zhang, Honggang Qi, Shuhui Wang, Yuezun Li, Siwei Lyu

TL;DR

COMICS tackles the problem of detecting multiple forged faces in realistic images with complex backgrounds by proposing an end-to-end framework that jointly detects faces and their authenticity. It combines coarse-grained, proposal- and layer-aware contrastive learning with fine-grained, pixel-level intra- and inter-face supervision, augmented by a frequency-enhanced attention module. The approach is plug-and-play with common detectors and demonstrates strong gains on OpenForensics and FFIW, with extensive ablations confirming the contribution of each component. This work advances practical deepfake forensics by enabling fast, accurate multi-face forgery detection in the wild.

Abstract

DeepFakes have raised serious societal concerns, leading to a surge in detection-based forensics methods in recent years. Face forgery recognition is a standard detection method that usually follows a two-phase pipeline. While such methods perform well in ideal experimental environments, they face challenges when dealing with DeepFakes in the wild, which involve complex backgrounds and multiple faces of varying sizes. Moreover, most face forgery recognition methods can only process one face at a time. One straightforward way to address this issue is to process multiple faces simultaneously by integrating face extraction and forgery detection in an end-to-end fashion, adapting advanced object detection architectures. However, because these object detection architectures are designed to capture the discriminative features of different object categories rather than the subtle forgery traces among faces, direct adaptation suffers from limited representation ability. In this paper, we propose COMICS, an end-to-end framework for multi-face forgery detection. COMICS integrates face extraction and forgery detection in a seamless manner and adapts to advanced object detection architectures. The proposed bi-grained contrastive learning approach explores face forgery traces at both the coarse- and fine-grained levels. Specifically, coarse-grained contrastive learning captures the discriminative features among positive and negative proposal pairs at multiple layers produced by the proposal generator, and fine-grained contrastive learning captures the pixel-wise discrepancy between the forged and original areas of the same face and the pixel-wise content inconsistency among different faces. Extensive experiments on the OpenForensics and FFIW datasets demonstrate that our method outperforms its counterparts and shows great potential for integration into various architectures.
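The coarse-grained idea in the abstract (contrasting positive and negative proposal pairs at multiple layers) can be illustrated with a generic supervised InfoNCE-style loss. This is only a minimal sketch under stated assumptions, not the loss defined in the paper: the function names, the temperature value, the use of real/fake labels to define positives, and the plain summation over layers are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def proposal_contrastive_loss(queries, keys, q_labels, k_labels, temperature=0.1):
    """Generic supervised InfoNCE over proposal embeddings from two views.

    queries / keys: lists of embedding vectors, one per face proposal.
    q_labels / k_labels: 0 for real, 1 for fake (assumed labelling).
    Keys sharing the query's label are positives; all other keys are negatives.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    losses = []
    for qi, yi in zip(q, q_labels):
        logits = [dot(qi, kj) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        positives = [l for l, yj in zip(logits, k_labels) if yj == yi]
        if not positives:
            continue  # no positive key for this proposal; skip it
        losses.append(-sum(p - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


def multi_layer_loss(per_layer_inputs, temperature=0.1):
    """Sum the proposal-level loss over several feature layers (assumed aggregation)."""
    return sum(
        proposal_contrastive_loss(q, k, ql, kl, temperature)
        for q, k, ql, kl in per_layer_inputs
    )
```

For example, with two proposals whose embeddings match across views, `proposal_contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]], [0, 1], [0, 1])` is close to zero, while swapping the key labels makes the loss large, since each query is then most similar to a key of the opposite class.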

Paper Structure (13 sections, 9 equations, 10 figures, 8 tables)

Figures (10)

  • Figure 1: Face forgery recognition usually deals with images of a single, fixed-size face that has been carefully cropped. However, real-world images often contain complex backgrounds and multiple faces of varied sizes, making multi-face forgery detection more challenging. Yellow and purple boxes indicate real and fake faces, respectively.
  • Figure 2: Overview of the two-stage face forgery detection (left) and the proposed single-phase face forgery detection framework Contrastive Multi-FaceForensics (right).
  • Figure 3: Overview of the proposed bi-grained contrastive learning. Our method is built on a single-stage architecture containing a feature extractor, a proposal generator, and a mask predictor. Specifically, we perform coarse-grained contrastive learning on the feature extractor, guided by the proposal generator, to capture the forgery traces among different face proposals, and fine-grained contrastive learning on the mask predictor by considering the relationship of pixels within the same face (intra-face) or across different faces (inter-face).
  • Figure 4: Overview of the coarse-grained contrastive learning. The input image is first augmented into two views ($I_q, I_k$), and contrastive learning is then performed at the proposal level on different scales of the feature extractor. Note that the feature elements (e.g., the yellow or purple blocks) correspond to real or fake faces, as given by the proposal generator $\mathcal{P}$.
  • Figure 5: Overview of the fine-grained contrastive learning. In contrast to coarse-grained contrastive learning, it aims to learn the relationship among pixels instead of proposals. Given the feature maps of the predicted masks, we consider both intra-face and inter-face relations. Specifically, the intra-face relation aims to capture the inconsistency between the forged area and the surrounding original area in the same face, while the inter-face relation explores the pixel-wise discrepancy between different faces.
  • ...and 5 more figures
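
The intra-face relation described in the Figure 5 caption (forged pixels versus the surrounding original pixels of the same face) can be sketched as a prototype-based pixel contrast. This is a hedged illustration only: the excerpt does not give the paper's actual fine-grained loss, so the prototype formulation, the function name `intra_face_pixel_loss`, the binary `forged_mask` input, and the temperature are all assumptions introduced here.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def intra_face_pixel_loss(pixel_feats, forged_mask, temperature=0.1):
    """Pull each pixel toward the prototype of its own region (forged or
    original) and away from the other region's prototype, within one face.

    pixel_feats: list of per-pixel embedding vectors for a single face.
    forged_mask: list of 0/1 flags, 1 where the pixel is forged (assumed input).
    Returns 0.0 when the face has only one region, so there is nothing to contrast.
    """
    forged = [f for f, m in zip(pixel_feats, forged_mask) if m]
    original = [f for f, m in zip(pixel_feats, forged_mask) if not m]
    if not forged or not original:
        return 0.0
    prototypes = {True: _normalize(_mean(forged)), False: _normalize(_mean(original))}
    total = 0.0
    for feat, m in zip(pixel_feats, forged_mask):
        f = _normalize(feat)
        pos = _dot(f, prototypes[bool(m)]) / temperature
        neg = _dot(f, prototypes[not bool(m)]) / temperature
        # Negative log-softmax of the matching prototype over both prototypes.
        mx = max(pos, neg)
        total += -(pos - (mx + math.log(math.exp(pos - mx) + math.exp(neg - mx))))
    return total / len(pixel_feats)
```

When forged and original pixels have well-separated embeddings the loss is near zero; when the two regions share the same mean direction the two prototypes coincide and the loss equals log 2 per pixel. An inter-face variant could contrast prototypes taken from different faces in the same way, though that extension is likewise an assumption rather than something stated in the captions.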