BENet: A Cross-domain Robust Network for Detecting Face Forgeries via Bias Expansion and Latent-space Attention

Weihua Liu, Jianhua Qiu, Said Boumaraf, Chaochao Lin, Pan Liyuan, Lin Li, Mohammed Bennamoun, Naoufel Werghi

TL;DR

BENet tackles deepfake detection under cross-domain shift by combining a bias expansion autoencoder with a multi-scale Latent-space Attention (LSA) module and a cross-domain detector. The bias expansion module amplifies forgery cues while preserving real-face reconstructions, and the LSA module emphasizes latent-space inconsistencies across encoder–decoder scales; together they form a discriminative feature space for a binary classifier. Training uses a novel bias expansion loss $L_{be}$ alongside the standard cross-entropy loss $L_c$, weighted by a balancing parameter $\lambda$ (best at $\lambda = 0.5$), and inference includes a threshold-based cross-domain verification step to handle unseen manipulations. Across intra- and cross-dataset evaluations on FF++, Celeb-DF, DFFD, and DFDC, BENet achieves state-of-the-art performance and remains robust to unseen perturbations, underscoring its practical potential for real-world deepfake defense.
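As a rough illustration of how the two losses interact, the sketch below assumes the additive form $L = L_c + \lambda L_{be}$ with $\lambda = 0.5$ as the default. The TL;DR does not spell out the combination rule, so this form, the helper names `binary_cross_entropy` and `total_loss`, and the scalar per-sample setting are all assumptions; a convex weighting such as $\lambda L_{be} + (1-\lambda) L_c$ would be an equally valid reading.

```python
import math


def binary_cross_entropy(p_fake: float, is_fake: bool, eps: float = 1e-12) -> float:
    """Binary cross-entropy L_c for one sample; p_fake is the predicted probability of 'fake'."""
    p = min(max(p_fake, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -math.log(p) if is_fake else -math.log(1.0 - p)


def total_loss(l_c: float, l_be: float, lam: float = 0.5) -> float:
    """ASSUMED objective L = L_c + lam * L_be (hypothetical; the exact BENet form is not given here)."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return l_c + lam * l_be


# Toy usage: a confident, correct "fake" prediction plus a small bias expansion loss.
l_c = binary_cross_entropy(0.9, is_fake=True)
print(round(total_loss(l_c, l_be=0.2), 4))
```

Setting `lam=0.0` recovers plain cross-entropy training, which is a convenient baseline when checking how much the bias expansion term contributes.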

Abstract

In response to the growing threat of deepfake technology, we introduce BENet, a Cross-Domain Robust Bias Expansion Network. BENet improves fake-face detection by addressing a limitation of current detectors: their sensitivity to variations across different fake-face generation techniques. Here, "cross-domain" refers to this diverse range of deepfakes, with each generation technique considered a separate domain. BENet's core component is an autoencoder-based bias expansion module. This module preserves genuine facial features in its reconstructions while amplifying the reconstruction differences of fake faces, creating a reliable bias for detecting fake faces across deepfake domains. We also introduce a Latent-Space Attention (LSA) module that captures forgery-related inconsistencies at multiple scales, strengthening robustness against advanced deepfake techniques. The enriched LSA feature maps are multiplied with the expanded bias to create a versatile feature space optimized for detecting subtle forgeries. To improve detection of fake faces from unknown sources, BENet integrates a cross-domain detector module that raises recognition accuracy by verifying the facial domain during inference. We train the network end-to-end with a novel bias expansion loss, used here for the first time in face forgery detection. Extensive intra-dataset and cross-dataset experiments demonstrate BENet's superiority over current state-of-the-art solutions.
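The fusion step described in the abstract, in which LSA feature maps are multiplied with the expanded bias, can be sketched as an element-wise product. This is a minimal illustration under stated assumptions: the bias is taken to be the per-pixel absolute residual between an input face and its autoencoder reconstruction, and the attention map is given as an array that broadcasts against it. In BENet itself the bias and attention maps may be defined on latent features at several scales; the helper names `expanded_bias` and `fuse` are hypothetical.

```python
import numpy as np


def expanded_bias(image: np.ndarray, reconstruction: np.ndarray) -> np.ndarray:
    """Hypothetical bias map: absolute residual |x - x_hat|, shape (C, H, W)."""
    return np.abs(image - reconstruction)


def fuse(lsa_map: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Element-wise product of an attention map and the bias map.

    lsa_map may be (C, H, W) or (1, H, W); NumPy broadcasting handles the latter.
    """
    return lsa_map * bias


# Toy example: a real face reconstructed perfectly vs. a fake face
# whose reconstruction deviates (values are made up for illustration).
x = np.ones((3, 4, 4))
attention = np.full((1, 4, 4), 0.7)
real_features = fuse(attention, expanded_bias(x, x))          # residual is zero
fake_features = fuse(attention, expanded_bias(x, x - 0.5))    # residual is 0.5
print(real_features.max(), fake_features.mean())
```

Under these assumptions, a faithful reconstruction drives the residual toward zero, so attention over a real face contributes little, while attention over a poorly reconstructed (fake) region is preserved. This is one plausible reading of why the two maps are multiplied rather than concatenated.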

Paper Structure

This paper contains 35 sections, 13 equations, 9 figures, 9 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Overview of the BENet architecture. Three components play key roles in BENet: (A) a bias expansion module that processes input images and amplifies forgery clues, (B) a Latent-space attention (LSA) module that captures latent feature variances across multiple scales between the encoder and decoder, and (C) a cross-domain detector module that strengthens defense against unknown attacks. BENet is optimized end-to-end with a newly designed bias expansion loss (Section 3.3).
  • Figure 2: Illustration of bias expansion, showing the roles of $L_1$, $L_2$, and $L_3$. (a) $L_1$ maintains consistency between input real faces and their reconstructions. (b) $L_2$ expands the difference between input fake faces and their reconstructions. (c) $L_3$ increases the difference between the biases of real and fake faces.
  • Figure 3: Overview of the LSA module. (a) The functional workflow of the LSA module; (b) the process for calculating latent-space attention maps (shown per channel for simplicity).
  • Figure 4: Examples of perturbations at varying levels of severity. These perturbations, introduced in DeeperForensics (Jiang et al., 2020), consist of changes in saturation, block-wise distortions, changes in contrast, white Gaussian noise, blurring, pixelation, and video compression.
  • Figure 5: Robustness to various perturbations at different severity levels. "Average" denotes the mean across all perturbations at each severity level. BENet is more robust than state-of-the-art approaches to all types of perturbations.
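Figure 2 describes the three terms of the bias expansion loss qualitatively but does not give their formulas. The sketch below is one plausible instantiation, not the paper's definition. It assumes $L_{be} = L_1 + L_2 + L_3$, uses mean absolute reconstruction error for $L_1$, and uses hinge terms with a hypothetical `margin` for $L_2$ and $L_3$, so the fake-face bias is pushed up only until it clears the margin. The function name `bias_expansion_loss` is also hypothetical.

```python
import numpy as np


def bias_expansion_loss(real, real_rec, fake, fake_rec, margin=1.0):
    """Hypothetical L_be = L1 + L2 + L3 following Figure 2's qualitative roles.

    All inputs are arrays of shape (N, C, H, W); *_rec are autoencoder reconstructions.
    Returns the total loss and the individual terms (l1, l2, l3).
    """
    real_bias = np.abs(real - real_rec)  # bias of real faces, should stay small
    fake_bias = np.abs(fake - fake_rec)  # bias of fake faces, should grow

    # (a) L1: keep real-face reconstructions consistent with their inputs.
    l1 = float(real_bias.mean())
    # (b) L2: hinge that pushes the mean fake reconstruction error up to the margin.
    l2 = max(0.0, margin - float(fake_bias.mean()))
    # (c) L3: hinge that separates the average real and fake bias maps (Euclidean distance).
    gap = float(np.linalg.norm(fake_bias.mean(axis=0) - real_bias.mean(axis=0)))
    l3 = max(0.0, margin - gap)
    return l1 + l2 + l3, (l1, l2, l3)


# Toy check: perfect real reconstructions, clearly biased fake reconstructions.
shape = (2, 1, 2, 2)
real = np.zeros(shape)
fake = np.ones(shape)
total, terms = bias_expansion_loss(real, real.copy(), fake, np.zeros(shape))
print(total, terms)
```

The hinges keep the fake-face terms bounded; a plain "maximize the reconstruction error" term would let the loss decrease without limit. Whether BENet uses margins, and which distance it uses between biases, should be checked against the paper's loss definition.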