LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection

Dat Nguyen, Nesryne Mejri, Inder Pal Singh, Polina Kuleshova, Marcella Astrid, Anis Kacem, Enjie Ghorbel, Djamila Aouada

TL;DR

LAA-Net addresses high-quality deepfake detection and cross-manipulation generalization by introducing an explicit, fine-grained attention mechanism anchored on vulnerable blending points, coupled with an Enhanced Feature Pyramid Network (E-FPN) that preserves and propagates low-level cues. The model is trained on real data only, using blending-based pseudo-fake synthesis to generate heatmap and self-consistency targets within a three-branch multi-task framework. Empirical results on FF++ and on cross-dataset benchmarks (CDF2, DFD, DFDC, DFW) show state-of-the-art AUC and AP and robustness to several perturbations, while also revealing sensitivity to structural noise. Together, these components enable more precise artifact localization and better generalization to unseen deepfakes; future work aims to improve noise robustness and incorporate temporal information.
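
As a rough illustration of the blending-based pseudo-fake synthesis mentioned above, the following sketch blends two images with a soft mask and then picks the pixels where the two sources are most mixed. The mixing measure `4 * M * (1 - M)` and the helper names `blend_images` and `vulnerable_points` are assumptions made for illustration; they are not taken from this summary or from the official code.

```python
import numpy as np

def blend_images(foreground, background, mask):
    """Alpha-blend two HxWx3 float images with an HxW soft mask in [0, 1]."""
    m = mask[..., None]  # broadcast the mask over the colour channels
    return m * foreground + (1.0 - m) * background

def vulnerable_points(mask, top_k=1):
    """Return the top_k (row, col) positions where the blend is most mixed.

    Assumption: mixing strength is measured as 4 * M * (1 - M), which
    peaks at 1 where the mask equals 0.5 and is 0 where it is 0 or 1.
    """
    boundary = 4.0 * mask * (1.0 - mask)
    # Sort all pixels by mixing strength, strongest first.
    flat = np.argsort(boundary, axis=None)[::-1][:top_k]
    rows, cols = np.unravel_index(flat, mask.shape)
    return list(zip(rows.tolist(), cols.tolist())), boundary

# Toy example: a mask that ramps from 0 to 1 across 5 columns.
mask = np.tile(np.linspace(0.0, 1.0, 5), (4, 1))
fg = np.ones((4, 5, 3))   # stand-in for the pasted face region
bg = np.zeros((4, 5, 3))  # stand-in for the target image
fake = blend_images(fg, bg, mask)
points, boundary = vulnerable_points(mask, top_k=4)
```

In this toy case the selected points all fall in the middle column, where the mask is 0.5 and both sources contribute equally.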

Abstract

This paper introduces a novel approach for high-quality deepfake detection called Localized Artifact Attention Network (LAA-Net). Existing methods for high-quality deepfake detection are mainly based on a supervised binary classifier coupled with an implicit attention mechanism. As a result, they do not generalize well to unseen manipulations. To handle this issue, two main contributions are made. First, an explicit attention mechanism within a multi-task learning framework is proposed. By combining heatmap-based and self-consistency attention strategies, LAA-Net is forced to focus on a few small artifact-prone vulnerable regions. Second, an Enhanced Feature Pyramid Network (E-FPN) is proposed as a simple and effective mechanism for spreading discriminative low-level features into the final feature output, with the advantage of limiting redundancy. Experiments performed on several benchmarks show the superiority of our approach in terms of Area Under the Curve (AUC) and Average Precision (AP). The code is available at https://github.com/10Ring/LAA-Net.
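
To make the heatmap-based attention more concrete, here is a minimal sketch of how a ground-truth heatmap could be built from a set of vulnerable points. It assumes an isotropic Gaussian with a fixed, hypothetical `sigma` around each point; the paper's actual heatmap construction may differ, and `gaussian_heatmap` is an illustrative name only.

```python
import numpy as np

def gaussian_heatmap(shape, points, sigma=1.5):
    """Build an HxW heatmap with a Gaussian bump at each (row, col) point."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]  # pixel coordinate grids
    heatmap = np.zeros(shape, dtype=np.float64)
    for py, px in points:
        bump = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        # Keep the strongest response per pixel so peaks stay at 1.
        heatmap = np.maximum(heatmap, bump)
    return heatmap

# Two hypothetical vulnerable points on an 8x8 grid.
target = gaussian_heatmap((8, 8), [(2, 3), (6, 5)])
```

Taking the element-wise maximum rather than the sum is a design choice in this sketch: it keeps every peak at exactly 1 even when neighbouring bumps overlap.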

Paper Structure (23 sections, 9 equations, 10 figures, 7 tables)

Figures (10)

  • Figure 1: Comparison of LAA-Net with existing methods, namely Multi-attentional, SBI, Xception, RECCE, and CADDM, using (a) the AUC performance with respect to different ranges of Mask SSIM, and (b) the associated boxplots. *The results were obtained using the official source code, pretrained on FF++ and tested on Celeb-DFv2. Figure best viewed in color.
  • Figure 2: Overview of the proposed LAA-Net approach. It consists of two components: (1) an explicit attention mechanism based on a multi-task learning framework composed of three branches, i.e., the binary classification branch, the heatmap branch, and the self-consistency branch, where the heatmap and self-consistency ground-truth data are generated from the detected vulnerable points; and (2) an Enhanced Feature Pyramid Network (E-FPN) that aggregates multi-scale features.
  • Figure 3: Extraction of the vulnerable points.
  • Figure 4: Architecture of the proposed Enhanced Feature Pyramid Network (E-FPN).
  • Figure 5: Grad-CAM visualization on different types of manipulation from FF++. LAA-Net is compared to SBI, Xception, and MAT.
  • ...and 5 more figures

Theorems & Definitions (1)

  • Definition 1