LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection

Dat Nguyen; Nesryne Mejri; Inder Pal Singh; Polina Kuleshova; Marcella Astrid; Anis Kacem; Enjie Ghorbel; Djamila Aouada

TL;DR

LAA-Net addresses high-quality deepfake detection and cross-manipulation generalization by introducing an explicit, fine-grained attention mechanism anchored on vulnerable blending points, coupled with an Enhanced Feature Pyramid Network (E-FPN) that preserves and propagates low-level cues. The model is trained on real data only: blending-based pseudo-fake synthesis generates the heatmap and self-consistency targets used within a three-branch multi-task framework. Empirical results on FF++ and on cross-dataset benchmarks (CDF2, DFD, DFDC, DFW) show state-of-the-art AUC and AP. The model remains robust under several perturbations, although the experiments also reveal sensitivity to structural noise. Together, these components enable more precise localization of artifacts and better generalization to unseen deepfakes; future work aims to improve noise robustness and incorporate temporal information.
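To make the idea of blending-based pseudo-fake synthesis and "vulnerable blending points" more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The soft mask, the boundary score `4 * m * (1 - m)` (a common choice in blending-boundary formulations), and every function name are assumptions made for illustration; the summary above does not specify how the points are actually extracted.

```python
# Illustrative sketch only. The boundary score 4*m*(1-m) and all names below
# are assumptions, not details confirmed by the summary above.

def blend(foreground, background, mask):
    """Pixel-wise alpha blend: mask * foreground + (1 - mask) * background."""
    return [
        [m * f + (1.0 - m) * b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(foreground, background, mask)
    ]

def boundary_map(mask):
    """Score 4*m*(1-m): 0 where the mask is 0 or 1, and 1 where it is 0.5."""
    return [[4.0 * m * (1.0 - m) for m in row] for row in mask]

def vulnerable_points(mask, tol=1e-9):
    """Return (row, col) positions whose boundary score equals the maximum."""
    scores = boundary_map(mask)
    peak = max(max(row) for row in scores)
    return [
        (r, c)
        for r, row in enumerate(scores)
        for c, s in enumerate(row)
        if peak - s <= tol
    ]

# Toy 1x5 grayscale "images" and a soft mask ramping from 0 to 1.
mask = [[0.0, 0.25, 0.5, 0.75, 1.0]]
real_a = [[10.0] * 5]  # hypothetical foreground face
real_b = [[20.0] * 5]  # hypothetical background face

pseudo_fake = blend(real_a, real_b, mask)
points = vulnerable_points(mask)
```

In this toy case the only maximal-score position is the pixel where the two sources are mixed in equal proportion, which is the kind of location a heatmap target could be centered on.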

Abstract

This paper introduces a novel approach for high-quality deepfake detection called Localized Artifact Attention Network (LAA-Net). Existing methods for high-quality deepfake detection are mainly based on a supervised binary classifier coupled with an implicit attention mechanism. As a result, they do not generalize well to unseen manipulations. To handle this issue, two main contributions are made. First, an explicit attention mechanism within a multi-task learning framework is proposed. By combining heatmap-based and self-consistency attention strategies, LAA-Net is forced to focus on a few small artifact-prone vulnerable regions. Second, an Enhanced Feature Pyramid Network (E-FPN) is proposed as a simple and effective mechanism for spreading discriminative low-level features into the final feature output, with the advantage of limiting redundancy. Experiments performed on several benchmarks show the superiority of our approach in terms of Area Under the Curve (AUC) and Average Precision (AP). The code is available at https://github.com/10Ring/LAA-Net.
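For readers less familiar with the two reported metrics, the following stdlib-only Python sketch computes AUC and AP for a set of detector scores. It illustrates the standard definitions and is not the paper's evaluation code. The convention that label 1 means "fake" and the toy scores are assumptions.

```python
# Minimal sketch of the two metrics; label 1 = fake is an assumed convention.

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """AP as the mean of precision values at the rank of each positive.

    Tied scores keep their input order here, so results can differ slightly
    from library implementations that group tied thresholds.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("AP needs at least one positive")
    return total / hits

# Toy example: six samples, three fakes (1) and three reals (0).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a sort-based rank formulation is preferable on full benchmarks.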

Paper Structure (23 sections, 9 equations, 10 figures, 7 tables)

Introduction
Related Works: Attention-based Deepfake Detection
Localized Artifact Attention Network (LAA-Net)
Explicit Attention to Vulnerable Points
Blending-based Data Synthesis
Proposed Multi-task Learning Framework
Heatmap Branch
Self-consistency Branch
Training Strategy
Enhanced Feature Pyramid Network (E-FPN)
Experiments
Experimental Settings
Comparison with State-of-the-art
Ablation Study
E-FPN versus Traditional FPN
...and 8 more sections

Figures (10)

Figure 1: Comparison of LAA-Net with existing methods, namely Multi-attentional, SBI, Xception, RECCE, and CADDM (each shown with a colored marker), using (a) the AUC performance with respect to different ranges of Mask SSIM, and (b) the associated boxplots. *The results were obtained using the official source code, pretrained on FF++ and tested on Celeb-DFv2. Figure best viewed in color.
Figure 2: Overview of the proposed LAA-Net approach. It is formed by two components: (1) an explicit attention mechanism based on a multi-task learning framework composed of three branches, i.e., the binary classification branch, the heatmap branch, and the self-consistency branch, where the heatmap and self-consistency ground-truth data are generated based on the detected vulnerable points; and (2) an Enhanced Feature Pyramid Network (E-FPN) that aggregates multi-scale features.
Figure 3: Extraction of the vulnerable points.
Figure 4: Architecture of the proposed Enhanced Feature Pyramid Network (E-FPN).
Figure 5: Grad-CAM visualization on different types of manipulation from FF++. LAA-Net is compared to SBI, Xception, and MAT.
...and 5 more figures

Theorems & Definitions (1)

Definition 1
