GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection

Chih-Chung Hsu; Shao-Ning Chen; Mei-Hsuan Wu; Yi-Fang Wang; Chia-Ming Lee; Yi-Shiuan Chou

TL;DR

The paper tackles robust DeepFake video detection when face sequences are unreliable because of video degradation or adversarial manipulation. It introduces GRACE, a Graph-Regularized Attentive Convolutional Entanglement framework that fuses spatiotemporal features via Feature Entanglement (FE), processes them with a Graph Convolutional Network (GCN) augmented by Graph Laplacian Smoothing Prior Regularization (GLSPR), and enforces sparsity to suppress noisy inputs. Key contributions include the FE mechanism $X_{FE}=X X^T$, the integration of GLSPR into GCN propagation, and an $\text{L}_1$ sparsity objective. Extensive experiments on FF++, Celeb-DF, and DFDC show state-of-the-art robustness under noisy conditions and adversarial attacks, which makes the approach practically relevant to real-world multimedia forensics. The authors provide code at https://github.com/ming053l/GRACE.
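To make the FE mechanism and the sparsity objective concrete, here is a minimal pure-Python sketch. It assumes, for illustration only, that $X$ is a $T \times d$ matrix holding one $d$-dimensional CNN feature vector per sampled frame, so that $X_{FE}=XX^T$ is a $T \times T$ matrix of pairwise frame inner products. The helper names (`feature_entanglement`, `l1_norm`, `total_loss`) and the weight `lam` are hypothetical, and applying the $\text{L}_1$ penalty directly to $X_{FE}$ is an assumption, not the authors' exact objective.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns of a rectangular matrix."""
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def feature_entanglement(x: Matrix) -> Matrix:
    """X_FE = X X^T; entry (i, j) is the inner product of frame features i and j.

    Assumption (hypothetical): x is T x d, one feature vector per sampled frame.
    """
    return matmul(x, transpose(x))


def l1_norm(a: Matrix) -> float:
    """Entrywise L1 norm ||A||_1 = sum_ij |a_ij|."""
    return sum(abs(v) for row in a for v in row)


def total_loss(task_loss: float, entangled: Matrix, lam: float) -> float:
    """Hypothetical objective: task loss plus lam * ||X_FE||_1 to encourage sparsity."""
    return task_loss + lam * l1_norm(entangled)


if __name__ == "__main__":
    # Toy features: the third frame is all zeros (a stand-in for a blanked face).
    x = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    x_fe = feature_entanglement(x)
    print(x_fe)                         # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    print(total_loss(0.5, x_fe, 0.25))  # 1.0
```

In this toy case the all-zero third frame produces an all-zero row and column in $X_{FE}$, so it contributes nothing to the entangled graph. Real CNN features of a black crop are generally not zero, which is one reason an explicit sparsity term can still be useful.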

Abstract

As DeepFake video manipulation techniques escalate and pose profound threats, there is an urgent need for efficient detection strategies. One particular issue is that facial images are often mis-detected in degraded videos or under adversarial attacks, producing unexpected temporal artifacts that can undermine the efficacy of DeepFake video detection techniques. This paper introduces a novel method for robust DeepFake video detection, the proposed Graph-Regularized Attentive Convolutional Entanglement (GRACE), which builds on a graph convolutional network with a graph Laplacian to address these challenges. First, conventional Convolutional Neural Networks are deployed to extract spatiotemporal features for the entire video. Then, the spatial and temporal features are mutually entangled by constructing a graph with a sparsity constraint, which ensures that the essential features of valid face images in noisy face sequences are retained, improving the stability and performance of DeepFake video detection. Furthermore, a Graph Laplacian prior is incorporated into the graph convolutional network to remove noise patterns in the feature space and further improve performance. Comprehensive experiments show that the proposed method delivers state-of-the-art performance in DeepFake video detection under noisy face sequences. The source code is available at https://github.com/ming053l/GRACE.
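The two graph components in the abstract can be illustrated with textbook formulas, which are not necessarily the paper's exact equations: a Kipf-Welling GCN layer $H' = \mathrm{ReLU}(\tilde D^{-1/2}(A+I)\tilde D^{-1/2} H W)$, and the Laplacian smoothness term $\operatorname{tr}(H^T L H) = \tfrac12\sum_{ij} A_{ij}\lVert h_i - h_j\rVert^2$ with $L = D - A$. The pure-Python sketch below assumes a symmetric, non-negative adjacency $A$ over frame nodes (for instance, one derived from $X_{FE}$). All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def laplacian(adj: Matrix) -> Matrix:
    """Combinatorial graph Laplacian L = D - A for a symmetric adjacency A."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[(deg[i] if i == j else 0.0) - adj[i][j] for j in range(n)] for i in range(n)]


def laplacian_penalty(adj: Matrix, h: Matrix) -> float:
    """tr(H^T L H) = 1/2 * sum_ij A_ij * ||h_i - h_j||^2; small when neighbors agree."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(h[i], h[j]))
            total += adj[i][j] * sq_dist
    return 0.5 * total


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One Kipf-Welling layer: ReLU(D~^-1/2 (A + I) D~^-1/2 H W).

    Assumes adj is symmetric and non-negative, so every degree of A + I is positive.
    """
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    a_norm = [[d_inv_sqrt[i] * a_tilde[i][j] * d_inv_sqrt[j] for j in range(n)] for i in range(n)]
    # Feature transform H W, then neighborhood aggregation with the normalized adjacency.
    hw = [[sum(h[i][k] * w[k][c] for k in range(len(w))) for c in range(len(w[0]))] for i in range(n)]
    agg = [[sum(a_norm[i][j] * hw[j][c] for j in range(n)) for c in range(len(w[0]))] for i in range(n)]
    return [[max(0.0, v) for v in row] for row in agg]


if __name__ == "__main__":
    adj = [[0.0, 1.0], [1.0, 0.0]]    # two connected frame nodes
    h = [[1.0], [3.0]]                # one scalar feature per node
    print(laplacian(adj))             # [[1.0, -1.0], [-1.0, 1.0]]
    print(laplacian_penalty(adj, h))  # 4.0
    h_next = gcn_layer(adj, h, [[1.0]])
    print(laplacian_penalty(adj, h_next))  # close to 0 after one smoothing layer
```

With two connected nodes carrying features 1 and 3, the penalty is $\tfrac12(4+4)=4$; one GCN layer with $W=[1]$ maps both nodes to about 2, after which the penalty is essentially zero. This averaging is the intuition behind using Laplacian smoothing to let neighboring nodes dampen outlying node features.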

Paper Structure

This paper contains 20 sections, 16 equations, 5 figures, 6 tables.

Introduction
Related Works
General DeepFake Video Detection
External Reference DeepFake Video Detection
Robust DeepFake Video Detection
Proposed Graph-Regularized Attentive Convolutional Entanglement
Overview of the Proposed Method
Feature Entanglement
GCN with Graph Laplacian Smoothing
Graph Convolutional Network
Graph Laplacian Smoothing Prior Regularization
Sparsity Constraint on Network Optimization
Experimental Results
Experimental Configuration
...and 5 more sections

Figures (5)

Figure 1: Examples of the faces detected in two videos using RetinaFace (top) and Dlib (bottom). Faces may not be detected accurately in the sampled consecutive frames; such a noisy sequence may cause the feature extractor to produce inconsistent spatial semantic features, which in turn makes practical application challenging.
Figure 2: Flowchart of the proposed GRACE for robust DeepFake video detection. In practice, face detection is not always reliable, and invalid faces may significantly outnumber valid ones, limiting DeepFake detection methods. We leverage the proposed Feature Entanglement technique to embed the spatiotemporal features, and apply Graph Laplacian Smoothing Prior Regularization and a Sparsity Constraint to smooth and suppress these noisy nodes.
Figure 3: Performance comparison, in terms of AUC, of the proposed GRACE and other state-of-the-art methods under different masking ratios $m_r$ on (a) FF++ [ffplus], (b) DFDC [dfdc], and (c) Celeb-DF [celeb].
Figure 4: Validation accuracy curves on FF++ [ffplus] of the proposed GRACE with and without Graph Laplacian Smoothing Prior Regularization and the Sparsity Constraint.
Figure 5: (a) An example of a noisy face sequence caused by a PGD-like adversarial attack with $\epsilon=0.04$, $\alpha_\text{atk}=0.01$, and $s=10$, yielding a masking ratio of approximately $m_r=0.2$, where mis-detected faces were replaced with black ones. (b) The corresponding face sequence without adversarial perturbation.
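The attack settings in Figure 5 match the usual $L_\infty$ PGD recipe: $s$ steps of size $\alpha_\text{atk}$ along the sign of the gradient, each followed by projection onto the $\epsilon$-ball around the clean input. The sketch below implements that recipe on a flat list of pixel values in $[0,1]$ with a caller-supplied gradient, together with a helper that blanks frames whose face was missed and reports the resulting masking ratio $m_r$. The gradient function, the detection flags, and all names are hypothetical stand-ins; in the figure, the attack causes faces to be mis-detected, but the attacked model and its loss are not specified here.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sign(v: float) -> float:
    """Return -1.0, 0.0, or 1.0."""
    return float((v > 0) - (v < 0))


def pgd_like_attack(
    x0: Sequence[float],
    grad_fn: Callable[[Vector], Vector],
    eps: float = 0.04,
    alpha: float = 0.01,
    steps: int = 10,
) -> Vector:
    """L-inf PGD ascent: x <- x + alpha * sign(grad), clipped to the eps-ball and [0, 1]."""
    x = list(x0)
    for _ in range(steps):
        g = grad_fn(x)
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]
        # Project back onto the eps-ball around the clean input...
        x = [min(max(xi, oi - eps), oi + eps) for xi, oi in zip(x, x0)]
        # ...and keep values in the valid pixel range [0, 1].
        x = [min(max(xi, 0.0), 1.0) for xi in x]
    return x


def blank_missed_faces(
    frames: Sequence[Vector], detected: Sequence[bool]
) -> Tuple[List[Vector], float]:
    """Replace frames without a detected face by all-black crops; return (frames, m_r)."""
    out = [list(f) if ok else [0.0] * len(f) for f, ok in zip(frames, detected)]
    m_r = sum(1 for ok in detected if not ok) / len(detected)
    return out, m_r


if __name__ == "__main__":
    # Hypothetical constant gradient of a toy loss x[0] - x[1]; a real attack would
    # backpropagate through the attacked model instead.
    adv = pgd_like_attack([0.5, 0.5], lambda x: [1.0, -1.0])
    print(adv)  # approximately [0.54, 0.46]
    frames = [[0.3, 0.7] for _ in range(10)]
    detected = [True] * 8 + [False] * 2
    _, m_r = blank_missed_faces(frames, detected)
    print(m_r)  # 0.2
```

With the toy gradient, the perturbation saturates at the $\epsilon$ boundary after four of the ten steps, and blanking 2 of 10 frames gives $m_r = 0.2$, the same ratio quoted in the caption.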
