Adversarially Robust Deepfake Detection via Adversarial Feature Similarity Learning

Sarwar Khan

TL;DR

The paper tackles the vulnerability of deepfake detectors to adversarial perturbations by introducing Adversarial Feature Similarity Learning (AFSL), a three-part objective that separates real and fake features, aligns adversarial and clean embeddings, and regularizes the similarity between real and fake samples. The final objective $L_{afsl} = L_{dcl} + \beta_1 L_{asl} + \beta_2 L_{srl}$, with $\beta_1=1$ and $\beta_2=0.1$, jointly enhances discrimination and robustness. AFSL improves robustness on FaceForensics++ (FF++), FaceShifter, and DeeperForensics for both frame-based and video-based detectors, outperforming standard adversarial training methods such as AT and TRADES while maintaining performance under common distortions. The results suggest practical value for deploying deepfake detectors in adversarial settings, and the paper points to future directions such as self-supervised defenses.

Abstract

Deepfake technology has raised concerns about the authenticity of digital content, necessitating the development of effective detection methods. However, the widespread availability of deepfakes has given rise to a new challenge in the form of adversarial attacks. Adversaries can manipulate deepfake videos with small, imperceptible perturbations that deceive detection models into producing incorrect outputs. To tackle this critical issue, we introduce Adversarial Feature Similarity Learning (AFSL), which integrates three fundamental deep feature learning paradigms. First, by optimizing the similarity between samples and weight vectors, our approach aims to distinguish between real and fake instances. Second, we aim to maximize the similarity between adversarially perturbed examples and their unperturbed counterparts, regardless of their real or fake nature. Third, we introduce a regularization technique that maximizes the dissimilarity between real and fake samples, ensuring a clear separation between these two categories. In extensive experiments on popular deepfake datasets, including FaceForensics++, FaceShifter, and DeeperForensics, the proposed method significantly outperforms other standard adversarial training-based defense methods. This further demonstrates the effectiveness of our approach in protecting deepfake detectors from adversarial attacks.
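
The three components described above, and their weighted sum $L_{afsl} = L_{dcl} + \beta_1 L_{asl} + \beta_2 L_{srl}$, can be illustrated with a minimal sketch. Only the weighted sum and the values $\beta_1=1$, $\beta_2=0.1$ come from the summary; the concrete form of each term is an assumption made for illustration (cosine similarity throughout, a softmax cross-entropy over scaled cosine logits for the classification term, a hypothetical `scale` parameter, and plain Python lists standing in for backbone features). It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def dcl(feature, weights, label, scale=10.0):
    """Classification term (assumed form): cross-entropy over scaled cosine
    similarities between a feature and one weight vector per class."""
    logits = [scale * cosine(feature, w) for w in weights]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_z - logits[label]


def asl(clean, adversarial):
    """Adversarial similarity term (assumed form): small when the adversarial
    feature stays close to its clean counterpart."""
    return 1.0 - cosine(clean, adversarial)


def srl(real, fake):
    """Similarity regularization term (assumed form): penalizes real and fake
    features that point in similar directions."""
    return cosine(real, fake)


def afsl_loss(real, fake, real_adv, fake_adv, weights, beta1=1.0, beta2=0.1):
    """Weighted sum L_dcl + beta1 * L_asl + beta2 * L_srl for one real/fake pair.
    Labels are assumed to be 0 for real and 1 for fake."""
    samples = [(real, 0), (fake, 1), (real_adv, 0), (fake_adv, 1)]
    l_dcl = sum(dcl(f, weights, y) for f, y in samples) / len(samples)
    l_asl = 0.5 * (asl(real, real_adv) + asl(fake, fake_adv))
    l_srl = srl(real, fake)
    return l_dcl + beta1 * l_asl + beta2 * l_srl


# Toy 2-D features: class weight vectors aligned with the real and fake directions.
weights = [[1.0, 0.0], [0.0, 1.0]]
real, fake = [1.0, 0.0], [0.0, 1.0]

# Case 1: adversarial features coincide with the clean ones.
robust = afsl_loss(real, fake, real, fake, weights)
# Case 2: the adversarial real feature has been pushed toward the fake direction.
fooled = afsl_loss(real, fake, [0.0, 1.0], fake, weights)
print(robust < fooled)
```

In this toy setup the second case is penalized twice: the classification term rises because the perturbed real feature now matches the fake weight vector, and the adversarial similarity term rises because it has drifted from its clean counterpart.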

Paper Structure (19 sections, 4 equations, 2 figures, 6 tables)

Introduction
Related work
Deepfake Creation and Detection
Adversarial Examples
Adversarial Feature Similarity Learning
Overview
Deepfake Classification Loss
Adversarial Similarity Loss
Similarity Regularization Loss
Final Loss Function
Experimental Description
Implementation Details
Victim Models: Deepfake Detectors
Robust Cross-Manipulation Generalization
Evaluation on Frame-based Detectors
...and 4 more sections

Figures (2)

Figure 1: Framework for adversarial feature similarity learning. First, we select a pair of real and deepfake samples and create adversarial perturbation for the corresponding inputs. Then, we generate the features of real, fake, and their adversarial samples. Finally, through the proposed loss function, the model can learn a better representation to separate real samples from fake ones along with their adversarial counterparts, where the backbone $f_{\theta}$ is from the deepfake detector.
Figure 2: Robustness to unseen distortions: Video level AUC scores (%) varying with the severity level of different distortions. Average is the mean value at each severity level.
