Adversarially Robust Deepfake Detection via Adversarial Feature Similarity Learning
Sarwar Khan
TL;DR
The paper tackles the vulnerability of deepfake detectors to adversarial perturbations by introducing Adversarial Feature Similarity Learning (AFSL), a three-part objective that separates real from fake features, aligns adversarial and clean embeddings, and adds a regularizer that pushes real and fake features apart. The final objective $L_{afsl} = L_{dcl} + \beta_1 L_{asl} + \beta_2 L_{srl}$, with $\beta_1=1$ and $\beta_2=0.1$, jointly targets discrimination and robustness. AFSL improves robustness on FF++, FaceShifter, and DeeperForensics for both frame-based and video-based detectors, outperforms standard adversarial training methods such as AT and TRADES, and maintains performance under common distortions. The results suggest practical value for deploying deepfake detectors in adversarial settings, and the paper points to future directions such as self-supervised defenses.
Abstract
Deepfake technology has raised concerns about the authenticity of digital content, necessitating the development of effective detection methods. However, the widespread availability of deepfakes has given rise to a new challenge in the form of adversarial attacks. Adversaries can manipulate deepfake videos with small, imperceptible perturbations that deceive detection models into producing incorrect outputs. To tackle this critical issue, we introduce Adversarial Feature Similarity Learning (AFSL), which integrates three fundamental deep feature learning paradigms. First, by optimizing the similarity between samples and weight vectors, our approach aims to distinguish between real and fake instances. Second, we aim to maximize the similarity between adversarially perturbed examples and their unperturbed counterparts, regardless of whether they are real or fake. Third, we introduce a regularization technique that maximizes the dissimilarity between real and fake samples, ensuring a clear separation between these two categories. In extensive experiments on popular deepfake datasets, including FaceForensics++, FaceShifter, and DeeperForensics, the proposed method significantly outperforms other standard adversarial training-based defense methods. This further demonstrates the effectiveness of our approach in protecting deepfake detectors from adversarial attacks.
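
The three terms described above, combined with the weights given in the TL;DR, can be illustrated with a minimal sketch. The text here gives only the weighted sum and a verbal description of each term, so the cosine-based forms of `dcl_term`, `asl_term`, and `srl_term` below are assumptions chosen for illustration, not the paper's definitions; all function names and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon avoids division by zero

def dcl_term(feature, label, class_weights):
    """Assumed form of L_dcl: softmax cross-entropy over the cosine
    similarities between a feature and one weight vector per class."""
    logits = [cosine(feature, w) for w in class_weights]
    log_norm = math.log(sum(math.exp(z) for z in logits))
    return log_norm - logits[label]

def asl_term(clean_feature, adv_feature):
    """Assumed form of L_asl: 1 - cos, which is near zero when the
    adversarial embedding points in the same direction as the clean one."""
    return 1.0 - cosine(clean_feature, adv_feature)

def srl_term(real_feature, fake_feature):
    """Assumed form of L_srl: penalize positive similarity between
    a real and a fake embedding."""
    return max(0.0, cosine(real_feature, fake_feature))

def afsl_loss(l_dcl, l_asl, l_srl, beta1=1.0, beta2=0.1):
    """Weighted sum from the text: L_dcl + beta1 * L_asl + beta2 * L_srl."""
    return l_dcl + beta1 * l_asl + beta2 * l_srl

# Toy 2-D embeddings (hypothetical values, for illustration only).
class_weights = [[1.0, 0.0], [0.0, 1.0]]  # index 0 = real, index 1 = fake
real, fake = [0.9, 0.1], [0.2, 0.8]
real_adv = [0.8, 0.3]                     # stand-in for a perturbed real sample

l_dcl = 0.5 * (dcl_term(real, 0, class_weights) + dcl_term(fake, 1, class_weights))
l_asl = asl_term(real, real_adv)
l_srl = srl_term(real, fake)
total = afsl_loss(l_dcl, l_asl, l_srl)
```

In a real training loop the embeddings would come from the detector's feature extractor and the perturbed samples from an attack run during training; neither is specified in this summary, so both are replaced by fixed vectors here.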
