Preserving Fairness Generalization in Deepfake Detection

Li Lin; Xinan He; Yan Ju; Xin Wang; Feng Ding; Shu Hu

TL;DR

The paper tackles fairness generalization in deepfake detection, showing that intra-domain fairness improvements do not guarantee cross-domain fairness. It introduces a three-module framework: disentanglement learning to separate demographic features from domain-agnostic forgery features, fair learning to fuse these features for unbiased predictions, and optimization with loss flattening to improve generalization. Training uses a bi-level demographic-margin loss and a domain-aware contrastive loss; AdaIN fuses the features into fair representations, and Sharpness-Aware Minimization (SAM) smooths the loss landscape. Experiments across FF++, DFDC, Celeb-DF, and DFD demonstrate improved cross-domain fairness while maintaining strong detection performance, and ablation studies confirm the contribution of each component. The approach advances practical deepfake defenses by making detector behavior more reliable across diverse demographic groups and unseen forgery domains.
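To make the AdaIN fusion step concrete, here is a minimal pure-Python sketch of adaptive instance normalization. It is an illustration under stated assumptions, not the authors' implementation: the function name `adain`, the toy feature values, and the choice of forgery features as the normalized input and demographic features as the source of statistics are all assumptions made for this example.

```python
import math
from statistics import fmean, pvariance

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization over per-channel statistics.

    content, style: lists of channels, each channel a flat list of floats.
    Each content channel is normalized to zero mean and unit variance, then
    rescaled and shifted to the matching style channel's std and mean.
    """
    fused = []
    for c_chan, s_chan in zip(content, style):
        c_mu, s_mu = fmean(c_chan), fmean(s_chan)
        c_sigma = math.sqrt(pvariance(c_chan) + eps)
        s_sigma = math.sqrt(pvariance(s_chan) + eps)
        fused.append([s_sigma * (v - c_mu) / c_sigma + s_mu for v in c_chan])
    return fused

# Hypothetical toy features: 2 channels with 4 spatial positions each.
# Assumption: forgery features are normalized, demographic features give the statistics.
forgery = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0]]
demographic = [[10.0, 12.0, 14.0, 16.0], [5.0, 5.0, 5.0, 9.0]]
fused = adain(forgery, demographic)
```

After the call, each channel of `fused` keeps the spatial pattern of the first argument but carries the mean and standard deviation of the second, which is the sense in which the two feature sets are combined.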

Abstract

Although effective deepfake detection models have been developed in recent years, studies have revealed that these models can produce unfair performance disparities among demographic groups, such as those defined by race and gender. This can lead to particular groups facing unfair targeting or exclusion from detection, potentially allowing misclassified deepfakes to manipulate public opinion and undermine trust in the model. The existing method addresses this problem with a fair loss function, which shows good fairness in intra-domain evaluation but does not maintain it in cross-domain testing. This highlights the significance of fairness generalization in the fight against deepfakes. In this work, we propose the first method to address the fairness generalization problem in deepfake detection by simultaneously considering features, loss, and optimization aspects. Our method employs disentanglement learning to extract demographic and domain-agnostic forgery features, fusing them to encourage fair learning across a flattened loss landscape. Extensive experiments on prominent deepfake datasets demonstrate our method's effectiveness, surpassing state-of-the-art approaches in preserving fairness during cross-domain deepfake detection. The code is available at https://github.com/Purdue-M2/Fairness-Generalization
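The TL;DR names Sharpness-Aware Minimization (SAM) as the technique behind the flattened loss landscape. The sketch below shows the generic two-step SAM update on a toy quadratic: first move to a nearby worst-case point along the normalized gradient, then update the original weights with the gradient taken there. The loss function, the learning rate `lr`, and the radius `rho` are hypothetical values chosen for illustration and are not taken from the paper.

```python
import math

def loss(w):
    # Toy quadratic with one sharp direction (w[0]) and one flat direction (w[1]).
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], w[1]]

def sam_step(w, lr=0.05, rho=0.05):
    """One SAM update: ascend to a perturbed point, then descend from w."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Step 1: perturbation of length rho along the normalized gradient.
    eps = [rho * gi / norm for gi in g]
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    # Step 2: apply the gradient computed at the perturbed point to w.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
start = loss(w)
for _ in range(50):
    w = sam_step(w)
end = loss(w)
```

In a real detector the analytic `grad` would be replaced by backpropagation through the full training loss, so each SAM step costs two forward-backward passes.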

Paper Structure (20 sections, 1 theorem, 11 equations, 15 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Motivation
Method
  Overview of Proposed Method
  Exposing Demographic & Forgery Features
  Fair Learning under Generalization
  Joint Optimization
Experiment
  Experimental Settings
  Results
  Ablation Study
  Visualization
Conclusion
Related Work
...and 5 more sections

Key Result

Theorem 1

(Locatello et al., 2019) If $X$ is entangled with $Y$ and $D$, then using a perfect classifier for $\hat{Y}$, i.e., $P(\hat{Y}|X)=P(Y|X)$, does not imply demographic parity, i.e., $P(\hat{Y}=y|D=\mathcal{J}_1)=P(\hat{Y}=y|D=\mathcal{J}_2)$, $\forall y\in\{0,1\}$. Here $X$ is the input, $Y$ the real/fake label (0 means real and 1 means fake), $D$ the demographic attribute, and $\mathcal{J}_1$, $\mathcal{J}_2$ two demographic subgroups.
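A small numeric illustration may help with the intuition behind the theorem. It is a simplified special case, not the proof: it assumes the label is correlated with the demographic attribute, so that a classifier reproducing the true labels exactly inherits different positive rates per group. The counts and the group names `J1` and `J2` are hypothetical.

```python
# Hypothetical counts: (group, true_label) -> number of samples; 1 = fake, 0 = real.
counts = {("J1", 1): 80, ("J1", 0): 20, ("J2", 1): 40, ("J2", 0): 60}

def positive_rate(group):
    """P(Y_hat = 1 | D = group) for a classifier that always outputs the true label."""
    fake = counts[(group, 1)]
    total = fake + counts[(group, 0)]
    return fake / total

rate_j1 = positive_rate("J1")
rate_j2 = positive_rate("J2")

# Demographic parity requires equal rates; a nonzero gap means it is violated
# even though the classifier makes no errors.
parity_gap = abs(rate_j1 - rate_j2)
```

With these counts the classifier is perfectly accurate, yet it flags the two groups at different rates, so accuracy alone says nothing about demographic parity.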

Figures (15)

Figure 1: Comparison between our method and existing deepfake detection baselines. (Left) Ori denotes the conventional method without any fairness considerations. (Middle) DAW-FDD (Ju et al., 2023) is an intra-domain fair deepfake detection method, but it fails at cross-domain fair detection. (Right) Our method achieves both intra-domain and cross-domain fair detection by exposing domain-agnostic forgery features and demographic features and then fusing them for fair learning across a flattened loss landscape.
Figure 2: Experimental results for Motivation. Testing fairness results (lower is better for all metrics) of deepfake detectors in intra-domain (Left, train and test: FF++) and cross-domain (Middle, train: FF++, test: DFD) detection. (Right) Visualization of the loss landscape for DAW-FDD. The numerous local and global minima can lead to poor generalization.
Figure 3: An overview of our proposed method. 1) The disentanglement learning module exposes demographic and forgery features. 2) The fair learning module fuses those two features for a fair classifier head $h$ and obtains the fair prediction using the two-level fairness loss $\mathcal{L}_{fair}$. 3) The optimization module flattens the loss landscape to further enhance fairness generalization.
Figure 4: (Left) Comparison of FPR on intersectional subgroups. Models are trained on FF++ and tested on FF++, DFDC, Celeb-DF, and DFD. Subgroups not represented in Celeb-DF and DFD are not applicable. (Right) Loss landscape visualization of our proposed method with (right) and without (left) flattening the loss landscape.
Figure 5: (Left) Grad-CAM visualization of Ori (first row), DAW-FDD (second row), and ours (third row) on the intra-domain dataset (FF++) and the cross-domain datasets (DFDC, Celeb-DF, and DFD). (Right) Visualization of the image (first column), DAW-FDD's features (second column), and our disentangled forgery (third column), content (fourth column), and demographic features (last column).
...and 10 more figures

