Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations

Decheng Liu, Zongqi Wang, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao

TL;DR

This work investigates racial bias in deep face forgery detection and introduces a comprehensive fairness-driven benchmark. It presents the Fair Forgery Detection (FairFD) dataset, tailored fairness metrics (Approach Averaged Metric and Utility Regularized Metric), and a post-processing pruning method (BPFA) to enhance fairness without retraining. Extensive experiments across 12 detectors reveal pervasive racial bias, while BPFA, especially combined with a strong baseline like SPSL, achieves new state-of-the-art fairness without sacrificing utility. The study also analyzes dataset construction, thresholding, and feature-space behavior to offer actionable insights for developing fairer forgery detectors and establishing standardized evaluation protocols.

Abstract

With the rapid development of deep image generation technology, forgery detection plays an increasingly important role in social and economic security. Racial bias, however, has not been thoroughly explored in the deep forgery detection field. In this paper, we first contribute a dedicated dataset, the Fair Forgery Detection (FairFD) dataset, on which we demonstrate the racial bias of public state-of-the-art (SOTA) methods. Unlike existing forgery detection datasets, the self-constructed FairFD dataset has a balanced racial ratio, diverse forgery generation images, and the largest number of subjects. Additionally, we identify problems with naive fairness metrics when benchmarking forgery detection models. To evaluate fairness comprehensively, we design novel metrics, including the Approach Averaged Metric and the Utility Regularized Metric, which avoid deceptive results. We also present an effective and robust post-processing technique, Bias Pruning with Fair Activations (BPFA), which improves fairness without requiring retraining or weight updates. Extensive experiments with 12 representative forgery detection models demonstrate the value of the proposed dataset and the soundness of the designed fairness metrics. By applying BPFA to the existing fairest detector, we achieve a new SOTA. Finally, we conduct further in-depth analyses to offer insights for researchers in the community.

Paper Structure (41 sections, 11 equations, 9 figures, 14 tables)

Figures (9)

  • Figure 1: Workflow of fairness evaluation in forgery detection. We first construct an evaluation dataset containing a large number of subjects, diverse forgery approaches, and racial balance. Subsequently, we obtain the test results of the forgery detector on each race and forgery method. Finally, we comprehensively evaluate the detector using three sets of 12 fairness metrics in total.
  • Figure 2: A face forgery detection model exhibits different biases for different forgery approaches.
  • Figure 3: Detailed utility (AUC) for diverse races, forgery approaches, detectors.
  • Figure 4: Fairness (STD) for different forgery approaches.
  • Figure 5: Evaluation with TPR and TNR for each subject on the FF++ subset.
  • ...and 4 more figures
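
The captions for Figures 1, 3, and 4 describe measuring utility (AUC) per race and per forgery approach, then summarizing fairness as a standard deviation (STD). The sketch below illustrates one way such a computation could look. It is not the paper's definition of the Approach Averaged Metric or the Utility Regularized Metric: the record layout, the function names `auc` and `fairness_std`, the use of the population standard deviation, and the final averaging over approaches are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def auc(labels, scores):
    """Rank-based AUC: chance that a fake (label 1) scores above a real (label 0).

    Ties count as half a win. Both classes must be present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one real and one fake sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fairness_std(records):
    """Summarize racial disparity per forgery approach.

    records: iterable of (race, approach, label, score) tuples (hypothetical layout).
    Returns ({approach: STD of per-race AUC}, mean of those STDs over approaches).
    Lower values mean the detector's utility is more even across races.
    """
    groups = {}
    for race, approach, label, score in records:
        labels, scores = groups.setdefault(approach, {}).setdefault(race, ([], []))
        labels.append(label)
        scores.append(score)

    per_approach = {
        approach: pstdev([auc(l, s) for l, s in races.values()])
        for approach, races in groups.items()
    }
    return per_approach, mean(list(per_approach.values()))


# Toy data: two races, two forgery approaches (values are made up).
toy = [
    ("A", "swap", 1, 0.9), ("A", "swap", 1, 0.8), ("A", "swap", 0, 0.2), ("A", "swap", 0, 0.1),
    ("B", "swap", 1, 0.9), ("B", "swap", 1, 0.3), ("B", "swap", 0, 0.6), ("B", "swap", 0, 0.1),
    ("A", "reenact", 1, 0.7), ("A", "reenact", 0, 0.3),
    ("B", "reenact", 1, 0.6), ("B", "reenact", 0, 0.4),
]
per_approach, averaged = fairness_std(toy)
print(per_approach, averaged)
```

In this toy data the detector ranks race B's "swap" samples worse than race A's, so the "swap" STD is nonzero while the "reenact" STD is zero. Reporting the disparity per approach before averaging keeps a bias that appears under only one forgery approach visible, which is the situation Figure 2 describes.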