Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations
Decheng Liu, Zongqi Wang, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao
TL;DR
This work investigates racial bias in deep face forgery detection and introduces a comprehensive fairness-driven benchmark. It presents the Fair Forgery Detection (FairFD) dataset, tailored fairness metrics (Approach Averaged Metric and Utility Regularized Metric), and a post-processing pruning method (BPFA) to enhance fairness without retraining. Extensive experiments across 12 detectors reveal pervasive racial bias, while BPFA, especially combined with a strong baseline like SPSL, achieves new state-of-the-art fairness without sacrificing utility. The study also analyzes dataset construction, thresholding, and feature-space behavior to offer actionable insights for developing fairer forgery detectors and establishing standardized evaluation protocols.
Abstract
With the rapid progress of deep image generation technology, forgery detection plays an increasingly important role in social and economic security. However, racial bias in deep forgery detection has not been thoroughly explored. In this paper, we first contribute a dedicated dataset, the Fair Forgery Detection (FairFD) dataset, on which we demonstrate the racial bias of public state-of-the-art (SOTA) methods. Unlike existing forgery detection datasets, the self-constructed FairFD dataset has a balanced racial ratio and diverse forged images, covering the largest number of subjects. Additionally, we identify problems with naive fairness metrics when benchmarking forgery detection models. To evaluate fairness comprehensively, we design novel metrics, including the Approach Averaged Metric and the Utility Regularized Metric, which avoid deceptive results. We also present an effective and robust post-processing technique, Bias Pruning with Fair Activations (BPFA), which improves fairness without requiring retraining or weight updates. Extensive experiments with 12 representative forgery detection models demonstrate the value of the proposed dataset and the soundness of the designed fairness metrics. By applying BPFA to the existing fairest detector, we achieve a new SOTA. Furthermore, we conduct in-depth analyses to offer further insights for researchers in the community.
