Data-Driven Fairness Generalization for Deepfake Detection
Uzoamaka Ezeakunne, Chrisantus Eze, Xiuwen Liu
TL;DR
The paper addresses fairness generalization in deepfake detection, where models exhibit demographic biases and struggle on unseen data. It introduces a data-centric framework that combines synthetic self-blended images (SBI), a multi-task architecture with two heads (one for deepfake detection, one for demographic prediction), and Sharpness-Aware Minimization (SAM) to encourage robust generalization. Training optimizes a combined loss that balances accuracy and fairness across demographic groups while keeping real and fake data balanced. In intra-dataset and cross-dataset experiments, the method achieves detection performance comparable to baselines while substantially reducing demographic disparities, demonstrating the potential of synthetic data for fairness generalization in deepfake detection.
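To make the SAM component concrete, the following is a minimal sketch of the generic two-step SAM update, not the paper's training code. The toy quadratic loss, the `curvatures` values, and the `lr` and `rho` settings are all hypothetical stand-ins chosen for illustration; in the paper's setting the loss would be the combined detection and fairness objective of a neural network.

```python
import math

def loss_and_grad(w):
    """Toy loss 0.5 * sum(a_i * w_i^2) and its gradient (hypothetical stand-in for a network loss)."""
    curvatures = [1.0, 10.0]
    loss = 0.5 * sum(a * x * x for a, x in zip(curvatures, w))
    grad = [a * x for a, x in zip(curvatures, w)]
    return loss, grad

def sam_step(w, lr=0.05, rho=0.05):
    """One SAM update: move to a nearby higher-loss point, then descend using the gradient found there."""
    _, g = loss_and_grad(w)
    norm = math.sqrt(sum(x * x for x in g)) + 1e-12  # small constant avoids division by zero
    # Ascent step: perturb the weights by rho along the normalized gradient.
    w_adv = [x + rho * gx / norm for x, gx in zip(w, g)]
    # Descent step: apply the gradient from the perturbed point to the original weights.
    _, g_adv = loss_and_grad(w_adv)
    return [x - lr * gx for x, gx in zip(w, g_adv)]

w = [1.0, 1.0]
initial_loss, _ = loss_and_grad(w)
for _ in range(200):
    w = sam_step(w)
final_loss, _ = loss_and_grad(w)
```

Because the perturbation has a fixed radius `rho`, the iterates settle into a small neighborhood of the minimum rather than reaching it exactly. That behavior is expected for this toy setup.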
Abstract
Despite progress in deepfake detection research, recent studies have shown that biases in the training data for these detectors can result in varying levels of performance across demographic groups, such as race and gender. These disparities can lead to certain groups being unfairly targeted or excluded. Traditional methods often rely on fair loss functions to address these issues, but they underperform when applied to unseen datasets, so fairness generalization remains a challenge. In this work, we propose a data-driven framework for tackling the fairness generalization problem in deepfake detection by leveraging synthetic datasets and model optimization. Our approach focuses on generating and utilizing synthetic data to enhance fairness across diverse demographic groups. By creating a diverse set of synthetic samples that represent various demographic groups, we train our model on a balanced and representative dataset, which allows fairness to generalize more effectively across domains. We combine synthetic data, a loss-sharpness-aware optimization pipeline, and a multi-task learning framework to create a more equitable training environment, which helps maintain fairness in both intra-dataset and cross-dataset evaluations. Extensive experiments on benchmark deepfake detection datasets demonstrate the efficacy of our approach, which surpasses state-of-the-art approaches in preserving fairness during cross-dataset evaluation. Our results highlight the potential of synthetic datasets for achieving fairness generalization in deepfake detection.
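As an illustration of what "varying levels of performance across demographic groups" means in practice, the sketch below computes per-group accuracy and the gap between the best and worst group. This is a generic disparity measure assumed here for illustration; the abstract does not specify which fairness metrics the paper reports. The function names, group labels, and data are hypothetical.

```python
from collections import defaultdict

def per_group_accuracy(labels, preds, groups):
    """Return accuracy computed separately for each demographic group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    return {g: correct[g] / total[g] for g in total}

def accuracy_gap(labels, preds, groups):
    """Difference between the best and worst per-group accuracy (0 means equal accuracy)."""
    acc = per_group_accuracy(labels, preds, groups)
    return max(acc.values()) - min(acc.values())

# Hypothetical toy data: 1 = fake, 0 = real; two groups "A" and "B".
labels = [1, 0, 1, 0, 1, 0, 1, 0]
preds = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["A"] * 4 + ["B"] * 4

acc = per_group_accuracy(labels, preds, groups)
gap = accuracy_gap(labels, preds, groups)
print(acc, gap)
```

In this toy example the detector is perfect on group A but correct on only half of group B, so overall accuracy (0.75) hides a gap of 0.5. Evaluating the same gap on a dataset not seen during training is one way to check whether fairness, and not just accuracy, generalizes.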
