How Knowledge Distillation Mitigates the Synthetic Gap in Fair Face Recognition
Pedro C. Neto, Ivona Colakovic, Sašo Karakatič, Ana F. Sequeira
TL;DR
The paper tackles the retraction of real face-recognition datasets by exploring knowledge distillation (KD) from a Teacher trained on real data to smaller Student models trained on synthetic or mixed data. It presents an ethnicity-aware dataset merging strategy and evaluates multiple architectures and losses, showing that KD improves both accuracy and fairness and mitigates the performance gap between real and synthetic data. A mix of 70% real and 30% synthetic data often matches or surpasses real-data KD performance while also reducing bias across ethnic groups, making synthetic data training more viable and privacy-preserving. The findings highlight the practical impact of KD in fair, privacy-conscious face recognition (FR) systems and point to future work on expanding architectures, refining sampling, and exploring the role of training-time complexity versus deployment efficiency.
Abstract
Leveraging the capabilities of Knowledge Distillation (KD) strategies, we devise a strategy to counter the recent retraction of face recognition datasets. Given a pretrained Teacher model trained on a real dataset, we show that carefully utilising synthetic datasets, or a mix of real and synthetic datasets, to distil knowledge from this Teacher to smaller Students can yield surprising results. To this end, we trained 33 different models with and without KD, on different datasets, with different architectures and losses. Our findings are consistent: using KD leads to performance gains across all ethnicities and to decreased bias. In addition, it helps to mitigate the performance gap between real and synthetic datasets. This approach addresses the limitations of synthetic data training, improving both the accuracy and fairness of face recognition models.
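
To make the Teacher-to-Student setup concrete, the following is a minimal sketch of one common way to distil a face recognition model: the Student is trained on its usual classification (margin) loss plus a term that pulls its embedding towards the Teacher's embedding for the same image. The abstract does not specify the distillation objective, so the embedding-matching MSE term, the function names, and the `kd_weight` parameter are all illustrative assumptions rather than the authors' method.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length, as is usual for face embeddings."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def embedding_distillation_loss(student_emb, teacher_emb):
    """Mean squared error between L2-normalized Student and Teacher embeddings.

    Assumption: feature-level distillation. Other KD variants (for example
    logit matching with a temperature) are equally plausible.
    """
    if len(student_emb) != len(teacher_emb):
        raise ValueError("embeddings must have the same dimension")
    s = l2_normalize(student_emb)
    t = l2_normalize(teacher_emb)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


def total_loss(task_loss, student_emb, teacher_emb, kd_weight=1.0):
    """Combine the Student's own task loss with the distillation term.

    `task_loss` stands in for whatever classification loss the Student uses;
    `kd_weight` is a hypothetical hyperparameter balancing the two terms.
    """
    kd_term = embedding_distillation_loss(student_emb, teacher_emb)
    return task_loss + kd_weight * kd_term
```

In this sketch the Teacher embedding is treated as a fixed target, so the Teacher only needs a forward pass. Because the distillation term needs no identity label, the same objective can be applied to real, synthetic, or mixed training images.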
