PFGuard: A Generative Framework with Privacy and Fairness Safeguards
Soyeon Kim, Yuji Roh, Geon Heo, Steven Euijong Whang
TL;DR
PFGuard tackles the hard problem of generating private and fair synthetic data by explicitly addressing the counteractive relationship between differential privacy and fairness in high-dimensional settings. It introduces a two-stage design: fair training of an ensemble of intermediate teacher models via balanced minibatch sampling, followed by private training that transfers knowledge to a DP generator using Private Teacher Ensemble Learning (PTEL) with GNMax-like aggregation. The framework provides formal DP guarantees for the generator while improving fairness across groups and maintaining competitive utility, demonstrated through extensive experiments on MNIST, FashionMNIST, and CelebA. The results show that naive combinations of privacy and fairness techniques can degrade performance, whereas PFGuard achieves a better privacy-fairness-utility tradeoff with minimal overhead and compatibility with existing private generative models.
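The two stages can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `balanced_minibatch` and `gnmax_aggregate`, the uniform-over-groups sampling rule, and the use of discrete class votes are all assumptions made for illustration. The summary above says only that sampling is "balanced" and that aggregation is "GNMax-like". The first function oversamples minority groups when building a teacher's minibatch. The second adds Gaussian noise to the teachers' vote counts and returns the arg max, which is the standard GNMax mechanism.

```python
import random
from collections import defaultdict


def balanced_minibatch(samples, groups, batch_size, rng):
    """Draw a minibatch in which every group is equally likely on each draw.

    Assumed balancing rule (illustrative): pick a group uniformly at random,
    then pick a sample uniformly within that group, with replacement.
    """
    by_group = defaultdict(list)
    for sample, group in zip(samples, groups):
        by_group[group].append(sample)
    group_ids = list(by_group)
    return [rng.choice(by_group[rng.choice(group_ids)]) for _ in range(batch_size)]


def gnmax_aggregate(teacher_votes, num_classes, sigma, rng):
    """Return the index with the largest Gaussian-noised vote count (GNMax)."""
    counts = [0.0] * num_classes
    for vote in teacher_votes:
        counts[vote] += 1.0
    # Independent N(0, sigma^2) noise per count; sigma sets the privacy level.
    noisy = [count + rng.gauss(0.0, sigma) for count in counts]
    return max(range(num_classes), key=lambda k: noisy[k])
```

For example, with 90 samples in one group and 10 in another, `balanced_minibatch` returns roughly equal numbers from each group. A larger `sigma` in `gnmax_aggregate` gives stronger privacy at the cost of noisier aggregated outputs.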
Abstract
Generative models must ensure both privacy and fairness for Trustworthy AI. While these goals have been pursued separately, recent studies propose to combine existing privacy and fairness techniques to achieve both goals. However, naively combining these techniques can be insufficient due to privacy-fairness conflicts, where a sample in a minority group may be represented in ways that support fairness, only to be suppressed for privacy. We demonstrate how these conflicts lead to adverse effects, such as privacy violations and unexpected fairness-utility tradeoffs. To mitigate these risks, we propose PFGuard, a generative framework with privacy and fairness safeguards, which simultaneously addresses privacy, fairness, and utility. By using an ensemble of multiple teacher models, PFGuard balances privacy-fairness conflicts between fair and private training stages and achieves high utility based on ensemble learning. Extensive experiments show that PFGuard successfully generates synthetic data from high-dimensional data while providing both DP guarantees and convergence in fair generative modeling.
