SAFES: Sequential Privacy and Fairness Enhancing Data Synthesis for Responsible AI
Spencer Giddens, Xiaon Lang, Fang Liu
TL;DR
SAFES addresses privacy and fairness in synthetic data jointly by applying DP data synthesis first and a fairness-aware preprocessing step second. The result is a general, modular framework compatible with multiple DP synthesizers (e.g., AIM, DP-CTGAN) and fairness preprocessors (e.g., TOT, RW). Under reasonable privacy budgets, SAFES improves fairness with limited utility loss, offering a practical route to responsible data releases. The work also discusses the inherent trade-offs among privacy, fairness, and utility, reports robustness across varied settings, and identifies scalability constraints that motivate future work on efficiency and broader fairness definitions. Overall, SAFES provides a flexible toolkit for producing privacy-preserving, fairness-aware synthetic data for downstream ML tasks in domains such as lending and criminal justice.
Abstract
As data-driven and AI-based decision making gains widespread adoption across disciplines, it is crucial that both data privacy and decision fairness are appropriately addressed. Although differential privacy (DP) provides a robust framework for guaranteeing privacy and methods are available to improve fairness, most prior work treats the two concerns separately. Existing approaches that do consider privacy and fairness simultaneously typically focus on a single specific learning task, which limits their generalizability. In response, we introduce SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure that sequentially combines DP data synthesis with a fairness-aware data preprocessing step. SAFES gives users flexibility in navigating the privacy-fairness-utility trade-offs. We illustrate SAFES with different DP synthesizers and fairness-aware data preprocessing methods, and run extensive experiments on multiple real datasets to examine the privacy-fairness-utility trade-offs of the synthetic data it generates. Empirical evaluations demonstrate that, for reasonable privacy loss, SAFES-generated synthetic data can achieve significantly improved fairness metrics with relatively low utility loss.
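
The two-stage structure described above can be illustrated with a minimal sketch. It is not the authors' implementation. The DP stage here is a simple Laplace-noised joint histogram, used as a stand-in for synthesizers such as AIM or DP-CTGAN. The fairness stage is a reweighing scheme in the style of Kamiran and Calders, which we assume is what RW abbreviates. The function names `dp_synthesize` and `reweigh`, the toy data, and all parameter values are hypothetical. Because the fairness step touches only the synthetic data, it is post-processing and leaves the DP guarantee of the first stage unchanged.

```python
import numpy as np

def dp_synthesize(data, domain_sizes, epsilon, n_synth, rng):
    """Stand-in DP synthesizer (assumption, not AIM or DP-CTGAN).

    data: integer array of shape (n, d); column j takes values in
    range(domain_sizes[j]). n_synth is chosen independently of the data.
    """
    # Full contingency table over all attributes.
    counts = np.zeros(domain_sizes)
    np.add.at(counts, tuple(data.T), 1)
    # Adding or removing one record changes one cell by 1, so the L1
    # sensitivity is 1 and Laplace noise with scale 1/epsilon gives epsilon-DP.
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    # Post-processing: clip negatives, normalize, and sample synthetic records.
    probs = np.clip(noisy, 0, None).ravel()
    if probs.sum() == 0:
        probs = np.ones_like(probs)
    probs = probs / probs.sum()
    cells = rng.choice(probs.size, size=n_synth, p=probs)
    return np.column_stack(np.unravel_index(cells, domain_sizes))

def reweigh(synth, a_col, y_col):
    """Reweighing: w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y) on synthetic data."""
    a, y = synth[:, a_col], synth[:, y_col]
    w = np.ones(len(synth))
    for av in np.unique(a):
        for yv in np.unique(y):
            mask = (a == av) & (y == yv)
            if mask.any():
                w[mask] = (a == av).mean() * (y == yv).mean() / mask.mean()
    return w

# Toy data (hypothetical): protected attribute a, one feature x, label y,
# with a higher positive rate for the group a == 1.
rng = np.random.default_rng(0)
n = 2000
a = rng.integers(0, 2, n)
x = rng.integers(0, 3, n)
y = (rng.random(n) < np.where(a == 1, 0.6, 0.3)).astype(int)
real = np.column_stack([a, x, y])

# Stage 1: DP synthesis. Stage 2: fairness-aware preprocessing.
synth = dp_synthesize(real, (2, 3, 2), epsilon=1.0, n_synth=n, rng=rng)
weights = reweigh(synth, a_col=0, y_col=2)

def weighted_rate(group):
    """Weighted positive-label rate within one protected group."""
    m = synth[:, 0] == group
    return np.average(synth[m, 2], weights=weights[m])
```

When every combination of group and label appears in the synthetic data, the weighted positive rates returned by `weighted_rate(0)` and `weighted_rate(1)` coincide, so a downstream learner that respects the weights sees no label disparity between groups. A full joint histogram scales poorly with the number of attributes, which is one reason practical pipelines rely on marginal-based or generative synthesizers instead.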
