High Epsilon Synthetic Data Vulnerabilities in MST and PrivBayes
Steven Golob, Sikha Pentyala, Anuar Maratkhan, Martine De Cock
TL;DR
The paper demonstrates that high differential privacy budgets ($\varepsilon$) can enable unambiguous membership inference attacks on two state-of-the-art differentially private synthetic data generators (DP-SDGs), MST and PrivBayes. It extends the DOMIAS framework with a black-box attack under auxiliary-data assumptions, using shadow modelling to identify focal points and to construct a problem-specific density estimator $S$ for inferring whether a target record was part of the generator's training data. Experimental results show that attack efficacy increases with $\varepsilon$, with high membership-advantage scores especially for PrivBayes at $\varepsilon=1000$, highlighting practical privacy risks. The findings motivate stronger defenses for DP-SDGs and call for careful consideration of privacy-utility trade-offs in real-world deployments.
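To give a sense of why a large $\varepsilon$ leaves room for such attacks, the sketch below evaluates a standard hypothesis-testing bound: against a pure $\varepsilon$-DP mechanism, no attack can achieve a membership advantage (true-positive rate minus false-positive rate) above $(e^{\varepsilon}-1)/(e^{\varepsilon}+1)$. This bound is general background rather than a result taken from the paper; it ignores the $\delta$ term of approximate DP, and the function name `max_membership_advantage` is hypothetical.

```python
import math


def max_membership_advantage(epsilon: float) -> float:
    """Upper bound on TPR - FPR for any attack on a pure epsilon-DP mechanism.

    Assumption: pure epsilon-DP (delta = 0). The bound is
    (e^eps - 1) / (e^eps + 1), which equals tanh(eps / 2).
    """
    # tanh is used instead of exp so that very large epsilon does not overflow.
    return math.tanh(epsilon / 2.0)


if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0, 100.0, 1000.0):
        bound = max_membership_advantage(eps)
        print(f"epsilon={eps:>7}: advantage <= {bound:.6f}")
```

For small $\varepsilon$ the bound stays close to zero, while for $\varepsilon$ in the hundreds or thousands it is numerically indistinguishable from 1, so the guarantee no longer rules out a near-perfect attack.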
Abstract
Synthetic data generation (SDG) has become increasingly popular as a privacy-enhancing technology. It aims to maintain important statistical properties of the underlying training data while excluding any personally identifiable information. A host of SDG algorithms have been developed in recent years to improve and balance these two aims, and many of them provide robust differential privacy guarantees. However, we show here that if the differential privacy parameter $\varepsilon$ is set too high, unambiguous privacy leakage can result. We show this by conducting a novel membership inference attack (MIA) on two state-of-the-art differentially private SDG algorithms: MST and PrivBayes. Our work suggests that these generators have vulnerabilities not previously seen, and that future work to strengthen their privacy is advisable. We present the heuristic for our MIA here. It assumes knowledge of auxiliary "population" data and of which SDG algorithm was used. We use this information to adapt the recent DOMIAS MIA specifically to MST and PrivBayes. Our approach went on to win the SNAKE challenge in November 2023.
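The general idea behind a DOMIAS-style attack can be sketched as follows: a target record is scored by the ratio of its estimated density in the synthetic data to its estimated density in the auxiliary population data, and records with a high ratio are guessed to be training members. The sketch below is illustrative only and is not the authors' attack. It swaps in a simple count-based estimator with additive smoothing in place of the problem-specific estimator described above, and every name, the toy data, and the threshold are assumptions.

```python
from collections import Counter


def smoothed_density(records, domain_size, alpha=1.0):
    """Return a function estimating P(record) from counts.

    Assumption: records are hashable tuples of categorical values and
    domain_size is the number of distinct possible records.
    """
    counts = Counter(records)
    total = len(records) + alpha * domain_size
    return lambda record: (counts[record] + alpha) / total


def density_ratio_scores(targets, synthetic, reference, domain_size):
    """Score each target by p_synthetic(x) / p_reference(x)."""
    p_syn = smoothed_density(synthetic, domain_size)
    p_ref = smoothed_density(reference, domain_size)
    return [p_syn(t) / p_ref(t) for t in targets]


def membership_advantage(scores, is_member, threshold):
    """True-positive rate minus false-positive rate at a fixed threshold."""
    guesses = [s > threshold for s in scores]
    true_pos = sum(g and m for g, m in zip(guesses, is_member))
    false_pos = sum(g and not m for g, m in zip(guesses, is_member))
    n_members = sum(is_member)
    n_non_members = len(is_member) - n_members
    return true_pos / n_members - false_pos / n_non_members


# Toy example: ("a", 1) is rare in the population but common in the
# synthetic data, which hints that it was present in the training set.
reference = [("a", 0)] * 4 + [("b", 0)] * 4 + [("a", 1)] + [("b", 1)]
synthetic = [("a", 0)] * 3 + [("b", 0)] * 3 + [("a", 1)] * 4
targets = [("a", 1), ("b", 1), ("a", 0)]
is_member = [True, False, False]

scores = density_ratio_scores(targets, synthetic, reference, domain_size=4)
advantage = membership_advantage(scores, is_member, threshold=1.0)
```

In practice the threshold and the density estimators would need to be chosen without access to the true membership labels, for instance by calibrating on shadow models trained on the auxiliary data.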
