Privacy Vulnerabilities in Marginals-based Synthetic Data
Steven Golob, Sikha Pentyala, Anuar Maratkhan, Martine De Cock
TL;DR
This paper addresses privacy vulnerabilities in marginals-based synthetic data generated under differential privacy (DP). It introduces MAMA-MIA, a lightweight membership inference attack (MIA) that leverages auxiliary data and black-box knowledge of the synthetic data generator (SDG) to build a density estimator $\zeta$ of the training data. By identifying focal-points through shadow modelling and aggregating them into $\zeta$, the method achieves inference accuracy competitive with or superior to state-of-the-art MIAs while dramatically reducing computational requirements. The authors demonstrate its effectiveness against representative marginals-based SDGs (MST, PrivBayes, Private-GSD) on the SNAKE and California Housing datasets, and discuss broader implications for privacy policy and SDG design. The work shows that even DP-protected, marginals-preserving generators can leak individual information, motivating new defenses and auditing approaches for synthetic data systems.
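The core intuition, scoring a target record by how strongly the synthetic data reproduces its value combinations on selected marginals relative to the auxiliary data, can be illustrated with a simplified density-ratio score. This is a sketch under stated assumptions, not MAMA-MIA itself: the focal marginals are supplied by hand (the paper derives its focal-points via shadow modelling), the estimator is a plain add-alpha smoothed ratio of synthetic to auxiliary marginal frequencies, and the function names (`marginal_counts`, `density_ratio_score`) are hypothetical.

```python
from collections import Counter
from math import log

# Hypothetical helpers illustrating a density-ratio membership score over
# hand-picked "focal" marginals. This is NOT the MAMA-MIA implementation.

def marginal_counts(data, attrs):
    """Count each value combination of the columns listed in `attrs`."""
    return Counter(tuple(row[i] for i in attrs) for row in data)

def smoothed_prob(counts, total, key, n_cells, alpha):
    """Add-alpha (Laplace) smoothed probability of `key` in one marginal."""
    return (counts[key] + alpha) / (total + alpha * n_cells)

def density_ratio_score(target, synthetic, auxiliary, focal_marginals, alpha=1.0):
    """Sum of log(p_synthetic / p_auxiliary) over the focal marginals.

    A positive score means the target's value combinations are
    over-represented in the synthetic data relative to the auxiliary
    (population) data, which this sketch treats as evidence of membership.
    """
    score = 0.0
    for attrs in focal_marginals:
        key = tuple(target[i] for i in attrs)
        syn = marginal_counts(synthetic, attrs)
        aux = marginal_counts(auxiliary, attrs)
        # Cells observed in either dataset (plus the target's own cell).
        n_cells = len(set(syn) | set(aux) | {key})
        p_syn = smoothed_prob(syn, len(synthetic), key, n_cells, alpha)
        p_aux = smoothed_prob(aux, len(auxiliary), key, n_cells, alpha)
        score += log(p_syn / p_aux)
    return score

# Toy usage: three categorical columns, two assumed focal 2-way marginals.
synthetic = [("a", "x", "1"), ("a", "x", "1"), ("a", "x", "2"), ("b", "y", "2")]
auxiliary = [("a", "y", "1"), ("b", "x", "2"), ("b", "y", "1"), ("b", "y", "2")]
focal = [(0, 1), (1, 2)]

score_in = density_ratio_score(("a", "x", "1"), synthetic, auxiliary, focal)
score_out = density_ratio_score(("b", "y", "1"), synthetic, auxiliary, focal)
print(score_in, score_out)  # roughly 2.485 and -1.504
```

Working in log space turns the product of per-marginal ratios into a sum, which avoids underflow when many marginals are combined; an attacker would then threshold the score to predict membership.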
Abstract
When acting as a privacy-enhancing technology, synthetic data generation (SDG) aims to maintain a resemblance to the real data while excluding personally identifiable information. Many SDG algorithms provide robust differential privacy (DP) guarantees to this end. However, we show that the strongest class of SDG algorithms, those that preserve \textit{marginal probabilities} or similar statistics of the underlying data, leak information about individuals that can be recovered more efficiently than previously understood. We demonstrate this by presenting a novel membership inference attack, MAMA-MIA, and evaluating it against three seminal DP SDG algorithms: MST, PrivBayes, and Private-GSD. MAMA-MIA leverages knowledge of which SDG algorithm was used, allowing it to learn information about the hidden data more accurately, and orders of magnitude faster, than other leading attacks. We use MAMA-MIA to shed light on vulnerabilities in existing SDG algorithms. Our approach went on to win the first SNAKE (SaNitization Algorithm under attacK ... $\varepsilon$) competition.
