Winning the MIDST Challenge: New Membership Inference Attacks on Diffusion Models for Tabular Data Synthesis
Xiaoyu Wu, Yifei Pang, Terrance Liu, Steven Wu
TL;DR
This paper assesses privacy risks in diffusion-based tabular data synthesis using rigorous membership inference attacks (MIAs). It shows that strong image-domain MIAs, such as SecMI, underperform on tabular diffusion models, and identifies noise initialization as a key source of variance in attack performance. To address this, the authors propose a machine-learning-driven MIA: loss features are computed across multiple noises $\varepsilon$ and time steps $t$, and a lightweight three-layer MLP maps them to a membership prediction, with no manual optimization. In the MIDST Challenge @ SaTML 2025, the method took first place across all tracks, underscoring the need for stronger privacy evaluations of synthetic tabular data generation. The code is publicly available at the repository linked in the abstract.
Abstract
Tabular data synthesis using diffusion models has gained significant attention for its potential to balance data utility and privacy. However, existing privacy evaluations often rely on heuristic metrics or weak membership inference attacks (MIA), leaving privacy risks inadequately assessed. In this work, we conduct a rigorous MIA study on diffusion-based tabular synthesis, revealing that state-of-the-art attacks designed for image models fail in this setting. We identify noise initialization as a key factor influencing attack efficacy and propose a machine-learning-driven approach that leverages loss features across different noises and time steps. Our method, implemented with a lightweight MLP, effectively learns membership signals, eliminating the need for manual optimization. Experimental results from the MIDST Challenge @ SaTML 2025 demonstrate the effectiveness of our approach, securing first place across all tracks. Code is available at https://github.com/Nicholas0228/Tartan_Federer_MIDST.
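As a rough illustration of the "loss features across different noises and time steps" idea, the sketch below builds one feature per (noise, time step) pair for each record. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: `loss_features`, `toy_eps_model`, and the linear `alpha_bar` schedule are hypothetical names and choices, only continuous columns with a standard noise-prediction (MSE) loss are considered, and the zero-output denoiser merely stands in for a trained tabular diffusion model.

```python
import numpy as np

def loss_features(x0, eps_model, alpha_bar, timesteps, n_noises=4, seed=0):
    """Return an (n_records, n_noises * len(timesteps)) matrix of denoising losses.

    x0        : (n_records, n_dims) array of continuous records (assumption).
    eps_model : callable (x_t, t) -> predicted noise, same shape as x_t (hypothetical).
    alpha_bar : 1-D array of cumulative noise-schedule products, indexed by t.
    """
    rng = np.random.default_rng(seed)
    # Draw a fixed set of noise vectors once and reuse it for every record,
    # so each feature column corresponds to the same (noise, t) pair.
    noises = rng.standard_normal((n_noises, x0.shape[1]))
    feats = []
    for eps in noises:
        for t in timesteps:
            a = alpha_bar[t]
            # Forward diffusion: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps
            x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
            pred = eps_model(x_t, t)
            # Per-record mean squared error between predicted and true noise
            feats.append(np.mean((pred - eps) ** 2, axis=1))
    return np.stack(feats, axis=1)

def toy_eps_model(x_t, t):
    # Placeholder denoiser that always predicts zero noise (illustration only).
    return np.zeros_like(x_t)

# Toy linear schedule and random records, purely for demonstration.
alpha_bar = np.linspace(0.999, 0.01, 100)
x0 = np.random.default_rng(1).standard_normal((5, 3))
features = loss_features(x0, toy_eps_model, alpha_bar, timesteps=[10, 50, 90], n_noises=4)
print(features.shape)
```

In a setup like the one the abstract describes, a feature matrix of this kind, computed for records with known membership labels, would be the input to a small MLP classifier that outputs a membership score. Fixing the noise draws across records reflects the abstract's observation that noise initialization strongly influences attack efficacy.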
