MIA-EPT: Membership Inference Attack via Error Prediction for Tabular Data

Eyal German, Daniel Samira, Yuval Elovici, Asaf Shabtai

TL;DR

This paper tackles the privacy risk of diffusion-based synthetic tabular data by introducing MIA-EPT, a black-box membership inference attack that relies on error-based signals from per-column attribute predictors. Operating without access to the target model's internals, it uses shadow diffusion models and auxiliary data to build reconstruction-based features that distinguish training members from non-members. The attack reveals non-trivial leakage across three state-of-the-art tabular diffusion models, with internal AUC-ROC up to 0.599 and TPR@10% FPR up to 22.0%, and it placed second in the MIDST 2025 Black-box Multi-Table track. The results emphasize that synthetic data is not inherently privacy-preserving and motivate defenses such as stronger regularization or differential privacy, as well as extensions to other generative paradigms.

Abstract

Synthetic data generation plays an important role in enabling data sharing, particularly in sensitive domains like healthcare and finance. Recent advances in diffusion models have made it possible to generate realistic, high-quality tabular data, but they may also memorize training records and leak sensitive information. Membership inference attacks (MIAs) exploit this vulnerability by determining whether a record was used in training. While MIAs have been studied in images and text, their use against tabular diffusion models remains underexplored despite the unique risks of structured attributes and limited record diversity. In this paper, we introduce MIA-EPT, Membership Inference Attack via Error Prediction for Tabular Data, a novel black-box attack specifically designed to target tabular diffusion models. MIA-EPT constructs error-based feature vectors by masking and reconstructing attributes of target records, disclosing membership signals based on how well these attributes are predicted. MIA-EPT operates without access to the internal components of the generative model, relying only on its synthetic data output, and was shown to generalize across multiple state-of-the-art diffusion models. We validate MIA-EPT on three diffusion-based synthesizers, achieving AUC-ROC scores of up to 0.599 and TPR@10% FPR values of 22.0% in our internal tests. Under the MIDST 2025 competition conditions, MIA-EPT achieved second place in the Black-box Multi-Table track (TPR@10% FPR = 20.0%). These results demonstrate that our method can uncover substantial membership leakage in synthetic tabular data, challenging the assumption that synthetic data is inherently privacy-preserving. Our code is publicly available at https://github.com/eyalgerman/MIA-EPT.
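
The masking-and-reconstruction idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the 1-nearest-neighbour predictor, the function names `predict_masked` and `error_features`, and the toy numeric rows are all assumptions made for illustration. The sketch hides each column of a target record in turn, predicts it from the synthetic data alone, and collects the absolute errors into a feature vector. In the pipeline the abstract describes, such vectors would then be passed to an attack classifier, which is omitted here.

```python
from typing import List, Sequence

Row = Sequence[float]


def predict_masked(synthetic: List[Row], record: Row, col: int) -> float:
    """Predict record[col] from its other columns.

    Stand-in predictor (an assumption, not the paper's model):
    copy the value from the nearest synthetic row, where distance
    is measured over the unmasked columns only.
    """
    def dist(row: Row) -> float:
        return sum(
            (row[j] - record[j]) ** 2
            for j in range(len(record))
            if j != col
        )

    nearest = min(synthetic, key=dist)
    return nearest[col]


def error_features(synthetic: List[Row], record: Row) -> List[float]:
    """Mask each column in turn and return one absolute error per column."""
    return [
        abs(predict_masked(synthetic, record, c) - record[c])
        for c in range(len(record))
    ]


# Toy synthetic table (hypothetical values).
synthetic = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 10.0)]

# A record the synthesizer reproduced exactly, and one it did not.
member_like = (1.0, 2.0, 3.0)
non_member_like = (4.0, 5.0, 9.0)

print(error_features(synthetic, member_like))      # [0.0, 0.0, 0.0]
print(error_features(synthetic, non_member_like))  # [0.0, 0.0, 3.0]
```

Lower reconstruction errors suggest that the synthetic data encodes the record closely, which is the kind of signal the attack treats as evidence of membership. Real tables also contain categorical columns, for which a 0/1 mismatch error would be a natural substitute for the absolute difference.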

Paper Structure

This paper contains 16 sections, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: MIA-EPT pipeline showing the flow through: (1) Shadow Model Training, (2) Attribute Prediction Model Training, (3) Feature Extraction, (4) Attack Classifier Training, and (5) Membership Prediction on the Challenge Dataset.
  • Figure 2: The ROC curves for the multi-table model ClavaDDPM on the test set, comparing MIA-EPT with the recent baseline of Wu et al. The inset zooms in on the low-FPR region.
  • Figure 3: The ROC curves for the single-table models TabDDPM and TabSyn on the test set, comparing MIA-EPT with the recent baseline of Wu et al. The inset zooms in on the low-FPR region.
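
The low-FPR region highlighted in the ROC figures corresponds to the TPR@10% FPR metric reported in the abstract. A minimal sketch of one common way to compute it follows. The function name `tpr_at_fpr` and the toy scores are assumptions, and the competition's exact scoring procedure (for example, whether the ROC curve is interpolated) is not specified in this text. The sketch scans every score threshold and keeps the highest true-positive rate whose false-positive rate stays at or below the cap.

```python
from typing import List


def tpr_at_fpr(scores: List[float], labels: List[int], max_fpr: float = 0.10) -> float:
    """Highest TPR over all thresholds whose FPR does not exceed max_fpr.

    scores: attack confidence that each record is a member (higher = more likely).
    labels: 1 for true members, 0 for non-members.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one member and one non-member")

    best = 0.0
    # A record is predicted "member" when its score is >= the threshold.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / neg <= max_fpr:
            best = max(best, tp / pos)
    return best


# Toy example (hypothetical scores): 5 members, 10 non-members.
member_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
non_member_scores = [0.85, 0.6, 0.15, 0.14, 0.13, 0.12, 0.11, 0.10, 0.09, 0.08]

scores = member_scores + non_member_scores
labels = [1] * len(member_scores) + [0] * len(non_member_scores)

print(tpr_at_fpr(scores, labels))  # 0.4
```

With ten non-members, a 10% cap allows at most one false positive, so the best admissible threshold here is 0.8, which catches two of the five members.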