
Finding Connections: Membership Inference Attacks for the Multi-Table Synthetic Data Setting

Joshua Ward, Chi-Hua Wang, Guang Cheng

TL;DR

Relational data release introduces privacy risks that extend beyond single-table records. The authors propose MT-MIA, a multi-table membership inference attack that uses heterogeneous graph neural networks to learn embeddings of user-centric subgraphs under a No-Box threat model, enabling user-level privacy auditing of synthetic relational data. They demonstrate that item-level MIA approaches fail to detect leakage in relational settings, while MT-MIA achieves near-perfect discrimination in motivating cases and outperforms baselines on real datasets, revealing leakage pathways linked to relational structure. The work highlights a tradeoff between data fidelity and privacy, provides a practical auditing framework, and suggests future directions for defense and broader auditing tasks in relational data contexts.

Abstract

Synthetic tabular data has gained attention for enabling privacy-preserving data sharing. While substantial progress has been made in single-table synthetic generation, where data are modeled at the row or item level, most real-world data exists in relational databases where a user's information spans items across multiple interconnected tables. Recent advances in synthetic relational data generation have emerged to address this complexity, yet the release of these data introduces unique privacy challenges, as information can be leaked not only from individual items but also through the relationships that comprise a complete user entity. To address this, we propose a novel Membership Inference Attack (MIA) setting to audit the empirical user-level privacy of synthetic relational data and show that single-table MIAs that audit at an item level underestimate user-level privacy leakage. We then propose Multi-Table Membership Inference Attack (MT-MIA), a novel adversarial attack under a No-Box threat model that targets learned representations of user entities via Heterogeneous Graph Neural Networks. By incorporating all connected items for a user, MT-MIA better targets user-level vulnerabilities induced by inter-tabular relationships than existing attacks. We evaluate MT-MIA on a range of real-world multi-table datasets and demonstrate that this vulnerability exists in state-of-the-art relational synthetic data generators, employing MT-MIA to additionally study where this leakage occurs.

Paper Structure

This paper contains 41 sections, 1 theorem, 6 equations, 4 figures, 4 tables.

Key Result

Theorem 1

Let $G_{\text{test}} = (V, E)$ be a graph that is the disjoint union of two subgraphs $G_{\text{train}}$ and $G_{\text{holdout}}$, where $V(G) = V(G_{\text{train}}) \cup V(G_{\text{holdout}})$ and $V(G_{\text{train}}) \cap V(G_{\text{holdout}}) = \emptyset$. Furthermore, there are no edges in $G$ co

Figures (4)

  • Figure 1: Comparison of attack strategies for the motivating example. All methods operate on the same Customer-Transaction graph ($C_1$ with three transactions $T_1, T_2, T_3$). Blue highlighting indicates information used by each attack; gray indicates ignored or collapsed information. Single-table attacks have three options for constructing their attack: (a) Use only customer features, ignoring all relational information. (b) Join each customer to a single transaction, which requires arbitrary sampling and discards the remaining relationships. (c) Join customers to arbitrarily aggregated transaction features, collapsing the relationship structure. By contrast, (d) MT-MIA preserves and learns from the complete graph structure, automatically discovering that relationship cardinality reveals membership (AUC = 0.999).
  • Figure 2: Plot of the True Positive Rate by log-scaled False Positive Rate for the motivating example. The conventional single-table attacks Distance to Closest Record (DCR) [ganleaks] and MC [Hilprecht2019MonteCA] cannot distinguish membership from the Customers table, with or without additional Transaction rows joined, because they cannot exploit the given inter-tabular leakage. MT-MIA learns and attacks a representation of the entire user subgraph, allowing for nearly perfect membership discrimination (AUC = 0.999).
  • Figure 3: Inference Time Architecture Diagram of MT-MIA.
  • Figure 4: True Positive Rate by log-scaled False Positive Rate for the most successful MT-MIA runs on ClavaDDPM. We plot the success of DCR utilizing the intermediate embeddings $z_{parent}$ and $z_{context}$ as well as the final embedding $z_{final}$. While $z_{final}$ yields a competitive attack on high-fidelity multi-table data, different datasets exhibit different sources of more severe privacy leakage that are captured in these intermediate latent spaces.
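
The Distance to Closest Record (DCR) attack named in Figures 2 and 4 can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Euclidean distance over purely numeric feature vectors, and the function names (`dcr_scores`, `auc`) and the toy records are hypothetical. The idea is that a candidate lying unusually close to some synthetic record is scored as more likely to have been in the training data.

```python
import math

def dcr_scores(candidates, synthetic):
    """Score each candidate by the negated distance to its closest synthetic record.

    A higher score (smaller distance) is read as "more likely a training member".
    Euclidean distance is an assumption made for this sketch.
    """
    scores = []
    for c in candidates:
        nearest = min(math.dist(c, s) for s in synthetic)
        scores.append(-nearest)
    return scores

def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member (ties count half)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))

# Hypothetical toy data: synthetic records sit close to the training members.
synthetic = [(0.0, 0.0), (1.0, 1.0)]
members = [(0.1, 0.0), (1.0, 0.9)]
nonmembers = [(3.0, 3.0), (0.5, 0.5)]

member_scores = dcr_scores(members, synthetic)
nonmember_scores = dcr_scores(nonmembers, synthetic)
print(auc(member_scores, nonmember_scores))
```

In the figures, the same distance-based scoring is applied either to raw table rows (Figure 2) or to the learned embeddings $z_{parent}$, $z_{context}$, and $z_{final}$ (Figure 4); only the vectors passed in as `candidates` and `synthetic` would change.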

Theorems & Definitions (1)

  • Theorem 1