TVineSynth: A Truncated C-Vine Copula Generator of Synthetic Tabular Data to Balance Privacy and Utility

Elisabeth Griesbauer, Claudia Czado, Arnoldo Frigessi, Ingrid Hobæk Haff

TL;DR

TVineSynth introduces a truncated C-vine copula approach to synthetic tabular data that balances privacy and utility without relying on differential privacy (DP). By reordering features to create block-structured dependencies and truncating the vine, it selectively suppresses privacy-leaking relationships while preserving those essential for predictive tasks. The framework provides a theoretical justification of privacy against attribute inference attacks (AIA), measured by the mean absolute $\beta$-coefficient (MAB), and shows robust privacy against membership inference attacks (MIA), supported by empirical results on simulated and real data, including a real-world SUPPORT2 study. Overall, TVineSynth achieves a favorable privacy-utility balance relative to competitors with and without DP, with practical implications for privacy-conscious data sharing in sensitive domains.

Abstract

We propose TVineSynth, a vine copula-based synthetic tabular data generator designed to balance privacy and utility, using the vine tree structure and its truncation to control the trade-off. In contrast to synthetic data generators that achieve differential privacy (DP) by globally adding noise, TVineSynth performs a controlled approximation of the estimated data generating distribution, so that it does not suffer from poor utility of the resulting synthetic data for downstream prediction tasks. TVineSynth introduces a targeted bias into the vine copula model that, combined with the specific tree structure of the vine, causes the model to zero out privacy-leaking dependencies while relying on those that are beneficial for utility. Privacy is measured with membership inference attacks (MIA) and attribute inference attacks (AIA). Further, we theoretically justify how the construction of TVineSynth ensures AIA privacy under a natural privacy measure for continuous sensitive attributes. When compared to competitor models, with and without DP, on simulated and on real-world data, TVineSynth achieves a superior privacy-utility balance.
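
To make the idea of "zeroing out" dependencies through truncation concrete, the following toy sketch may help. It is not the authors' implementation: it assumes a three-variable C-vine rooted at variable 1 with Gaussian pair copulas, so that each pair copula is described by a (partial) correlation, and the function name `cvine3_correlation` and all parameter values are hypothetical. Truncating at level 1 replaces the tree-2 pair copula by independence, which here amounts to setting the partial correlation of variables 2 and 3 given 1 to zero.

```python
import math


def cvine3_correlation(r12, r13, r23_given_1, truncation_level=None):
    """Correlation matrix implied by a 3-variable Gaussian C-vine rooted at variable 1.

    Tree 1 holds the pairs (1,2) and (1,3); tree 2 holds the pair (2,3 | 1).
    Assumption for this sketch: all pair copulas are Gaussian, so truncating
    at level 1 means setting the partial correlation r23_given_1 to zero.
    """
    if truncation_level is not None and truncation_level < 2:
        r23_given_1 = 0.0  # tree 2 is replaced by the independence copula
    # Partial-correlation identity, solved for the unconditional r23.
    r23 = r23_given_1 * math.sqrt((1 - r12**2) * (1 - r13**2)) + r12 * r13
    return [
        [1.0, r12, r13],
        [r12, 1.0, r23],
        [r13, r23, 1.0],
    ]


# Hypothetical parameter values, chosen only for illustration.
full = cvine3_correlation(0.8, 0.5, 0.6)
truncated = cvine3_correlation(0.8, 0.5, 0.6, truncation_level=1)

print("corr(X2, X3), full vine:     ", round(full[1][2], 4))
print("corr(X2, X3), truncated at 1:", round(truncated[1][2], 4))
```

In this toy setting the dependencies on the root variable (tree 1) are untouched by truncation, while the dependence between variables 2 and 3 shrinks to what is explained through the root alone. Which variables end up in which tree is therefore what decides which dependencies survive, which is why the feature ordering matters.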

Paper Structure

This paper contains 77 sections, 6 theorems, 107 equations, 31 figures, 5 tables, 2 algorithms.

Key Result

Theorem 2.3

Under these assumptions, it holds for large enough $n$ that: with: and $\bm{J}^{1\ldots \tau, 1 \ldots \tau}$ is the upper left sub-matrix of $\bm{J}^{-1}$ corresponding to the parameters $(\pi_{Y},\bm{\theta}_{1},\ldots,\bm{\theta}_{\tau})$.

Figures (31)

  • Figure 1: Privacy-utility plot of synthetic data generated with a C-vine truncated at $t \in \{1,11,12,18\}$ and without truncation (a), and with competitors (b), from simulated real data. For AIA privacy the MAB is reported, and for utility the median over 50 synthetic data sets. Parameters of the generative models and privacy attacks can be found in Appendix \ref{sec:model_and_attack_parameters}.
  • Figure 2: MAB (a), PG (b), utility (c) and privacy-utility plots (d) of synthetic data generated with a C-vine for different truncation levels, CTGAN, TVAE, PrivPGD ($\epsilon=2.5, \, \delta=10^{-5}$) and PrivBayes ($\epsilon \in \{0.1, 1, 5\}$). Boxplots are obtained from 10 game iterations in the AIA and MIA, and from 50 synthetic data sets in the utility evaluation. Model and privacy attack parameters can be found in Appendix \ref{sec:model_and_attack_parameters}.
  • Figure 3: A C-vine on 5 elements.
  • Figure 4: A D-vine on 4 elements.
  • Figure 5: A D-vine on 4 elements with the notation of Definition \ref{def:complete_union}.
  • ...and 26 more figures

Theorems & Definitions (26)

  • Definition 2.1: Truncation of the Vine Copula at Level $t$
  • Definition 2.2: Mean Absolute $\beta$-Coefficient, MAB
  • Theorem 2.3
  • Theorem 2.4
  • Theorem 2.5
  • Definition A.1
  • Theorem A.2: Sklar's Theorem
  • Definition A.3: bivariate Gauss copula
  • Definition A.4
  • Remark A.5
  • ...and 16 more
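
For orientation, the standard textbook forms of two of the listed results, together with the usual notion of vine truncation, can be written as follows. This is a sketch in generic notation and may differ from the paper's own statements; the symbols $F$, $F_j$, $C$, $\Phi$, $\Phi_2$, $\rho$ and $T_k$ are assumptions of this sketch rather than taken from the source.

```latex
% Sklar's theorem: every d-dimensional distribution function F with
% margins F_1, ..., F_d admits a copula C such that
\[
  F(x_1, \ldots, x_d) = C\bigl(F_1(x_1), \ldots, F_d(x_d)\bigr),
\]
% and C is unique if all margins are continuous.

% Bivariate Gauss copula with correlation parameter \rho \in (-1, 1):
\[
  C(u_1, u_2; \rho) = \Phi_2\bigl(\Phi^{-1}(u_1), \Phi^{-1}(u_2); \rho\bigr),
\]
% where \Phi is the standard normal distribution function and \Phi_2 the
% bivariate standard normal distribution function with correlation \rho.

% Truncation at level t (usual convention): every pair copula in the
% trees T_{t+1}, ..., T_{d-1} is set to the independence copula, i.e.
\[
  c_{e}(u, v) = 1 \quad \text{for all edges } e \text{ in } T_k, \; k > t .
\]
```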