Synthetic Data Generation and Differential Privacy using Tensor Networks' Matrix Product States (MPS)

Alejandro Moreno R., Desale Fentaw, Samuel Palmer, Raúl Salles de Padua, Ninad Dixit, Samuel Mugel, Roman Orús, Manuel Radons, Josef Menter, Ali Abedi

TL;DR

This work introduces a privacy-aware synthetic data generator based on Matrix Product States (MPS) to model mixed-type tabular data. By embedding gradient clipping and calibrated noise into training, it achieves formal privacy guarantees via Rényi DP accounting while maintaining high data fidelity and strong downstream task performance. Across comparisons with CTGAN, VAE, and PrivBayes, the DP-enhanced MPS consistently delivers superior utility under tight privacy budgets, highlighting the framework's potential for secure data sharing. The study demonstrates that tensor-network representations can provide interpretable, scalable, and effective privacy-preserving synthetic data solutions for sensitive domains.

Abstract

Synthetic data generation is a key technique in modern artificial intelligence, addressing data scarcity, privacy constraints, and the need for diverse datasets in training robust models. In this work, we propose a method for generating privacy-preserving high-quality synthetic tabular data using Tensor Networks, specifically Matrix Product States (MPS). We benchmark the MPS-based generative model against state-of-the-art models such as CTGAN, VAE, and PrivBayes, focusing on both fidelity and privacy-preserving capabilities. To ensure differential privacy (DP), we integrate noise injection and gradient clipping during training, enabling privacy guarantees via Rényi Differential Privacy accounting. Across multiple metrics analyzing data fidelity and downstream machine learning task performance, our results show that MPS outperforms classical models, particularly under strict privacy constraints. This work highlights MPS as a promising tool for privacy-aware synthetic data generation. By combining the expressive power of tensor network representations with formal privacy mechanisms, the proposed approach offers an interpretable and scalable alternative for secure data sharing. Its structured design facilitates integration into sensitive domains where both data quality and confidentiality are critical.
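The abstract names gradient clipping and noise injection as the privacy mechanism, but it does not spell out the update rule. The sketch below shows the generic DP-SGD-style recipe those two ingredients usually refer to: clip each per-example gradient to a fixed L2 norm, sum, add Gaussian noise scaled to that norm, then average. The function name `dp_noisy_gradient` and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions, not names from the paper, and the Rényi DP accounting step is not implemented here.

```python
import numpy as np

def dp_noisy_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip per-example gradients, add Gaussian noise, and average.

    per_example_grads: array of shape (batch, dim), one gradient per record.
    clip_norm: assumed L2 bound applied to every per-example gradient.
    noise_multiplier: assumed ratio of noise standard deviation to clip_norm.
    """
    # L2 norm of each per-example gradient, kept as a column for broadcasting.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale down only the gradients whose norm exceeds clip_norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Sum the clipped gradients; any single record shifts this sum by at most clip_norm.
    summed = clipped.sum(axis=0)
    # Gaussian noise calibrated to that sensitivity.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return (summed + noise) / per_example_grads.shape[0]

rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 5)) * 10.0  # toy gradients with large norms
noisy = dp_noisy_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

In this recipe a larger `noise_multiplier` generally buys a tighter privacy budget at the cost of noisier updates, which is the trade-off the benchmarks under strict privacy constraints probe.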

Paper Structure

This paper contains 26 sections, 10 equations, 16 figures, and 1 table.

Figures (16)

  • Figure 1: Contraction of tensors, equivalent to matrix multiplication, is graphically represented by connecting the shared indices.
  • Figure 2: Singular Value Decomposition (SVD) breaks a matrix into low-rank tensors for efficient representation, supporting MPS compression.
  • Figure 3: Born Machine with MPS: inputs $x_i$ are mapped to tensor cores $A^{[i]}$; sampling proceeds through sequential contractions.
  • Figure 4: Encoding schemes for feeding tabular data into the MPS architecture.
  • Figure 5: Metric performance for MPS vs other synthetic data models, and the real data.
  • ...and 11 more figures
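
The captions of Figures 1 to 3 describe three building blocks: contraction over a shared index, SVD-based low-rank compression, and sampling from an MPS Born machine by sequential contractions. The following is a minimal NumPy sketch of those ideas on a toy binary MPS. It assumes real-valued cores of shape `(left_bond, physical, right_bond)` and the Born rule p(x) proportional to the squared amplitude. The helper names (`random_mps`, `right_environments`, `sample`) are hypothetical, and the paper's encoding of mixed-type tabular features (Figure 4) is not modelled.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Figure 1: contracting the shared index of two matrices is matrix multiplication.
A, B = rng.normal(size=(3, 4)), rng.normal(size=(4, 5))
assert np.allclose(np.einsum("ij,jk->ik", A, B), A @ B)

# Figure 2: a truncated SVD keeps only the largest singular values.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
rank = 2
A_low = (U[:, :rank] * S[:rank]) @ Vt[:rank]  # rank-2 approximation of A

def random_mps(n_sites, phys_dim, bond_dim, rng):
    """Random cores A^[i] of shape (left_bond, phys_dim, right_bond)."""
    cores = []
    for i in range(n_sites):
        left = 1 if i == 0 else bond_dim
        right = 1 if i == n_sites - 1 else bond_dim
        cores.append(rng.normal(size=(left, phys_dim, right)))
    return cores

def amplitude(cores, x):
    """Contract the matrices selected by x from left to right."""
    vec = np.ones(1)
    for core, xi in zip(cores, x):
        vec = vec @ core[:, xi, :]
    return vec.item()

def right_environments(cores):
    """envs[i] sums the squared network over all sites i..end."""
    envs = [None] * (len(cores) + 1)
    envs[-1] = np.ones((1, 1))
    for i in range(len(cores) - 1, -1, -1):
        envs[i] = np.einsum("asb,bc,dsc->ad", cores[i], envs[i + 1], cores[i])
    return envs

def born_probability(cores, x):
    """Born rule: squared amplitude divided by the normalisation Z."""
    z = right_environments(cores)[0].item()
    return amplitude(cores, x) ** 2 / z

def sample(cores, rng):
    """Draw one configuration site by site (Figure 3 style)."""
    envs = right_environments(cores)
    left, out = np.ones(1), []
    for i, core in enumerate(cores):
        cand = np.einsum("a,asb->sb", left, core)  # one row per symbol
        weights = np.einsum("sb,bc,sc->s", cand, envs[i + 1], cand)
        weights = np.clip(weights, 0.0, None)  # guard against round-off
        s = int(rng.choice(len(weights), p=weights / weights.sum()))
        out.append(s)
        left = cand[s] / np.linalg.norm(cand[s])  # rescale for stability
    return tuple(out)

cores = random_mps(n_sites=4, phys_dim=2, bond_dim=3, rng=rng)
total = sum(born_probability(cores, x) for x in product(range(2), repeat=4))
draw = sample(cores, rng)
```

Here `total` sums the Born probabilities over all 16 binary configurations and should equal 1 up to floating-point error, while `draw` is a single configuration obtained without enumerating the full distribution.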