TabularQGAN: A Quantum Generative Model for Tabular Data

Pallavi Bhardwaj, Caitlin Jones, Lasse Dierich, Aleksandar Vučković

TL;DR

TabularQGAN addresses privacy-preserving synthesis of heterogeneous tabular data by introducing a quantum GAN whose generator is a variational quantum circuit and whose encoder natively handles numerical and categorical features without autoencoding. It employs one-hot Givens-rotation encoding and a linear-scaling circuit with inter-register entanglers to capture cross-feature correlations, trained adversarially against a classical discriminator with parameter-shift gradient updates. On the MIMIC-III and Adult Census datasets, TabularQGAN outperforms the classical baselines CTGAN and CopulaGAN by an average of 8.5% on the overall SDMetrics similarity score while using only 0.072% of their parameters, and it shows signs of generalization on two custom novel-sample metrics. The work demonstrates the practical viability of quantum generative models for tabular data and motivates scaling studies toward larger qubit counts and real hardware implementations.

Abstract

In this paper, we introduce a novel quantum generative model for synthesizing tabular data. Synthetic data is valuable in scenarios where real-world data is scarce or private, as it can be used to augment or replace existing datasets. Real-world enterprise data is predominantly tabular and heterogeneous, often comprising a mixture of categorical and numerical features, making it highly relevant across various industries such as healthcare, finance, and software. We propose a quantum generative adversarial network architecture with flexible data encoding and a novel quantum circuit ansatz to effectively model tabular data. The proposed approach is tested on the MIMIC-III healthcare and Adult Census datasets, with extensive benchmarking against two leading classical models, CTGAN and CopulaGAN. Experimental results demonstrate that our quantum model outperforms classical models by an average of 8.5% with respect to an overall similarity score from SDMetrics, while using only 0.072% of the parameters of the classical models. Additionally, we evaluate the generalization capabilities of the models using two custom-designed metrics that demonstrate the ability of the proposed quantum model to generate useful and novel samples. To our knowledge, this is one of the first demonstrations of a successful quantum generative model for handling tabular data, indicating that this task could be well-suited to quantum computers.

Paper Structure

This paper contains 22 sections, 23 equations, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 2: Schematic diagram of TabularQGAN training. In Step 1, either a batch of training data or a batch of synthetic samples (obtained from single-shot measurements) is fed to the discriminator. In Step 2, the discriminator attempts to distinguish between real and fake samples, and its parameters $\phi$ are updated based on the gradient of the discriminator loss $L_{D}$. In Step 3, a sample is generated for each parameter shift, and the discriminator with fixed parameters $\phi$ is used to compute the gradient of the parameters according to the parameter-shift rule. In Step 4, the generator parameters $\theta$ are updated based on their gradient.
  • Figure 3: Plot of the overall metric for each hyperparameter configuration for each dataset. The spread of the points within each bar is artificially added to improve data visibility. It can be seen that the TabularQGAN model consistently outperforms the other models.
  • Figure 4: Comparison of the overall metric for each hyperparameter configuration on the Adult Census 10 and Adult Census 15 datasets, using the Unique-Row-Index encoding and a single numerical register for index generation. The spread of points within each bar has been added to improve data visibility. It is evident that the performance of the TabularQGAN model is significantly lower when a Unique-Row-Index encoding is used instead of the proposed one-hot encoding.
  • Figure 5: Plot showing the distribution of the overall metric value for each data set with the two different encodings. Adult Census 15 is excluded as it does not contain any binary features.
  • Figure 6: Plot showing the overlap fraction metric for different models and data types. Only a selected subset of the data is sampled.
  • ...and 5 more figures
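
Steps 3 and 4 of the Figure 2 caption (evaluating shifted circuits against a fixed discriminator, then updating $\theta$) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a single-qubit generator RY($\theta$)|0>, a hypothetical fixed discriminator given as a table of scores `d_scores`, a non-saturating generator loss $-\log D(x)$, and exact measurement probabilities in place of the single-shot samples described in the caption. All function names are illustrative.

```python
import numpy as np

def ry(theta):
    """Matrix of a single-qubit RY rotation."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def generator_probs(theta):
    """Computational-basis probabilities of RY(theta)|0> (toy generator)."""
    state = ry(theta) @ np.array([1.0, 0.0])
    return state ** 2  # amplitudes are real here

def generator_loss(theta, d_scores):
    """Expected loss E_x[-log D(x)] with the discriminator held fixed.

    d_scores[x] is the assumed discriminator output D(x) for bitstring x.
    """
    return float(generator_probs(theta) @ -np.log(d_scores))

def parameter_shift_grad(theta, d_scores):
    """Gradient of the loss via the parameter-shift rule (shift of pi/2)."""
    shift = np.pi / 2
    plus = generator_loss(theta + shift, d_scores)
    minus = generator_loss(theta - shift, d_scores)
    return 0.5 * (plus - minus)

# One gradient-descent update of the generator parameter (Step 4).
d_scores = np.array([0.3, 0.8])  # hypothetical fixed discriminator
theta = 0.7
learning_rate = 0.1              # illustrative value
theta_new = theta - learning_rate * parameter_shift_grad(theta, d_scores)
```

Because the loss is linear in the measurement probabilities, it is the expectation of a diagonal observable, so the two shifted evaluations give the exact derivative for this gate. With finite-shot samples the same formula yields an unbiased but noisy estimate.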

Theorems & Definitions (1)

  • Definition 1: Hilbert Space Reduction via Givens Rotations
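
The text of Definition 1 is not reproduced on this page, so the sketch below only illustrates the general property that makes Givens rotations a natural fit for one-hot encodings: a two-qubit Givens rotation mixes |01> and |10> and leaves |00> and |11> untouched, so a one-hot state stays one-hot. The angle convention and the name `givens` are assumptions chosen for illustration, not taken from the paper.

```python
import numpy as np

def givens(theta):
    """Two-qubit Givens rotation in the basis |00>, |01>, |10>, |11>.

    The use of theta (rather than theta / 2) is an assumed convention.
    """
    c, s = np.cos(theta), np.sin(theta)
    return np.array([
        [1.0, 0.0, 0.0, 0.0],
        [0.0, c, -s, 0.0],
        [0.0, s, c, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])

# Start in the one-hot state |01> and rotate.
one_hot_start = np.array([0.0, 1.0, 0.0, 0.0])
state = givens(0.4) @ one_hot_start
probs = state ** 2  # amplitudes are real here

# All probability remains on the one-hot bitstrings 01 and 10.
one_hot_mass = probs[1] + probs[2]
```

For a register of $n$ qubits holding one category out of $n$, rotations of this kind keep the state inside the $n$-dimensional one-hot subspace rather than the full $2^n$-dimensional space, which is presumably the sense of "Hilbert space reduction" in the definition's title.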