How Well Does Your Tabular Generator Learn the Structure of Tabular Data?

Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik

TL;DR

TabStruct addresses how to evaluate tabular data generators by prioritizing the preservation of causal structure. It introduces a CPDAG-level structural fidelity benchmark that compares real and synthetic data via conditional independence relations, grounded in expert-validated SCMs across diverse datasets. The study demonstrates that conventional metrics like density estimation and downstream utility often fail to reflect structural fidelity, and that many generators struggle to recover underlying tabular structures, with baselines occasionally outperforming complex models. The open-source TabStruct framework and datasets offer a practical, structure-aware benchmark to drive the development of more robust tabular data generators.

Abstract

Heterogeneous tabular data poses unique challenges in generative modelling due to its fundamentally different underlying data structure compared to homogeneous modalities, such as images and text. Although previous research has sought to adapt the successes of generative modelling in homogeneous modalities to the tabular domain, defining an effective generator for tabular data remains an open problem. One major reason is that the evaluation criteria inherited from other modalities often fail to adequately assess whether tabular generative models effectively capture or utilise the unique structural information encoded in tabular data. In this paper, we carefully examine the limitations of the prevailing evaluation framework and introduce $\textbf{TabStruct}$, a novel evaluation benchmark that positions structural fidelity as a core evaluation dimension. Specifically, TabStruct evaluates the alignment of causal structures in real and synthetic data, providing a direct measure of how effectively tabular generative models learn the structure of tabular data. Through extensive experiments using generators from eight categories on seven datasets with expert-validated causal graphical structures, we show that structural fidelity offers a task-independent, domain-agnostic evaluation dimension. Our findings highlight the importance of tabular data structure and offer practical guidance for developing more effective and robust tabular generative models. Code is available at https://github.com/SilenceX12138/TabStruct.

Paper Structure

This paper contains 31 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Overview of the TabStruct evaluation framework. (A) Given the graphical structures (i.e., structural causal models) validated by domain experts, we perform prior sampling on these graphs to generate a full dataset $\mathcal{D}$. (B) We train tabular generative models on the training split $\mathcal{D}_{\text{ref}} \subset \mathcal{D}$. We then generate synthetic data $\mathcal{D}_{\text{syn}}$ with the fitted models. (C) We evaluate the quality of synthetic data by comparing $\mathcal{D}_{\text{ref}}$ and $\mathcal{D}_{\text{syn}}$ across four dimensions.
  • Figure 2: An illustrative example of how structural fidelity is quantified. Given the ground-truth causal structure, we first derive the conditional independence relationships between features. These relationships are then divided into two levels of granularity: global and local. The global set encompasses all conditional independence relationships across the entire feature set, whereas the local set includes only those relationships that are directly relevant to the target variable $y$. Next, we apply conditional independence tests on $\mathcal{D}_{\text{syn}}$ to check whether the synthetic data preserves these relationships.
  • Figure 3: Summarised comparison of nine tabular data generators across four evaluation dimensions. The results reveal that excelling in conventional evaluation dimensions does not ensure the model's ability to capture the underlying data structure. Learning the underlying data structure remains challenging for tabular generative modelling.
  • Figure 4: Data splitting strategies for benchmarking tabular data generators.
  • Figure 5: Downstream utility vs. the ratio of synthetic to reference sample counts ($N_{\text{syn}}:N_{\text{ref}}$). On the "Hailfinder" dataset, the evaluation results saturate as $N_{\text{syn}}$ increases. Specifically, balanced accuracy varies by less than 0.3% when the ratio increases from $N_{\text{syn}}:N_{\text{ref}}=3:1$ to $N_{\text{syn}}:N_{\text{ref}}=10:1$. Therefore, we set $N_{\text{syn}}=3N_{\text{ref}}$ in all experiments to ensure stable evaluation results.
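
The procedure described in the Figure 2 caption (derive conditional independence relationships from a known causal structure, then test them on $\mathcal{D}_{\text{syn}}$) can be illustrated with a small sketch. This is not the TabStruct implementation. The three-variable chain SCM, the Fisher z partial-correlation test, the threshold `alpha`, the agreement score, and all function names are assumptions chosen for illustration only.

```python
import math
import numpy as np


def fisher_z_pvalue(data, i, j, cond):
    """Two-sided p-value of a Fisher z test for column i vs column j given columns cond."""
    cols = [i, j] + list(cond)
    corr = np.corrcoef(data[:, cols], rowvar=False)
    prec = np.linalg.inv(corr)
    # Partial correlation of the first two columns given the rest.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(data.shape[0] - len(cond) - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))


def agreement(data, statements, alpha=0.001):
    """Fraction of ground-truth (in)dependence statements that the data agrees with."""
    hits = 0
    for i, j, cond, independent in statements:
        says_independent = fisher_z_pvalue(data, i, j, cond) > alpha
        hits += says_independent == independent
    return hits / len(statements)


def sample_chain(n, rng):
    """Hypothetical linear-Gaussian SCM: x -> z -> y (columns 0, 1, 2)."""
    x = rng.normal(size=n)
    z = 0.8 * x + rng.normal(size=n)
    y = 0.8 * z + rng.normal(size=n)
    return np.column_stack([x, z, y])


# Ground-truth statements implied by the chain: (i, j, conditioning set, independent?).
GLOBAL = [
    (0, 2, (1,), True),   # x is independent of y given z
    (0, 1, (), False),    # x and z are dependent
    (1, 2, (), False),    # z and y are dependent
    (0, 2, (), False),    # x and y are marginally dependent
]
# Local set: only statements that involve the target y (column 2).
LOCAL = [s for s in GLOBAL if 2 in s[:2]]

rng = np.random.default_rng(0)
ref = sample_chain(5000, rng)

# Toy "generator" A: samples from the true SCM, so structure is preserved.
faithful_syn = sample_chain(5000, rng)
# Toy "generator" B: shuffles each column independently, so marginals
# are preserved exactly but every dependency between features is destroyed.
marginal_syn = np.column_stack(
    [rng.permutation(ref[:, k]) for k in range(ref.shape[1])]
)

global_faithful = agreement(faithful_syn, GLOBAL)
global_marginal = agreement(marginal_syn, GLOBAL)
local_faithful = agreement(faithful_syn, LOCAL)
local_marginal = agreement(marginal_syn, LOCAL)

print("global:", global_faithful, global_marginal)
print("local: ", local_faithful, local_marginal)
```

In this toy setting the column-shuffling generator reproduces every marginal distribution exactly, yet it should score lower on both the global and local sets because it breaks the dependencies the causal structure implies. This mirrors the point made in the Figure 3 caption that strong results on conventional dimensions do not guarantee that the underlying structure has been captured.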