TabStruct: Measuring Structural Fidelity of Tabular Data

Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik

TL;DR

This paper introduces a new evaluation metric, $\textbf{global utility}$, which enables the assessment of structural fidelity even in the absence of ground-truth causal structures, and demonstrates that it provides a task-independent, domain-agnostic lens for tabular generator performance.

Abstract

Evaluating tabular generators remains a challenging problem, as the unique causal structural prior of heterogeneous tabular data does not lend itself to intuitive human inspection. Recent work has introduced structural fidelity as a tabular-specific evaluation dimension to assess whether synthetic data complies with the causal structures of real data. However, existing benchmarks often neglect the interplay between structural fidelity and conventional evaluation dimensions, thus failing to provide a holistic understanding of model performance. Moreover, they are typically limited to toy datasets, as quantifying existing structural fidelity metrics requires access to ground-truth causal structures, which are rarely available for real-world datasets. In this paper, we propose a novel evaluation framework that jointly considers structural fidelity and conventional evaluation dimensions. We introduce a new evaluation metric, $\textbf{global utility}$, which enables the assessment of structural fidelity even in the absence of ground-truth causal structures. In addition, we present $\textbf{TabStruct}$, a comprehensive evaluation benchmark offering large-scale quantitative analysis on 13 tabular generators from nine distinct categories, across 29 datasets. Our results demonstrate that global utility provides a task-independent, domain-agnostic lens for tabular generator performance. We release the TabStruct benchmark suite, including all datasets, evaluation pipelines, and raw results. Code is available at https://github.com/SilenceX12138/TabStruct.

Paper Structure

This paper contains 52 sections, 4 equations, 9 figures, 66 tables, 2 algorithms.

Figures (9)

  • Figure 1: Illustrative example highlighting the importance of fidelity checks for tabular data structure. The panels show, in order: a real-world physical system with the gravitational forces acting on ball A, described by ball density ($\rho$), volume ($V$), masses ($m_{\text{A}}$ and $m_{\text{B}}$), distance ($r$), and gravitational forces ($F_{\text{ball}}$ and $F_{\text{Earth}}$), where for simplicity both balls are assumed to share identical density; the ground-truth (GT) causal structure of the system, derived from Newton's law of universal gravitation; the encoded physical laws of the system, interpreted as conditional independence (CI) across variables; an evaluation of four generators by conventional metrics; and an assessment of structural fidelity by CI tests and the proposed global utility metric. We note that the global structure reflects full conditional independence across all variables, while the local structure includes only those directly relevant to a specific prediction task at hand ($F_{\text{ball}}$). Results demonstrate that conventional metrics are insufficient: for instance, while SMOTE outperforms other generators on conventionally used dimensions (e.g., ML efficacy), the generated synthetic data only preserves local structure and violates most physical laws. For tabular data, where the truthfulness and authenticity of synthetic data are hard to verify, global utility provides an effective mechanism for evaluating the alignment of the synthetic data with the likely ground-truth causal structure.
  • Figure 2: Overview of the proposed evaluation framework. TabStruct provides a comprehensive evaluation benchmark, including structural fidelity and conventional dimensions, for 13 representative tabular generative models on 29 challenging tabular datasets.
  • Figure 3: Left: Spearman's rank correlation heatmap based on metric values on six SCM datasets. Global utility correlates strongly with global CI, suggesting that global utility can effectively assess global structural fidelity without resorting to SCMs. Right: Mean normalised local utility vs. mean normalised global utility on 23 real-world datasets. SMOTE prioritises local utility, whereas TabDiff and TabSyn generally achieve a balanced preservation of both global and local data structures.
  • Figure 4: Computation efficiency on 23 real-world datasets. Left: Median training time per 1,000 samples vs. mean normalised global utility. Middle: Median generation time per 1,000 samples vs. mean normalised global utility. We exclude the outliers (TabEBM and GReaT) due to their long generation time (over 30s). Right: Median evaluation time. Because global utility yields stable generator rankings across downstream predictors (see the appendix's extended discussion of practicability), computing global utility can be highly efficient with only a small ensemble of predictors (i.e., Tiny-default).
  • Figure 5: Data splitting strategies for benchmarking tabular data generators.
  • ...and 4 more figures
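
To make the Figure 1 setting concrete, the following is a minimal sketch of how a table could be sampled from such a physical system and then checked against its governing laws. It is not the authors' code. Several details are assumptions made for illustration only: each ball is given its own volume (`V_A`, `V_B`), the Earth's pull on ball A is modelled as $F_{\text{Earth}} = m_{\text{A}} g$, the sampling ranges are arbitrary, and the helper names (`sample_table`, `law_violation_rate`, `rotate_column`) are hypothetical. The check is a plain equation test, not the CI tests or the global utility metric used in the paper.

```python
import math
import random

G = 6.674e-11  # gravitational constant, N m^2 / kg^2
g = 9.81       # assumed surface gravity, m / s^2


def sample_system(rng):
    """Draw one row by following an assumed causal order of the toy system."""
    rho = rng.uniform(1000.0, 8000.0)  # shared density, kg / m^3 (assumed range)
    v_a = rng.uniform(0.001, 0.01)     # volume of ball A, m^3 (assumed range)
    v_b = rng.uniform(0.001, 0.01)     # volume of ball B, m^3 (assumed range)
    r = rng.uniform(0.5, 2.0)          # distance between centres, m (assumed range)
    m_a = rho * v_a                    # mass follows from density and volume
    m_b = rho * v_b
    f_ball = G * m_a * m_b / r ** 2    # Newton's law of universal gravitation
    f_earth = m_a * g                  # assumed model of the Earth's pull on ball A
    return {"rho": rho, "V_A": v_a, "V_B": v_b, "m_A": m_a, "m_B": m_b,
            "r": r, "F_ball": f_ball, "F_Earth": f_earth}


def sample_table(n_rows, seed=0):
    """Sample a table of independent rows from the toy system."""
    rng = random.Random(seed)
    return [sample_system(rng) for _ in range(n_rows)]


def obeys_laws(row, rel_tol=1e-9):
    """Return True if the row satisfies both force equations."""
    expected_ball = G * row["m_A"] * row["m_B"] / row["r"] ** 2
    expected_earth = row["m_A"] * g
    return (math.isclose(row["F_ball"], expected_ball, rel_tol=rel_tol)
            and math.isclose(row["F_Earth"], expected_earth, rel_tol=rel_tol))


def law_violation_rate(rows):
    """Fraction of rows that break at least one force equation."""
    return sum(not obeys_laws(row) for row in rows) / len(rows)


def rotate_column(rows, column):
    """Copy the table and shift one column down by a row.

    The column's marginal distribution is unchanged, but its link to the
    other columns is destroyed.
    """
    values = [row[column] for row in rows]
    shifted = values[1:] + values[:1]
    return [dict(row, **{column: value}) for row, value in zip(rows, shifted)]


if __name__ == "__main__":
    real = sample_table(200, seed=1)
    broken = rotate_column(real, "r")
    print("violation rate (sampled):", law_violation_rate(real))
    print("violation rate (r rotated):", law_violation_rate(broken))
```

Rotating the `r` column keeps every per-column distribution intact while breaking the dependence of $F_{\text{ball}}$ on $r$. This is the kind of structural error that per-column comparisons alone would not reveal.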