Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework
Andrey Sidorenko, Michael Platzer, Mario Scriminaci, Paul Tiwald
TL;DR
The paper addresses the challenge of evaluating synthetic tabular data on both fidelity and privacy. It introduces a holdout-based framework with three metric families (Accuracy, Centroid Similarity, and Distances) that jointly assess low- and high-dimensional fidelity, embedding-based distributional similarity, and novelty. By handling mixed-type and sequential/contextual data, and by providing open-source tooling (mostlyai-qa), the approach enables reproducible benchmarking and interpretable quality diagnostics. The framework supports comparisons across synthesizers and clarifies the trade-offs between utility and privacy, giving researchers and practitioners a standardized way to evaluate synthetic data pipelines. Results on holdout-based benchmarks are read against a north-star reference of $(1,1)$.
Abstract
Evaluating the quality of synthetic data remains a key challenge for ensuring privacy and utility in data-driven research. In this work, we present an evaluation framework that quantifies how well synthetic data replicates the distributional properties of the original data while preserving privacy. The proposed approach employs a holdout-based benchmarking strategy that facilitates quantitative assessment through low- and high-dimensional distribution comparisons, embedding-based similarity measures, and nearest-neighbor distance metrics. The framework supports various data types and structures, including sequential and contextual information, and enables interpretable quality diagnostics through a set of standardized metrics. These contributions aim to support reproducibility and methodological consistency in the benchmarking of synthetic data generation techniques. The code of the framework is available at https://github.com/mostly-ai/mostlyai-qa.
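To make the holdout-based, nearest-neighbor idea concrete, the following is a minimal sketch and not the mostlyai-qa implementation. It assumes purely numeric records, Euclidean distance, and equally sized training and holdout sets; the function names `nearest_distances` and `closer_to_training_share` are hypothetical, and the toy data stands in for real synthesizer output. For each synthetic record it checks whether the closest original record lies in the training set or in the holdout set. A share near 0.5 suggests that the synthetic data is no closer to the training data than to unseen data, while a share near 1 points to memorization.

```python
import numpy as np

def nearest_distances(queries, reference):
    """Euclidean distance from each query row to its closest reference row."""
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))  # shape: (n_queries, n_reference)
    return dists.min(axis=1)

def closer_to_training_share(synthetic, training, holdout):
    """Share of synthetic rows whose nearest neighbor is a training row.

    Ties are counted as half, so identical distances do not bias the share.
    """
    d_trn = nearest_distances(synthetic, training)
    d_hol = nearest_distances(synthetic, holdout)
    closer = (d_trn < d_hol).sum()
    ties = (d_trn == d_hol).sum()
    return (closer + 0.5 * ties) / len(synthetic)

# Toy data (assumption): original records are split into training and holdout.
rng = np.random.default_rng(0)
original = rng.normal(size=(400, 5))
training, holdout = original[:200], original[200:]

# Stand-in for a synthesizer that generalizes: fresh draws from the same distribution.
fresh_sample = rng.normal(size=(200, 5))
# Stand-in for a synthesizer that memorizes: exact copies of training records.
memorized = training.copy()

share_fresh = closer_to_training_share(fresh_sample, training, holdout)
share_memorized = closer_to_training_share(memorized, training, holdout)
print(f"fresh sample: {share_fresh:.2f}, memorized copy: {share_memorized:.2f}")
```

Mixed-type or sequential data would need an encoding or embedding step before distances are computed, which this sketch leaves out.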
