A Comprehensive Evaluation Framework for Synthetic Trip Data Generation in Public Transport

Yuanyuan Wu, Zhenlin Qin, Zhenliang Ma

TL;DR

This work tackles the challenge of evaluating synthetic trip data for public transport by introducing the Representativeness-Privacy-Utility (RPU) framework, which assesses data quality across three dimensions and at three hierarchical levels. It benchmarks twelve generation methods (statistical models, deep generative models, diffusion models, normalizing flows, LLM-based generators, and privacy-enhanced variants) on real automated fare collection (AFC) trip data and finds no one-size-fits-all solution. Key findings show that privacy is not guaranteed by default: CTGAN offers the most balanced trade-off between representativeness, privacy, and utility, while privacy-preserving methods like Priv-BN and PATE-GAN often degrade utility and representativeness. The framework provides a reproducible, multi-dimensional basis for method selection and comparison, guiding practical deployment of synthetic public-transport data for research and policy analysis.

Abstract

Synthetic data offer a promising solution to the privacy and accessibility challenges of using smart card data in public transport research. Despite rapid progress in generative modeling, comprehensive evaluation has received limited attention, leaving it unclear how reliable, safe, and useful synthetic data truly are. Existing evaluations remain fragmented, typically limited to population-level representativeness or record-level privacy, without considering group-level variations or task-specific utility. To address this gap, we propose a Representativeness-Privacy-Utility (RPU) framework that systematically evaluates synthetic trip data across three complementary dimensions and three hierarchical levels (record, group, population). The framework integrates a consistent set of metrics to quantify similarity, disclosure risk, and practical usefulness, enabling transparent and balanced assessment of synthetic data quality. We apply the framework to benchmark twelve representative generation methods, spanning conventional statistical models, deep generative networks, and privacy-enhanced variants. Results show that synthetic data do not inherently guarantee privacy, that no "one-size-fits-all" model exists, and that there is a clear trade-off between privacy and representativeness/utility. The Conditional Tabular Generative Adversarial Network (CTGAN) provides the most balanced trade-off and is suggested for practical applications. The RPU framework provides a systematic and reproducible basis for researchers and practitioners to compare synthetic data generation techniques and select appropriate methods in public transport applications.

Paper Structure

This paper contains 24 sections, 23 equations, 9 figures, 2 tables, and 2 algorithms.

Figures (9)

  • Figure 1: Representativeness, privacy, utility evaluation framework for synthetic trips.
  • Figure 2: Calendar breakdown of selected data period (May 14–27, 2018).
  • Figure 3: The sampled datasets preserve the original distribution of trip counts, start time and end time.
  • Figure 4: Normalized evaluation metrics per method, including representativeness ($\mathcal{R}_r, \mathcal{R}_g, \mathcal{R}_p$), privacy ($\mathcal{P}_r, \mathcal{P}_g, \mathcal{P}_p$), and utility ($\mathcal{U}_{cluster}, \mathcal{U}_{pred}$).
  • Figure 5: Overall performance per dimension of benchmarking models.
  • ...and 4 more figures