Generative Models for Synthetic Urban Mobility Data: A Systematic Literature Review

Alexandra Kapp, Julia Hansmeyer, Helena Mihaljević

TL;DR

This survey addresses the privacy-preserving generation of synthetic urban mobility data, a crucial need given the sensitivity of raw trajectories. It categorizes generative approaches into trip, user-movement, and city-population paradigms, contrasting traditional differential privacy (DP) and Markov-based methods with deep learning-based techniques and their hybrids. The study reveals substantial heterogeneity in data sources, evaluation metrics, and privacy guarantees, with many works lacking rigorous privacy assessments or standard benchmarks. It highlights the need for standardized benchmarking, transparent reporting of data properties, and open sharing of code and data to enable reliable, practice-oriented deployment of synthetic mobility solutions.

Abstract

Although highly valuable for a variety of applications, urban mobility data is rarely made openly available as it contains sensitive personal information. Synthetic data aims to solve this issue by generating artificial data that resembles an original dataset in structural and statistical characteristics, but omits sensitive information. For mobility data, a large number of corresponding models have been proposed in the last decade. This systematic review provides a structured comparative overview of the current state of this heterogeneous, active field of research. A special focus is put on the applicability of the reviewed models in practice.

Paper Structure

This paper contains 19 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of all steps of the literature search and resulting number of included publications.
  • Figure 2: Categorization of utility evaluation measures in the coded literature.
  • Figure 3: Taxonomy of mobility characteristics and statistical similarity measures.
  • Figure 4: Timeline of all coded publications, displaying the model's name (if one exists) or the name of the first author. The size of each bubble indicates how often the publication was cited (based on Google Scholar, accessed 06.03.2023). Models providing privacy guarantees and models using deep learning algorithms are indicated; arrows indicate when a previous model was used as a benchmark.