Privacy-Preserving Synthetic Dataset of Individual Daily Trajectories for City-Scale Mobility Analytics

Jun'ichi Ozaki, Ryosuke Susuta, Takuhiro Moriyama, Yohei Shida

TL;DR

The paper tackles privacy concerns in sharing high-resolution mobility data by proposing a privacy-preserving synthetic data framework that reconstructs full-day trajectories from aggregated origin–destination inputs. It extends conventional synthetic data generation by jointly aligning three behavioral statistics: OD matrices, daily visit-frequency distributions following a universal law, and dwell–travel-time distributions derived from quantile statistics, using a multi-objective simulated-annealing optimization. Validated in Tokyo and Fukuoka, the approach achieves OD fidelity within 10% while substantially improving the reproduction of visit-frequency and dwell–travel times, demonstrating a production-ready pathway for scalable, privacy-safe city-scale analytics. The framework also supports integration of external aggregated statistics, enabling tailored analyses for policy and industry use without exposing individuals.

Abstract

Urban mobility data are indispensable for urban planning, transportation demand forecasting, pandemic modeling, and many other applications; however, individual mobile phone-derived Global Positioning System traces cannot generally be shared with third parties owing to severe re-identification risks. Aggregated records, such as origin-destination (OD) matrices, offer partial insights but fail to capture the key behavioral properties of daily human movement, limiting realistic city-scale analyses. This study presents a privacy-preserving synthetic mobility dataset that reconstructs daily trajectories from aggregated inputs. The proposed method integrates OD flows with two complementary behavioral constraints: (1) dwell-travel time quantiles that are available only as coarse summary statistics and (2) the universal law for the daily distribution of the number of visited locations. Embedding these elements in a multi-objective optimization framework enables the reproduction of realistic distributions of human mobility while ensuring that no personal identifiers are required. The proposed framework is validated in two contrasting regions of Japan: (1) the 23 special wards of Tokyo, representing a dense metropolitan environment; and (2) Fukuoka Prefecture, where urban and suburban mobility patterns coexist. The resulting synthetic mobility data reproduce dwell-travel time and visit frequency distributions with high fidelity, while deviations in OD consistency remain within the natural range of daily fluctuations. The results of this study establish a practical synthesis pathway under real-world constraints, providing governments, urban planners, and industries with scalable access to high-resolution mobility data for reliable analytics without the need for sensitive personal records, and supporting practical deployments in policy and commercial domains.
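
To make the aggregated statistics named in the abstract concrete, the following is a minimal sketch of how two of them could be computed from a set of candidate synthetic trajectories. It is an illustration only and not the authors' implementation: the trajectory representation (one ordered list of zone labels per agent), the function names `od_matrix` and `visit_count_distribution`, and the choice to count distinct zones per day are all assumptions made here.

```python
from collections import Counter

def od_matrix(trajectories):
    """Count trips between consecutive zones across all agents (assumed trip definition)."""
    counts = Counter()
    for zones in trajectories:
        for origin, dest in zip(zones, zones[1:]):
            if origin != dest:  # consecutive records in the same zone are not trips
                counts[(origin, dest)] += 1
    return counts

def visit_count_distribution(trajectories):
    """Fraction of agents that visit k distinct zones during the day."""
    ks = Counter(len(set(zones)) for zones in trajectories)
    n = len(trajectories)
    return {k: c / n for k, c in sorted(ks.items())}

# Hypothetical toy data: three agents moving among zones A, B, and C.
trajectories = [
    ["A", "B", "A"],
    ["A", "C", "B", "A"],
    ["B", "B"],
]
od = od_matrix(trajectories)
vf = visit_count_distribution(trajectories)
print(dict(od))
print(vf)
```

A synthesis procedure of the kind described above would compare such statistics against the published aggregates, so that only the aggregates, and never individual traces, are needed as input.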

Paper Structure

This paper contains 13 sections, 10 equations, 4 figures.

Figures (4)

  • Figure 1: Example of a daily trajectory from the GAD, showing a virtual user record with assigned age and sex attributes. All movements were by car, with routes allocated using MATSim software on actual road networks. Pictograms denote activity purposes (home, work, eating, and others), and travel segments are color-coded according to the time of day.
  • Figure 2: Optimization procedure. Normalized loss functions $(L_\mathrm{OD}/L_\mathrm{OD}(0), L_\mathrm{VF}/L_\mathrm{VF}(0), L_\mathrm{DT}/L_\mathrm{DT}(0), L_\mathrm{tot})$ are shown for male agents in their twenties, simulated at the parameter settings $(w_\mathrm{OD}, w_\mathrm{VF}, w_\mathrm{DT}) = (1, 0.01, 0.02)$. The simulation step $\tau$ was normalized to $[0,1]$. The SA process began at the maximum temperature ($\tau = 0$), decreased until $\tau = 0.5$, and repeated this schedule once, ending at $\tau = 1$. The iteration boundary at $\tau = 0.5$ marked the reset of the temperature to its initial value.
  • Figure 3: Loss functions $L^\mathrm{eval}_\mathrm{OD}$, $L_\mathrm{VF}$, and $L_\mathrm{DT}$ after optimization for the 23 special wards of Tokyo. Each loss function was averaged over all attributes (sex and age groups). The horizontal and vertical axes represent $w_\mathrm{VF}$ and $w_\mathrm{DT}$, respectively. Black crosses indicate the simulated parameter combinations within the grid search range. All three plots demonstrated that the corresponding loss decreased as its associated weight increased.
  • Figure 4: Loss functions $L^\mathrm{eval}_\mathrm{OD}$, $L_\mathrm{VF}$, and $L_\mathrm{DT}$ after optimization for Fukuoka Prefecture. Each loss function was averaged over all attributes (sex and age groups). The horizontal and vertical axes represent $w_\mathrm{VF}$ and $w_\mathrm{DT}$, respectively. Black crosses indicate the simulated parameter combinations within the grid search range. All three plots demonstrated that the corresponding loss decreased as its associated weight increased.
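
The annealing schedule described in the Figure 2 caption (cooling from the maximum temperature until $\tau = 0.5$, then one reset and a second cooling run ending at $\tau = 1$) can be sketched as follows. This is a hedged toy example, not the paper's code: the geometric cooling law, the values `t_max` and `t_min`, the form of the total loss as a weighted sum of losses normalized by their initial values, and the one-dimensional toy objectives are all assumptions. Only the weights $(1, 0.01, 0.02)$ and the single reset at $\tau = 0.5$ come from the caption.

```python
import math
import random

def temperature(tau, t_max=1.0, t_min=1e-3):
    """Assumed geometric cooling over each half of tau in [0, 1), reset at tau = 0.5."""
    phase = (tau % 0.5) / 0.5  # position within the current cycle, in [0, 1)
    return t_max * (t_min / t_max) ** phase

def total_loss(losses, initial, weights):
    """Assumed form: weighted sum of losses, each normalized by its initial value."""
    return sum(weights[k] * losses[k] / initial[k] for k in weights)

def anneal(state, propose, evaluate, weights, n_steps, rng):
    initial = evaluate(state)
    current = total_loss(initial, initial, weights)
    best_state, best = state, current
    for step in range(n_steps):
        tau = step / n_steps
        candidate = propose(state, rng)
        cand = total_loss(evaluate(candidate), initial, weights)
        # Metropolis rule: accept improvements, and worse moves with a
        # probability that shrinks as the temperature falls.
        if cand <= current or rng.random() < math.exp((current - cand) / temperature(tau)):
            state, current = candidate, cand
            if current < best:
                best_state, best = state, current
    return best_state, best

# Hypothetical toy objectives standing in for the OD, VF, and DT losses.
def evaluate(x):
    return {"OD": (x - 1) ** 2 + 1e-9, "VF": (x - 2) ** 2 + 1e-9, "DT": (x - 3) ** 2 + 1e-9}

def propose(x, rng):
    return x + rng.gauss(0.0, 0.5)

weights = {"OD": 1.0, "VF": 0.01, "DT": 0.02}  # weights quoted in the Figure 2 caption
rng = random.Random(0)
best_x, best = anneal(10.0, propose, evaluate, weights, n_steps=2000, rng=rng)
print(best_x, best)
```

Because the OD weight dominates in this setting, the toy run is pulled toward the minimum of the OD term; raising `weights["VF"]` or `weights["DT"]` shifts the compromise toward the other objectives, which matches the trend the captions of Figures 3 and 4 describe.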