Garden city: A synthetic dataset and sandbox environment for analysis of pre-processing algorithms for GPS human mobility data

Thomas H. Li, Francisco Barreras

TL;DR

Garden City provides a synthetic, ground-truth-friendly sandbox to evaluate GPS pre-processing algorithms under realistic sparsity and noise. It combines a generated city, an EPR-based mobility diary with circadian constraints, a micro-macro trajectory generator, and a self-exciting sampling process to produce sparse, clustered pings with measurement noise. The paper demonstrates robustness testing for stop-detection and noise sensitivity, illustrating how burstiness and context affect algorithm outputs. By offering open-source code, tutorials, and data, it enables rigorous calibration and validation of GPS-data processing pipelines across diverse urban and behavioral scenarios.
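
The sampling step described above can be illustrated with a minimal sketch. It assumes an exponential excitation kernel and Ogata's thinning method, neither of which is confirmed here as the authors' choice; the names sample_ping_times and add_noise and all parameter values are hypothetical and are not taken from the released code.

```python
import math
import random


def sample_ping_times(mu, alpha, beta, horizon, rng):
    """Sample event times on [0, horizon) from a self-exciting process.

    Assumed intensity: mu + alpha * sum(exp(-beta * (t - t_i))) over past
    events t_i. Requires mu > 0; alpha / beta < 1 keeps the process stable.
    """
    times = []
    t = 0.0
    excitation = 0.0  # current value of the self-exciting term
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound for the thinning step.
        upper = mu + excitation
        wait = rng.expovariate(upper)
        if t + wait >= horizon:
            break
        t += wait
        excitation *= math.exp(-beta * wait)
        # Accept the candidate with probability intensity(t) / upper.
        if rng.random() * upper <= mu + excitation:
            times.append(t)
            excitation += alpha  # each ping makes further pings more likely
    return times


def add_noise(trajectory, ping_times, noise_sd, rng):
    """Read the complete trajectory at the ping times and add Gaussian noise."""
    pings = []
    for t in ping_times:
        x, y = trajectory(t)
        pings.append((t, x + rng.gauss(0.0, noise_sd), y + rng.gauss(0.0, noise_sd)))
    return pings


rng = random.Random(42)
# Hypothetical units: hours for time, meters for coordinates.
ping_times = sample_ping_times(mu=0.5, alpha=1.0, beta=2.0, horizon=168.0, rng=rng)
# Placeholder ground truth: an agent that stays at one fixed point.
pings = add_noise(lambda t: (100.0, 250.0), ping_times, noise_sd=10.0, rng=rng)
```

Because each accepted ping raises the intensity by alpha, pings arrive in bursts separated by long gaps, which is the kind of clustered sparsity the sandbox is meant to reproduce.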

Abstract

GPS-based human mobility datasets have seen increasing adoption in the past decade, enabling diverse applications that leverage the high precision of measured trajectories relative to other human mobility datasets. However, there are concerns that the high sparsity of some commercial datasets can introduce errors when processing algorithms are not robust to it, which could compromise the validity of downstream results. The scarcity of "ground-truth" data makes it particularly challenging to evaluate and calibrate these algorithms. To overcome these limitations and allow for an intermediate form of validation of common processing algorithms, we propose a synthetic trajectory simulator and sandbox environment that replicates the features of commercial datasets that could cause errors in such algorithms, and that can be used to compare algorithm outputs with "ground-truth" synthetic trajectories and mobility diaries. Our code is open-source and publicly available, alongside tutorial notebooks and sample datasets generated with it.

Paper Structure

This paper contains 17 sections, 9 equations, and 12 figures.

Figures (12)

  • Figure 1: Outline of our sparse trajectory generator, which proceeds in four stages. a) A city is generated from aligned building blocks, together with a population of agents. b) A "mobility diary" is generated using an exploration and preferential return (EPR) model. c) A complete "ground-truth" trajectory is generated by combining the generated diary ("macro" mobility) with a Brownian motion constrained to the current building's boundaries ("micro" mobility). d) Finally, a sparse trajectory is generated by sampling random times from the complete trajectory according to a self-exciting point process. Gaussian noise is added to the coordinates to simulate measurement error.
  • Figure 2: Example layout of a simulated city with a radial layout inspired by E. Howard's ideal "garden city" concept (Howard, 1898). The city contains a park (the garden) at the center, with concentric rings of residential, workplace and retail locations. The shaded areas represent the home location and workplace location assigned to a given agent.
  • Figure 3: Example code displaying the attributes of an instance of the class Building in a generated city.
  • Figure 4: Visualization of a generated "ground-truth" trajectory and summary statistics of the corresponding travel diary of a simulated agent. Panel (a) approximates the frequency of locations visited by showing a scatter plot of points in the trajectory every minute for a week-long trajectory. Panel (b) provides a summary of the agent's diary, showing the percentage of time spent at each type of location as well as the average stop duration at each of the building types.
  • Figure 5: Example code that generates an agent's "mobility diary" and "ground-truth" trajectory, from a custom destination diary passed by the user.
  • ...and 7 more figures
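
The "micro" mobility in stage (c) of Figure 1 can be sketched as a discretized Brownian motion that stays inside a building. The sketch below assumes a rectangular building footprint and mirror reflection at the walls; the caption only states that the motion is constrained to the building's boundaries, so the reflection rule, the function names reflect and micro_trajectory, and the parameter values are all hypothetical.

```python
import random


def reflect(value, low, high):
    """Fold a coordinate back into [low, high] by mirror reflection."""
    span = high - low
    offset = (value - low) % (2 * span)
    return low + (offset if offset <= span else 2 * span - offset)


def micro_trajectory(bounds, n_steps, step_sd, rng):
    """Random walk with Gaussian steps kept inside a rectangle.

    bounds is (x_min, y_min, x_max, y_max); step_sd is the standard
    deviation of each coordinate increment per time step.
    """
    x_min, y_min, x_max, y_max = bounds
    x = rng.uniform(x_min, x_max)
    y = rng.uniform(y_min, y_max)
    path = [(x, y)]
    for _ in range(n_steps):
        # Take an unconstrained Gaussian step, then reflect it off the walls.
        x = reflect(x + rng.gauss(0.0, step_sd), x_min, x_max)
        y = reflect(y + rng.gauss(0.0, step_sd), y_min, y_max)
        path.append((x, y))
    return path


rng = random.Random(7)
# Hypothetical 10 x 10 building with one position per minute for an hour.
path = micro_trajectory(bounds=(0.0, 0.0, 10.0, 10.0), n_steps=60, step_sd=1.5, rng=rng)
```

Reflection keeps every point inside the footprint without piling positions up on the walls, which is what simply clipping the coordinates would do.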