Garden City: A synthetic dataset and sandbox environment for analysis of pre-processing algorithms for GPS human mobility data
Thomas H. Li, Francisco Barreras
TL;DR
Garden City is a synthetic sandbox with known ground truth for evaluating GPS pre-processing algorithms under realistic sparsity and noise. It combines a generated city, an EPR-based mobility diary with circadian constraints, a micro-macro trajectory generator, and a self-exciting sampling process that produces sparse, clustered pings with measurement noise. The paper demonstrates robustness tests of stop detection and of sensitivity to noise, illustrating how burstiness and context affect algorithm outputs. The open-source code, tutorials, and data support calibration and validation of GPS-data processing pipelines across diverse urban and behavioral scenarios.
Abstract
GPS human mobility datasets have seen increasing adoption in the past decade, enabling diverse applications that leverage the high precision of measured trajectories relative to other human mobility datasets. However, there are concerns that the high sparsity of some commercial datasets can introduce errors when processing algorithms are not robust to it, which could compromise the validity of downstream results. The scarcity of "ground-truth" data makes it particularly challenging to evaluate and calibrate these algorithms. To overcome these limitations and allow for an intermediate form of validation of common processing algorithms, we propose a synthetic trajectory simulator and sandbox environment that replicates the features of commercial datasets that could cause errors in such algorithms, and that can be used to compare algorithm outputs with "ground-truth" synthetic trajectories and mobility diaries. Our code is open-source and publicly available alongside tutorial notebooks and sample datasets generated with it.
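To make the idea of comparing sparse, noisy pings against a known ground truth concrete, the following is a minimal Python sketch and not the Garden City implementation. It assumes a self-exciting (Hawkes) process with an exponential kernel, sampled by Ogata thinning, as the source of bursty ping times, and isotropic Gaussian noise as the measurement error. The diary `DIARY`, the function names, and all parameter values (`mu`, `alpha`, `beta`, `noise_sd`) are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical ground-truth diary: (start_min, end_min, x_m, y_m) per stop.
DIARY = [
    (0.0, 480.0, 0.0, 0.0),          # home
    (510.0, 1020.0, 1500.0, 800.0),  # work
    (1050.0, 1440.0, 0.0, 0.0),      # home
]

def true_position(t, diary):
    """Ground-truth (x, y) at time t, with straight-line travel between stops."""
    prev = None
    for stop in diary:
        start, end, x, y = stop
        if t <= end:
            if t >= start or prev is None:
                return (x, y)
            # In transit: interpolate between the previous stop and this one.
            _, prev_end, px, py = prev
            frac = (t - prev_end) / (start - prev_end)
            return (px + frac * (x - px), py + frac * (y - py))
        prev = stop
    return (diary[-1][2], diary[-1][3])

def sample_bursty_times(horizon, mu, alpha, beta, rng):
    """Ogata thinning for intensity mu + sum_i alpha * exp(-beta * (t - t_i)).

    Assumes mu > 0 and alpha / beta < 1 so the process does not explode.
    """
    times, t, excitation = [], 0.0, 0.0
    while True:
        # The intensity only decays until the next event, so its current
        # value is a valid upper bound for the thinning step.
        bound = mu + excitation
        wait = rng.expovariate(bound)
        if t + wait >= horizon:
            return times
        t += wait
        excitation *= math.exp(-beta * wait)
        if rng.random() * bound <= mu + excitation:
            times.append(t)
            excitation += alpha  # each ping raises the chance of further pings

def simulate_pings(diary, horizon, mu, alpha, beta, noise_sd, seed=0):
    """Return (t, x, y) pings: bursty sampling times plus Gaussian position noise."""
    rng = random.Random(seed)
    pings = []
    for t in sample_bursty_times(horizon, mu, alpha, beta, rng):
        x, y = true_position(t, diary)
        pings.append((t, x + rng.gauss(0.0, noise_sd), y + rng.gauss(0.0, noise_sd)))
    return pings

# One simulated day (time in minutes, distance in meters).
pings = simulate_pings(DIARY, 1440.0, mu=0.005, alpha=0.3, beta=0.5, noise_sd=15.0)
```

Because the diary is known exactly, the output of a stop-detection algorithm run on `pings` could be scored against `DIARY`, and the sampling and noise parameters varied to probe where the algorithm starts to fail.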
