On Predicting Sociodemographics from Mobility Signals
Ekin Uğurel, Cynthia Chen, Brian H. Y. Lee, Filipe Rodrigues
TL;DR
The paper addresses the challenge of predicting sociodemographic attributes from mobility traces by introducing behaviorally grounded, higher-order mobility descriptors derived from directed mobility graphs, paired with uncertainty-calibrated evaluation and a multitask learning framework. It shows that combining activity- and edge-level mobility features improves out-of-sample predictive performance and, in many settings, yields better calibration than baselines. Multitask learning provides data-efficient, robust estimates and enhances transfer across time periods, though gains are task-dependent and may incur occasional negative transfer. Overall, the work advances practical sociodemographic inference for transportation planning under distribution shifts, while emphasizing calibrated uncertainty and cross-task regularization as key design principles.
Abstract
Inferring sociodemographic attributes from mobility data could help transportation planners better leverage passively collected datasets, but this task remains difficult due to weak and inconsistent relationships between mobility patterns and sociodemographic traits, as well as limited generalization across contexts. We address these challenges from three angles. First, to improve predictive accuracy while retaining interpretability, we introduce a behaviorally grounded set of higher-order mobility descriptors based on directed mobility graphs. These features capture structured patterns in trip sequences, travel modes, and social co-travel, and significantly improve prediction of age, gender, income, and household structure over baseline features. Second, we introduce metrics and visual diagnostic tools that encourage evenness between model confidence and accuracy, enabling planners to quantify uncertainty. Third, to improve generalization and sample efficiency, we develop a multitask learning framework that jointly predicts multiple sociodemographic attributes from a shared representation. This approach outperforms single-task models, particularly when training data are limited or when applying models across different time periods (i.e., when the test set distribution differs from the training set).
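To make the idea of matching model confidence to accuracy concrete, the sketch below computes a binned expected calibration error (ECE), a standard calibration measure. It is an illustrative assumption and not necessarily the metric used in the paper: the function name `expected_calibration_error`, the equal-width binning, and the default of 10 bins are all choices made here for the example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: the sample-weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     booleans, True where the prediction matched the label.
    Equal-width bins are an assumption of this sketch, not taken from the paper.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to an equal-width confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Hypothetical example: four predictions made at 90% confidence, three correct.
example_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

A value near zero means that, within each confidence bin, stated confidence roughly matches observed accuracy. In the hypothetical example the model is overconfident, since it claims 90% confidence while being right 75% of the time.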
