On Predicting Sociodemographics from Mobility Signals

Ekin Uğurel, Cynthia Chen, Brian H. Y. Lee, Filipe Rodrigues

TL;DR

The paper addresses the challenge of predicting sociodemographic attributes from mobility traces by introducing behaviorally grounded, higher-order mobility descriptors derived from directed mobility graphs, paired with uncertainty-calibrated evaluation and a multitask learning framework. It shows that combining activity- and edge-level mobility features improves out-of-sample predictive performance and, in many settings, yields better calibration than baselines. Multitask learning provides data-efficient, robust estimates and improves transfer across time periods, though the gains are task-dependent and negative transfer occasionally occurs. Overall, the work advances practical sociodemographic inference for transportation planning under distribution shift, and it emphasizes calibrated uncertainty and cross-task regularization as key design principles.
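To make "descriptors derived from directed mobility graphs" concrete, here is a minimal sketch of turning a day's trips into chronologically numbered directed edges (as in Figure 1), counting edge frequency over an observation period, and computing a few tour-level counts in the spirit of Figure 3. It assumes trips arrive as chronologically ordered (origin, destination, mode) records, treats every arrival at an anchor location as closing a tour, and reads $f_{\text{mm}}$ as the share of tours that use more than one mode. The `Trip` type, the function names, and these definitions are hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    origin: str
    destination: str
    mode: str


def daily_edges(trips):
    """Chronologically numbered edges for one day: {(origin, dest): [1-based positions]}."""
    edges = {}
    for i, t in enumerate(trips, start=1):
        edges.setdefault((t.origin, t.destination), []).append(i)
    return edges


def edge_frequency(days):
    """How often each directed edge occurs over the whole observation period."""
    return Counter((t.origin, t.destination) for day in days for t in day)


def split_tours(trips, anchors):
    """Assumption: a tour closes whenever a trip arrives at any anchor location."""
    tours, current = [], []
    for t in trips:
        current.append(t)
        if t.destination in anchors:
            tours.append(current)
            current = []
    if current:  # trailing trips that never return to an anchor
        tours.append(current)
    return tours


def tour_descriptors(trips, anchors):
    tours = split_tours(trips, anchors)
    # Assumption: "mm" = multimodal, i.e. a tour that uses more than one mode.
    multimodal = sum(1 for tour in tours if len({t.mode for t in tour}) > 1)
    return {
        "n_trips": len(trips),
        "n_tour": len(tours),
        "f_mm": multimodal / len(tours) if tours else 0.0,
    }


# Hypothetical travel day with home and work as anchors.
day = [
    Trip("home", "work", "transit"),
    Trip("work", "lunch", "walk"),
    Trip("lunch", "work", "walk"),
    Trip("work", "gym", "bike"),
    Trip("gym", "home", "transit"),
]
descriptors = tour_descriptors(day, anchors={"home", "work"})  # 5 trips, 3 tours, 1 multimodal
```

Keeping per-day edge order separate from period-wide frequency mirrors the two kinds of information Figure 1 encodes (edge numbering versus arrow width).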

Abstract

Inferring sociodemographic attributes from mobility data could help transportation planners better leverage passively collected datasets, but this task remains difficult due to weak and inconsistent relationships between mobility patterns and sociodemographic traits, as well as limited generalization across contexts. We address these challenges from three angles. First, to improve predictive accuracy while retaining interpretability, we introduce a behaviorally grounded set of higher-order mobility descriptors based on directed mobility graphs. These features capture structured patterns in trip sequences, travel modes, and social co-travel, and significantly improve prediction of age, gender, income, and household structure over baseline features. Second, we introduce metrics and visual diagnostic tools that encourage agreement between model confidence and accuracy, enabling planners to quantify uncertainty. Third, to improve generalization and sample efficiency, we develop a multitask learning framework that jointly predicts multiple sociodemographic attributes from a shared representation. This approach outperforms single-task models, particularly when training data are limited or when applying models across different time periods (i.e., when the test set distribution differs from the training set).
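The specific calibration metrics are not spelled out here. A widely used measure of the gap between confidence and accuracy is the expected calibration error, $\mathrm{ECE} = \sum_b \frac{|B_b|}{n}\,\lvert \mathrm{acc}(B_b) - \mathrm{conf}(B_b) \rvert$, where predictions are grouped into confidence bins $B_b$. The sketch below assumes equal-width bins over the top-class probability; it illustrates the idea and is not necessarily the metric used in the paper, and the function name is hypothetical.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE for multiclass predictions.

    probs: list of per-class probability lists; labels: list of true class indices.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and the same length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)          # confidence = probability of the predicted class
        pred = p.index(conf)   # predicted class = argmax
        # Confidence 1.0 goes into the last bin instead of overflowing.
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b].append((conf, pred == y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(acc - avg_conf)
    return ece


# Hypothetical binary predictions (e.g. a gender classifier) and true labels.
ece = expected_calibration_error(
    [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]], [0, 1, 0, 1]
)
```

A reliability diagram plots the same per-bin accuracy against per-bin confidence, so the two diagnostics can share this binning.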

Paper Structure

This paper contains 19 sections, 12 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Example daily mobility graph where the edges are chronologically numbered. Width of trip arrows corresponds to frequency of visits over the observation period. Dashed arrows denote known trips that are not observed on this day.
  • Figure 2: Illustration of daily travel motifs [Schneider2013Unravelling; Wu2019Inferring]: (a) Out-and-back; (b) Chain; (c) Cycle-chain; (d, h) Double-cycle; (e) Single-no-return; (f, g) Single-cycle
  • Figure 3: Illustration of selected metrics; (left) example travel day. In this case, $n_{\text{trips}}=7$ and $f_{\text{comp}}=3/7$; (right) each color denotes a tour to/from the anchors. In this case, $n_{\text{tour}}=4$ and $f_{\text{mm}}=2/4$.
  • Figure 4: Shared-trunk multitask architecture. Mobility descriptors are mapped to a shared representation $h = g(X; \theta)$ by a three-layer feed-forward network with ReLU activations and dropout. Four task-specific heads $f_t$ produce class probabilities via softmax. Training minimizes the weighted sum of per-task cross-entropy losses, $\sum_t w_t \ell_t$, with all weights $w_t$ set equal.
  • Figure 5: Largest-magnitude Spearman rank correlations (all $p < 0.001$) between mobility descriptors and demographics. (top left): age; (top right): household income; (bottom left): gender; (bottom right): number of children. Bars to the left indicate negative associations; to the right, positive.
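The shared-trunk design in Figure 4 can be sketched as a NumPy forward pass plus the weighted multitask loss: a three-layer ReLU trunk maps descriptors $X$ to $h = g(X; \theta)$, four softmax heads produce class probabilities, and the loss is $\sum_t w_t \ell_t$ with equal weights (here assumed to be $w_t = 1/4$). Layer widths, the number of classes per task, the random initialisation, and all function names are illustrative assumptions. Dropout and training (backpropagation) are omitted; in practice an autodiff framework would handle those.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_layer(n_in, n_out):
    # Small random weights and zero bias; purely illustrative initialisation.
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def make_model(n_features, hidden=(64, 64, 32), classes_per_task=(4, 2, 5, 3)):
    # Hypothetical class counts, e.g. age bins, gender, income bins, children bins.
    dims = (n_features, *hidden)
    trunk = [init_layer(a, b) for a, b in zip(dims[:-1], dims[1:])]  # three shared layers
    heads = [init_layer(hidden[-1], k) for k in classes_per_task]    # one head f_t per task
    return trunk, heads


def forward(X, trunk, heads):
    h = X
    for W, b in trunk:  # shared representation h = g(X; theta); dropout omitted
        h = relu(h @ W + b)
    return [softmax(h @ W + b) for W, b in heads]


def multitask_loss(probs_per_task, labels_per_task, weights=None):
    n_tasks = len(probs_per_task)
    if weights is None:
        weights = [1.0 / n_tasks] * n_tasks  # equal task weights w_t
    total = 0.0
    for w, p, y in zip(weights, probs_per_task, labels_per_task):
        # Mean cross-entropy for task t: -log of the probability of the true class.
        ce = -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))
        total += w * ce
    return total


# Usage on random stand-in data: 8 people, 20 mobility descriptors each.
X = rng.normal(size=(8, 20))
trunk, heads = make_model(20)
probs = forward(X, trunk, heads)
labels = [rng.integers(0, p.shape[1], size=8) for p in probs]
loss = multitask_loss(probs, labels)
```

Because every head reads the same $h$, gradients from all four tasks shape the trunk, which is the cross-task regularization the summary credits for the data-efficiency gains; the same coupling is also how negative transfer can arise.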
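Figure 5 ranks descriptor/demographic pairs by Spearman's $\rho$, which is the Pearson correlation computed on ranks, so it captures monotonic rather than strictly linear association. A dependency-free sketch that uses average ranks for ties is shown below; `scipy.stats.spearmanr` computes the same coefficient together with a p-value. The function names and the example data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # a constant variable has no rank correlation
    return cov / (sx * sy)


# Hypothetical descriptor (daily tours) and demographic (age) for five people.
n_tour = [1, 2, 2, 3, 4]
age = [60, 45, 50, 30, 25]
rho = spearman_rho(n_tour, age)  # about -0.975: more tours, younger respondents
```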