Enhancing stop location detection for incomplete urban mobility datasets

Margherita Bertè, Rashid Ibrahimli, Lars Koopmans, Pablo Valgañón, Nicola Zomer, Davide Colombi

TL;DR

This work tackles stop-location detection in urban mobility when GPS data are noisy or incomplete. It combines density-based stop detection with a supervised classifier that leverages features capturing individual routines, local geohash context, and spatio-temporal patterns, including an entropy-based locality measure $S_j = -\sum_{i=1}^n p_{ij} \log(p_{ij})$. Using privacy-preserving Cuebiq data from the NY-NJ-PA area, the authors evaluate three models (LightGBM, Random Forest, and a 3-layer FFNN) under simulated data gaps and class imbalance, reporting high AUCs (~0.97–0.98) and recall (~0.76–0.81) but low precision (~0.023–0.027). The analysis shows false positives often occur at recurring locations near true stops, highlighting the utility of the features while underscoring limitations in ground-truth labeling and dataset size. Overall, the study demonstrates a viable, scalable path to robust stop detection in incomplete mobility datasets, with clear directions for validation on larger, more diverse data and for incorporating collective behavior and external factors.
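The entropy-based locality measure can be illustrated with a minimal sketch. The summary does not define $p_{ij}$, so the code below assumes it is the share of the records observed in geohash cell $j$ that belong to device $i$; the function name `locality_entropy` and the input layout are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def locality_entropy(device_ids):
    """Shannon entropy S_j = -sum_i p_ij * log(p_ij) for one geohash cell j.

    device_ids: iterable with one device identifier per record seen in the cell.
    ASSUMPTION: p_ij is the fraction of the cell's records that belong to
    device i (the source text does not define p_ij).
    """
    counts = Counter(device_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty cell carries no information
    # p * log(1 / p) equals -p * log(p); written this way to avoid a "-0.0" result.
    return sum((c / total) * math.log(total / c) for c in counts.values())


# A cell visited by a single device has zero entropy; a cell shared evenly
# by n devices reaches the maximum value log(n).
private_cell = locality_entropy(["dev_a"] * 5)
shared_cell = locality_entropy(["dev_a", "dev_b", "dev_c", "dev_d"])
```

Under this reading, a low $S_j$ marks a cell dominated by one device (for example, a home location), while a high $S_j$ marks a cell shared by many devices.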

Abstract

Stop location detection, within human mobility studies, has an impact on multiple fields, including urban planning, transport network design, epidemiological modeling, and socio-economic segregation analysis. However, it remains a challenging task because classical density clustering algorithms often struggle with noisy or incomplete GPS datasets. This study investigates the application of classification algorithms to enhance density-based methods for stop identification. Our approach incorporates multiple features, including individual routine behavior across various time scales and local characteristics of individual GPS points. The dataset comprises privacy-preserving and anonymized GPS points previously labeled as stops by a sequence-oriented, density-dependent algorithm. We simulated data gaps by removing point density from select stops to assess performance under sparse data conditions. The model classifies individual GPS points within trajectories as potential stops or non-stops. Given the highly imbalanced nature of the dataset, we prioritized recall over precision in performance evaluation. Results indicate that this method detects most stops, even in the presence of spatio-temporal gaps, and that points classified as false positives often correspond to recurring locations for devices, typically near previous stops. While this research contributes to mobility analysis techniques, significant challenges persist. The lack of ground truth data limits definitive conclusions about the algorithm's accuracy. Further research is needed to validate the method across diverse datasets and to incorporate collective behavior inputs.
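The gap simulation mentioned in the abstract (removing point density from selected stops) could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' procedure: the record layout (a `stop_id` key that is `None` for non-stop points), the function name `thin_stop_points`, and the uniform random thinning with a `keep_fraction` parameter are all hypothetical.

```python
import random


def thin_stop_points(points, stop_ids_to_thin, keep_fraction=0.2, seed=0):
    """Return a new list in which selected stops have lost most of their points.

    points: list of dicts; each has a hypothetical "stop_id" key that holds the
        stop label assigned by the density-based algorithm, or None for a
        non-stop point.
    stop_ids_to_thin: set of stop ids whose point density should be reduced.
    keep_fraction: expected share of points kept in each selected stop
        (ASSUMPTION: uniform random thinning; the source does not say how
        points were removed).
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the simulated gaps are reproducible
    thinned = []
    for point in points:
        in_selected_stop = point["stop_id"] in stop_ids_to_thin
        if in_selected_stop and rng.random() >= keep_fraction:
            continue  # drop this point to create a gap inside the stop
        thinned.append(point)  # surviving points keep their original label
    return thinned


# Toy trajectory: 100 points in stop 1, 50 moving points with no stop label.
trajectory = [{"stop_id": 1} for _ in range(100)] + [{"stop_id": None} for _ in range(50)]
sparse_trajectory = thin_stop_points(trajectory, {1}, keep_fraction=0.2, seed=42)
```

Because the surviving points keep their original stop label, a classifier evaluated on the thinned trajectory is asked to recover stops that a density-based method might no longer detect.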

Paper Structure (14 sections, 1 equation, 8 figures, 3 tables)

Figures (8)

  • Figure 1: This sample trajectory, derived from GPS data, illustrates a common challenge: density-based algorithms struggle to accurately detect stop locations when confronted with noisy or incomplete data, particularly when temporal and spatial gaps are present. (Note: the above figure represents a sample trajectory derived from synthetic points in order to preserve privacy)
  • Figure 2: Daily trends showing the number of unique devices (blue line) and the number of stops (red line).
  • Figure 3: Daily stop frequency distribution for a selected individual.
  • Figure 4: Stop density in the history of a selected individual (left) and in the collective history (right).
  • Figure 5: Data processing pipeline. The different steps of our method are visualized, including the creation of a labeled dataset and the extraction of relevant features.
  • ...and 3 more figures