NLP-enabled Trajectory Map-matching in Urban Road Networks using a Transformer-based Encoder-decoder
Sevin Mohammadi, Andrew W. Smyth
TL;DR
This work reframes trajectory map-matching as a sequence-to-sequence translation task: noisy, sparse GPS points form the source sequence, and the corresponding road-segment path forms the target sequence. It introduces a transformer-based encoder-decoder surrogate model that learns context-rich representations of driver behavior, road-network structure, and spatial GPS noise patterns from large-scale trajectories, using grid-based discretization to connect GPS points to candidate road segments. Data augmentation compensates for the scarcity of ground-truth data in New York City and makes evaluation possible. The transformer model achieves an accuracy of 0.752, a BLEU score of 0.756, and a Jaccard score of 0.781 on synthetic tests, outperforming ST-matching and Leuven-HMM across noise levels. The results demonstrate the feasibility of context-aware, data-driven trajectory map-matching and point toward region- and city-wide trajectory foundation models for scalable urban mobility analytics. Future work includes transfer learning and region-specific encoding schemes.
Abstract
Vehicular trajectory data from geolocation telematics is vital for analyzing urban mobility patterns. Map-matching aligns noisy, sparsely sampled GPS trajectories with digital road maps to reconstruct accurate vehicle paths. Traditional methods rely on geometric proximity, topology, and shortest-path heuristics, but they overlook two key factors: (1) drivers may choose routes based on local road characteristics rather than shortest paths, revealing shared preferences that can be learned, and (2) GPS noise varies spatially due to multipath effects. These factors can reduce the effectiveness of conventional methods in complex scenarios and increase the effort required for heuristic-based implementations. This study introduces a data-driven, deep learning-based map-matching framework that formulates the task as machine translation, inspired by natural language processing (NLP). Specifically, a transformer-based encoder-decoder model learns contextual representations of noisy GPS points to infer trajectory behavior and road structures in an end-to-end manner. Trained on large-scale trajectory data, the method improves path estimation accuracy. Experiments on synthetic trajectories show that this approach outperforms conventional methods by integrating contextual awareness. Evaluation on real-world GPS traces from Manhattan, New York, achieves 75% accuracy in reconstructing navigated routes. These results highlight the effectiveness of transformers in capturing drivers' trajectory behaviors, spatial dependencies, and noise patterns, offering a scalable, robust solution for map-matching. This work contributes to advancing trajectory-driven foundation models for geospatial modeling and urban mobility applications.
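To make the translation framing concrete, the following minimal Python sketch shows one way noisy GPS points could be turned into a discrete source sequence through grid-based discretization. It is an illustration under stated assumptions, not the authors' implementation: the uniform grid in degrees, the bounding box, the cell size, and the names `GridSpec`, `gps_to_cell_token`, and `trajectory_to_tokens` are all hypothetical. The resulting token sequence would then serve as encoder input, with the sequence of road-segment identifiers as the decoder target.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GridSpec:
    """Hypothetical uniform grid over a study area (all values are assumptions)."""
    lat_min: float   # southern edge of the bounding box, in degrees
    lon_min: float   # western edge of the bounding box, in degrees
    cell_deg: float  # cell side length, in degrees
    n_rows: int      # number of cells along latitude
    n_cols: int      # number of cells along longitude


def gps_to_cell_token(lat: float, lon: float, grid: GridSpec) -> int:
    """Map one GPS point to the integer id of the grid cell that contains it."""
    row = math.floor((lat - grid.lat_min) / grid.cell_deg)
    col = math.floor((lon - grid.lon_min) / grid.cell_deg)
    if not (0 <= row < grid.n_rows and 0 <= col < grid.n_cols):
        raise ValueError(f"point ({lat}, {lon}) lies outside the grid")
    # Row-major cell index, used here as the source-vocabulary token.
    return row * grid.n_cols + col


def trajectory_to_tokens(points: List[Tuple[float, float]], grid: GridSpec) -> List[int]:
    """Convert a trajectory of (lat, lon) pairs into a source token sequence."""
    return [gps_to_cell_token(lat, lon, grid) for lat, lon in points]


if __name__ == "__main__":
    # Illustrative bounding box only; not taken from the paper.
    grid = GridSpec(lat_min=40.70, lon_min=-74.02, cell_deg=0.001,
                    n_rows=100, n_cols=100)
    trajectory = [(40.7005, -74.0195), (40.7125, -74.0055)]
    print(trajectory_to_tokens(trajectory, grid))
```

Under this scheme, nearby GPS points share a token, so the model sees a finite source vocabulary in place of continuous coordinates. The cell size trades spatial resolution against vocabulary size.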
