NLP-enabled Trajectory Map-matching in Urban Road Networks using a Transformer-based Encoder-decoder

Sevin Mohammadi, Andrew W. Smyth

TL;DR

This work reframes trajectory map-matching as a sequence-to-sequence translation task by treating noisy, sparse GPS points as a source sequence and the corresponding road-segment path as a target sequence. It introduces a transformer-based encoder–decoder surrogate that learns context-rich representations of driver behavior, road-network structure, and spatial GPS noise patterns from large-scale trajectories, with grid-based discretization to connect GPS points to candidate road segments. Data augmentation enables evaluation in the absence of plentiful ground-truth in New York City, and the transformer model achieves a leading accuracy of 0.752, BLEU 0.756, and Jaccard 0.781 on synthetic tests, outperforming ST-matching and Leuven-HMM across noise levels. The results demonstrate the feasibility of context-aware, data-driven trajectory map-matching and point toward region- and city-wide trajectory foundation models for scalable urban mobility analytics, with future work on transfer learning and region-specific encoding schemes.
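
As a rough illustration of the Jaccard score mentioned above, the sketch below assumes it is computed as the set overlap between predicted and reference road-segment IDs. The exact definition used in the paper is not stated in this summary, and the function name and segment IDs are hypothetical.

```python
def jaccard_similarity(predicted, reference):
    """Set-based Jaccard index between two road-segment ID sequences.

    Assumption: order and repeated segments are ignored; only the sets
    of segment IDs are compared.
    """
    pred_set, ref_set = set(predicted), set(reference)
    if not pred_set and not ref_set:
        # Convention chosen for this sketch: two empty paths are identical.
        return 1.0
    return len(pred_set & ref_set) / len(pred_set | ref_set)


# Hypothetical example: the prediction misses r3 and adds r5.
reference = ["r1", "r2", "r3", "r4"]
predicted = ["r1", "r2", "r5", "r4"]
score = jaccard_similarity(predicted, reference)  # 3 shared / 5 total = 0.6
```

A set-based score like this rewards overlapping segments but is blind to their order, which is presumably why it is reported alongside sequence-aware measures such as BLEU.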

Abstract

Vehicular trajectory data from geolocation telematics is vital for analyzing urban mobility patterns. Map-matching aligns noisy, sparsely sampled GPS trajectories with digital road maps to reconstruct accurate vehicle paths. Traditional methods rely on geometric proximity, topology, and shortest-path heuristics, but they overlook two key factors: (1) drivers may prefer routes based on local road characteristics rather than shortest paths, revealing learnable shared preferences, and (2) GPS noise varies spatially due to multipath effects. These factors can reduce the effectiveness of conventional methods in complex scenarios and increase the effort required for heuristic-based implementations. This study introduces a data-driven, deep learning-based map-matching framework, formulating the task as machine translation, inspired by NLP. Specifically, a transformer-based encoder-decoder model learns contextual representations of noisy GPS points to infer trajectory behavior and road structures in an end-to-end manner. Trained on large-scale trajectory data, the method improves path estimation accuracy. Experiments on synthetic trajectories show that this approach outperforms conventional methods by integrating contextual awareness. Evaluation on real-world GPS traces from Manhattan, New York, achieves 75% accuracy in reconstructing navigated routes. These results highlight the effectiveness of transformers in capturing drivers' trajectory behaviors, spatial dependencies, and noise patterns, offering a scalable, robust solution for map-matching. This work contributes to advancing trajectory-driven foundation models for geospatial modeling and urban mobility applications.
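
The machine-translation framing can be made concrete with a small data-preparation sketch: grid-cell IDs act as source-language tokens and road-segment IDs as target-language tokens. The special tokens, helper names, and toy sequences below are assumptions for illustration, not the authors' pipeline.

```python
SPECIAL = ["<pad>", "<sos>", "<eos>"]  # hypothetical special tokens


def build_vocab(sequences):
    """Map every distinct token to an integer ID, special tokens first."""
    tokens = sorted({tok for seq in sequences for tok in seq})
    return {tok: idx for idx, tok in enumerate(SPECIAL + tokens)}


def encode(seq, vocab, max_len):
    """Wrap a token sequence in <sos>/<eos> and right-pad it to max_len."""
    ids = [vocab["<sos>"]] + [vocab[tok] for tok in seq] + [vocab["<eos>"]]
    if len(ids) > max_len:
        raise ValueError("sequence longer than max_len")
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# Toy parallel corpus: noisy GPS points as grid cells, paths as road segments.
grid_seqs = [["g12", "g13", "g21"], ["g13", "g21"]]
road_seqs = [["r7", "r8", "r9", "r10"], ["r8", "r9"]]

src_vocab = build_vocab(grid_seqs)  # counterpart of the source vocabulary
tgt_vocab = build_vocab(road_seqs)  # counterpart of the target vocabulary

src_ids = encode(grid_seqs[1], src_vocab, max_len=6)
tgt_ids = encode(road_seqs[1], tgt_vocab, max_len=6)
```

Note that the source and target sequences may differ in length, since a sparse GPS trace can span more road segments than it has points; an encoder-decoder model handles this naturally.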

Paper Structure (29 sections, 12 equations, 13 figures, 2 tables)

This paper contains 29 sections, 12 equations, 13 figures, 2 tables.

Figures (13)

  • Figure 1: Left: Illustration of signal reflections caused by tall buildings in an urban environment, leading to multipath errors. Right: Comparison between the erroneous path in red and the actual path in blue within Manhattan. Map-matching accurately estimates the true path. The illustration provides an example of all candidate segments, denoted as $C_1, C_2, C_3, C_4$, for one of the erroneous raw GPS points collected off the road segment.
  • Figure 2: NLP-enabled trajectory map-matching: recovering a complete, ordered sequence of connected road segments from a noisy, sparse, and discretized sequence of GPS points.
  • Figure 3: A schematic representation of the input and output of the map-matching model.
  • Figure 4: Data structure: The left box shows a snippet of the raw trajectory consisting of $n$ noisy GPS points, denoted as $(p_i \mid i\in\{1,\dots,n\})$, sorted by timestamp. The middle box displays the corresponding grid cell IDs for each GPS point, denoted as $(g_i \mid i\in\{1,\dots,n\})$. The right box presents the map-matched trajectory, consisting of $m$ road segments, denoted as $(r_j \mid j\in\{1,\dots,m\})$.
  • Figure 5: Transformer model. The encoder (left) and the decoder (right) each contain an embedding layer followed by positional encoding and $N$ transformer blocks. $|G|$ and $|R|$ are the numbers of unique grid cells and unique road segments in the region (the counterparts of vocabulary size in NLP). $d_{emb}$, $d_{mlp}$, $N$, and $h$ are the model hyperparameters.
  • ...and 8 more figures
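
The mapping from GPS points $p_i$ to grid cell IDs $g_i$ described in the Figure 4 caption can be sketched as follows. This is a minimal illustration assuming a uniform latitude/longitude grid with row-major cell numbering; the bounding box, cell size, and function name are hypothetical and not taken from the paper.

```python
import math


def grid_cell_id(lat, lon, min_lat, min_lon, cell_deg, n_cols):
    """Return a row-major integer ID for the grid cell containing (lat, lon).

    Assumptions: a uniform grid anchored at (min_lat, min_lon) with square
    cells of cell_deg degrees and n_cols columns per row.
    """
    row = math.floor((lat - min_lat) / cell_deg)
    col = math.floor((lon - min_lon) / cell_deg)
    return row * n_cols + col


# Hypothetical grid loosely covering part of Manhattan.
MIN_LAT, MIN_LON = 40.70, -74.02
CELL_DEG, N_COLS = 0.001, 100

# A short trajectory of noisy GPS points, already sorted by timestamp.
points = [(40.7505, -73.9934), (40.7512, -73.9929), (40.7513, -73.9928)]
cells = [
    grid_cell_id(lat, lon, MIN_LAT, MIN_LON, CELL_DEG, N_COLS)
    for lat, lon in points
]
```

In this toy trace the last two points fall into the same cell, which shows how discretization collapses nearby noisy readings into a single token.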