Trajectory Representation Learning on Road Networks and Grids with Spatio-Temporal Dynamics
Stefan Schestakov, Simon Gottschalk
TL;DR
TIGR addresses the challenge of learning robust trajectory representations by integrating grid-based and road-based modalities while modeling dynamic spatio-temporal traffic patterns. The approach employs a three-branch architecture (grid, road, spatio-temporal) trained with intra- and inter-modal contrastive losses, a novel spatio-temporal extraction pipeline (dynamic traffic embedding, temporal embedding, and local multi-head attention), and specialized masking strategies to learn rich representations. Extensive experiments on the Porto and San Francisco datasets show that TIGR outperforms state-of-the-art baselines on trajectory similarity, travel time estimation, and destination prediction, and that it remains robust as the number of negative samples increases. The work also provides a first systematic comparison of grid- and road-based TRL modalities, highlighting their complementary strengths and the value of modeling both jointly with traffic dynamics for practical urban analytics.
Abstract
Trajectory representation learning is a fundamental task for applications in fields such as smart cities and urban planning, as it facilitates the use of trajectory data (e.g., vehicle movements) for various downstream applications, such as trajectory similarity computation or travel time estimation. This is achieved by learning low-dimensional representations from high-dimensional raw trajectory data. However, existing methods for trajectory representation learning rely on either grid-based or road-based representations, which are inherently different, and thus may lose information contained in the other modality. Moreover, these methods overlook the dynamic nature of urban traffic, relying on static road network features rather than time-varying traffic patterns. In this paper, we propose TIGR, a novel model designed to integrate grid and road network modalities while incorporating spatio-temporal dynamics to learn rich, general-purpose representations of trajectories. We evaluate TIGR on two real-world datasets and demonstrate the effectiveness of combining both modalities by substantially outperforming state-of-the-art methods, i.e., by up to 43.22% for trajectory similarity, up to 16.65% for travel time estimation, and up to 10.16% for destination prediction.
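To make the idea of aligning the grid and road modalities more concrete, the following is a minimal sketch of an inter-modal contrastive objective of the kind mentioned in the TL;DR. It is an illustration under stated assumptions, not the TIGR implementation: it uses a plain InfoNCE loss with in-batch negatives, and the function names (`info_nce`, `inter_modal_loss`), the temperature value, and the symmetric averaging are all hypothetical choices. The actual loss formulation, negative sampling scheme, and encoders in TIGR may differ.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE with in-batch negatives (an assumption, not the paper's setup).

    Row i of `anchor` is pulled towards row i of `positive` and pushed
    away from all other rows of `positive`.
    """
    a = l2_normalize(anchor)
    p = l2_normalize(positive)
    logits = (a @ p.T) / temperature                      # shape (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pair for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


def inter_modal_loss(grid_emb, road_emb, temperature=0.1):
    """Hypothetical symmetric loss between grid and road trajectory embeddings."""
    return 0.5 * (
        info_nce(grid_emb, road_emb, temperature)
        + info_nce(road_emb, grid_emb, temperature)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid_emb = rng.normal(size=(8, 16))                    # stand-in for grid-branch outputs
    road_emb = grid_emb + 0.01 * rng.normal(size=(8, 16))  # nearly aligned road-branch outputs
    print("aligned: ", inter_modal_loss(grid_emb, road_emb))
    print("shuffled:", inter_modal_loss(grid_emb, np.roll(road_emb, 1, axis=0)))
```

In this sketch, the loss is low when the two embeddings of the same trajectory agree across modalities and higher when the pairing is shuffled, which is the behavior an inter-modal objective is meant to reward.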
