Link Representation Learning for Probabilistic Travel Time Estimation

Chen Xu; Qiang Wang; Lijun Sun

TL;DR

ProbETA tackles travel time estimation by relaxing the trip-independence assumption and modeling the joint distribution of travel times across multiple trips as a low-rank multivariate Gaussian, parameterized by link representations learned via empirical Bayes. The approach combines a three-tier hierarchical model for link travel times, a low-rank covariance structure that captures inter- and intra-trip correlations, and data augmentation through trip sub-sampling, which enables fine-grained gradient updates. On two real-world GPS trajectory datasets it outperforms state-of-the-art deterministic and probabilistic baselines in MAPE and CRPS, and its link embeddings are interpretable in that they reflect road-network geometry. The framework supports conditional travel time estimation given spatiotemporally adjacent completed trips and offers a scalable alternative to high-dimensional joint modeling, with tractable inference and training complexity.
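To make the low-rank joint Gaussian concrete, here is a minimal sketch of how trip-level means and covariances could be assembled from link-level parameters. It is not ProbETA's actual parameterization: it assumes, for illustration only, that all trips share one vector of link travel times whose covariance is a rank-r term (from per-link embeddings) plus independent per-link variances. The function `trip_moments` and the names `link_mean`, `link_factor`, and `link_noise` are hypothetical.

```python
# Illustrative sketch only (not the paper's model): trip travel time is the sum
# of link travel times, and link times are jointly Gaussian with covariance
# L L^T + diag(d), where row l of L is the embedding of link l.

def trip_moments(routes, link_mean, link_factor, link_noise):
    """Return (means, cov) of the joint Gaussian over trip travel times.

    routes:      list of trips, each a list of link ids
    link_mean:   dict link id -> mean link travel time
    link_factor: dict link id -> rank-r embedding (list of r floats)
    link_noise:  dict link id -> independent link variance
    """
    rank = len(next(iter(link_factor.values())))
    means, factors = [], []
    for route in routes:
        means.append(sum(link_mean[l] for l in route))
        # Summing link embeddings along the route gives the trip-level factor.
        factors.append(
            [sum(link_factor[l][k] for l in route) for k in range(rank)]
        )

    n = len(routes)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Low-rank part: inner product of the two trip-level factors.
            low_rank = sum(factors[i][k] * factors[j][k] for k in range(rank))
            # Links used by both trips also share their independent variance.
            shared = sum(
                link_noise[l] * routes[i].count(l) * routes[j].count(l)
                for l in set(routes[i])
            )
            cov[i][j] = low_rank + shared
    return means, cov


# Toy example with three links and two overlapping trips (made-up numbers).
link_mean = {"a": 60.0, "b": 90.0, "c": 30.0}
link_factor = {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [0.0, 2.0]}
link_noise = {"a": 4.0, "b": 1.0, "c": 9.0}
routes = [["a", "b"], ["b", "c"]]

means, cov = trip_moments(routes, link_mean, link_factor, link_noise)
print(means)
print(cov)
```

Because the trip-level factors have only r columns, the covariance between any two trips costs O(r) to evaluate, regardless of how many links the network contains. That is the sense in which a low-rank structure keeps joint modeling tractable.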

Abstract

Travel time estimation is a key task in navigation apps and web mapping services. Existing deterministic and probabilistic methods assume that trips are independent: they predominantly model individual trips and overlook correlations between them. However, real-world conditions frequently introduce strong correlations between trips, driven by external and internal factors such as weather and driver tendencies. To address this, we propose ProbETA, a deep hierarchical joint probabilistic model for travel time estimation that captures both inter-trip and intra-trip correlations. The joint distribution of travel times across multiple trips is modeled as a low-rank multivariate Gaussian, parameterized by learnable link representations estimated using the empirical Bayes approach. We also introduce a data augmentation method based on trip sub-sampling, allowing for fine-grained gradient backpropagation when learning link representations. During inference, our model estimates the probability distribution of travel time for a queried trip, conditional on spatiotemporally adjacent completed trips. Evaluation on two real-world GPS trajectory datasets demonstrates that ProbETA outperforms state-of-the-art deterministic and probabilistic baselines, reducing Mean Absolute Percentage Error (MAPE) by over 12.60%. Moreover, the learned link representations align with the physical network geometry, potentially making them applicable to other tasks.
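The conditional inference step described above can be illustrated with the standard formula for conditioning a multivariate Gaussian. The sketch below assumes a joint Gaussian over one queried trip (index 0) and several completed trips is already available, and it uses made-up numbers. The helper names `solve` and `condition_on_completed` are hypothetical, and the dense solver is a simplification chosen for readability, not the inference procedure used by ProbETA.

```python
# Illustrative sketch only: Gaussian conditioning of a queried trip on
# completed trips, using the textbook formulas
#   mu_q|c  = mu_q + S_qc S_cc^{-1} (t_c - mu_c)
#   var_q|c = S_qq - S_qc S_cc^{-1} S_cq

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def condition_on_completed(mean, cov, observed):
    """Condition trip 0 on observed travel times of trips 1..n-1.

    mean:     joint mean vector (index 0 is the queried trip)
    cov:      joint covariance matrix (list of lists, symmetric)
    observed: observed travel times of the completed trips
    Returns (conditional mean, conditional variance) of the queried trip.
    """
    resid = [t - m for t, m in zip(observed, mean[1:])]
    cov_cc = [row[1:] for row in cov[1:]]
    cov_qc = cov[0][1:]
    # cov_cc is symmetric, so solving cov_cc w = cov_qc gives S_cc^{-1} S_cq.
    w = solve(cov_cc, cov_qc)
    cond_mean = mean[0] + sum(wi * ri for wi, ri in zip(w, resid))
    cond_var = cov[0][0] - sum(wi * ci for wi, ci in zip(w, cov_qc))
    return cond_mean, cond_var


# Toy example: one queried trip and one completed, positively correlated trip
# that took 16.5 s longer than its expected travel time.
mean = [150.0, 120.0]
cov = [[7.5, 3.0], [3.0, 16.5]]
cond_mean, cond_var = condition_on_completed(mean, cov, observed=[136.5])
print(cond_mean, cond_var)
```

In this toy case the completed trip ran slower than expected, so the estimate for the positively correlated queried trip shifts upward and its variance shrinks relative to the unconditional value.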

Paper Structure (29 sections, 19 equations, 7 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Deterministic Travel Time Estimation
Probabilistic Regression
Definitions and Problem Formulation
Definition: Road Network
Definition: Links and Trips
Problem Formulation
Methodology
Overview of ProbETA
Parameterizing Link Travel Time Distribution
Joint Distribution for Multiple Trips
Data Augmentation for Link Representation Learning
Conditional Travel Time Estimation based on Joint Probability Distribution
Experiment
...and 14 more sections

Figures (7)

Figure 1: Overall architecture of ProbETA. The black flow represents the shared process between training and inference, the green flow represents the training process, and the blue flow represents the inference process.
Figure 2: Graph node contraction. Multiple nodes are aggregated into new nodes; this contraction can be used to simplify the graph structure or to construct trip representations.
Figure 3: Illustration for sub-sampling data augmentation.
Figure 4: Link correlation visualization. (a) Learned inter-trip link correlation during 9:00-10:00 AM. (b) Learned intra-trip link correlation during 9:00-10:00 AM. (c) Learned inter-trip link correlation during 9:00-10:00 PM. (d) Learned intra-trip link correlation during 9:00-10:00 PM. (e) Real adjacency matrix. (f) Link correlation heatmap from inter-trip. (g) Link correlation heatmap from intra-trip. (h) Visualization of embedding vectors by Principal Component Analysis (PCA). (i) Visualization of real link locations.
Figure 5: Cumulative travel time (min) estimation; the shaded area shows $\mu\pm \sigma$. (a) Trip sample from Chengdu. (b) Trip sample from Harbin.
...and 2 more figures
