Link Representation Learning for Probabilistic Travel Time Estimation
Chen Xu, Qiang Wang, Lijun Sun
TL;DR
ProbETA relaxes the trip-independence assumption in travel time estimation by modeling the joint distribution of multiple trips as a low-rank multivariate Gaussian, parameterized by link representations learned via empirical Bayes. The approach combines a three-tier hierarchical model for link travel times, a low-rank covariance structure that captures inter- and intra-trip correlations, and data augmentation through trip sub-sampling to enable fine-grained gradient updates. On two real-world GPS trajectory datasets it achieves state-of-the-art performance, with substantial improvements in MAPE and CRPS over deterministic and probabilistic baselines, and it yields interpretable link embeddings that reflect road-network geometry. The framework supports conditional travel-time estimation given nearby completed trips and offers a scalable alternative to high-dimensional joint modeling, with tractable inference and training complexity.
Abstract
Travel time estimation is a key task in navigation apps and web mapping services. Existing deterministic and probabilistic methods assume that trips are independent, so they model individual trips and overlook correlations between them. However, real-world conditions frequently introduce strong correlations between trips, driven by external and internal factors such as weather and driver tendencies. To address this, we propose ProbETA, a deep hierarchical joint probabilistic model for travel time estimation that captures both inter-trip and intra-trip correlations. The joint distribution of travel times across multiple trips is modeled as a low-rank multivariate Gaussian, parameterized by learnable link representations estimated using the empirical Bayes approach. We also introduce a data augmentation method based on trip sub-sampling, allowing for fine-grained gradient backpropagation when learning link representations. During inference, our model estimates the probability distribution of travel time for a queried trip, conditional on spatiotemporally adjacent completed trips. Evaluation on two real-world GPS trajectory datasets demonstrates that ProbETA outperforms state-of-the-art deterministic and probabilistic baselines, reducing Mean Absolute Percentage Error (MAPE) by over 12.60%. Moreover, the learned link representations align with the physical network geometry, potentially making them applicable to other tasks.
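To make the conditional inference step concrete, the following is a minimal sketch under simplifying assumptions, not the ProbETA architecture itself. It assumes link travel times follow a plain factor model, x ~ N(m, L Lᵀ + diag(d)), where each row of L is a link representation, and that a trip's travel time is the sum of its link travel times. The function names `trip_joint_gaussian` and `condition_on_trips`, the incidence matrix `A`, and all toy numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def trip_joint_gaussian(A, link_mean, L, link_var):
    """Joint Gaussian over trip travel times t = A @ x.

    Assumed (illustrative) model: x ~ N(link_mean, L @ L.T + diag(link_var)).
    A        : (n_trips, n_links) 0/1 trip-link incidence matrix
    link_mean: (n_links,) mean link travel times
    L        : (n_links, r) low-rank link representations
    link_var : (n_links,) independent per-link variances (> 0)
    """
    mu = A @ link_mean
    F = A @ L                        # (n_trips, r) trip-level factors
    # A L L^T A^T + A diag(link_var) A^T; shared links create correlation.
    cov = F @ F.T + (A * link_var) @ A.T
    return mu, cov

def condition_on_trips(mu, cov, obs_idx, obs_times, query_idx):
    """Standard Gaussian conditioning of queried trips on completed trips."""
    S_oo = cov[np.ix_(obs_idx, obs_idx)]
    S_qo = cov[np.ix_(query_idx, obs_idx)]
    S_qq = cov[np.ix_(query_idx, query_idx)]
    # K = S_qo @ inv(S_oo), computed with a linear solve (S_oo is symmetric).
    K = np.linalg.solve(S_oo, S_qo.T).T
    cond_mu = mu[query_idx] + K @ (obs_times - mu[obs_idx])
    cond_cov = S_qq - K @ S_qo.T
    return cond_mu, cond_cov

# Toy example: 3 links, 2 trips that share the middle link.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
link_mean = np.array([10.0, 20.0, 30.0])
L = np.ones((3, 1))                  # rank-1 link representations
link_var = np.ones(3)

mu, cov = trip_joint_gaussian(A, link_mean, L, link_var)
# Trip 0 has completed in 36 s (6 s slower than its prior mean of 30 s).
cond_mu, cond_cov = condition_on_trips(
    mu, cov, obs_idx=[0], obs_times=np.array([36.0]), query_idx=[1]
)
```

In this toy setting the observed delay on the completed trip shifts the queried trip's mean upward and shrinks its variance, which is the qualitative effect that conditioning on spatiotemporally adjacent trips is meant to deliver.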
