Dual-branch Spatial-Temporal Self-supervised Representation for Enhanced Road Network Learning

Qinghong Guo; Yu Wang; Ji Cao; Tongya Zheng; Junshu Dai; Bingde Hu; Shunyu Liu; Canghong Jin

TL;DR

DST is a dual-branch spatial-temporal self-supervised framework for road network representation learning that addresses spatial heterogeneity and temporal dynamics. The spatial branch combines mix-hop transition weighting with a semantic hypergraph and mutual-information (MI) based contrastive learning; the temporal branch uses a causal Transformer trained with a two-task dynamic loss. The spatial views capture high-order road relationships, the temporal branch learns 24-hour travel dynamics, and the resulting representations are fused for downstream tasks. Across three real-city datasets and three tasks (road speed inference, travel time estimation, and trajectory destination prediction), DST achieves state-of-the-art performance and demonstrates strong zero-shot transfer capability. By integrating trajectory-driven dynamics with semantic road relations, DST yields robust, transferable road representations relevant to smart city applications.

Abstract

Road network representation learning (RNRL) has attracted increasing attention from both researchers and practitioners as a growing range of spatiotemporal tasks emerges. Recent advanced methods leverage Graph Neural Networks (GNNs) and contrastive learning to characterize the spatial structure of road segments in a self-supervised paradigm. However, the spatial heterogeneity and temporal dynamics of road networks pose severe challenges to the neighborhood smoothing mechanism of self-supervised GNNs. To address these issues, we propose a $\textbf{D}$ual-branch $\textbf{S}$patial-$\textbf{T}$emporal self-supervised representation framework for enhanced road representations, termed DST. On the one hand, DST designs a mix-hop transition matrix for graph convolution to incorporate the dynamic relations of roads derived from trajectories. In addition, DST contrasts the road representations of the vanilla road network against those of a hypergraph in a spatial self-supervised way. The hypergraph is newly built from three types of hyperedges to capture long-range relations. On the other hand, DST performs next-token prediction as the temporal self-supervised task on sequences of traffic dynamics using a causal Transformer, which is further regularized by differentiating the traffic modes of weekdays from those of weekends. Extensive experiments against state-of-the-art methods verify the superiority of the proposed framework. Moreover, the comprehensive spatiotemporal modeling enables DST to excel in zero-shot learning scenarios.
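
As a rough illustration of the mix-hop idea, the sketch below estimates a one-hop transition matrix from trajectories and blends it with its higher powers. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names (`transition_matrix`, `mix_hop`), the fixed `hop_weights`, and the use of plain matrix powers may all differ from what DST actually does.

```python
# Illustrative sketch only: not the authors' formulation of the mix-hop matrix.

def transition_matrix(trajectories, num_roads):
    """Row-normalized counts of transitions between consecutive road segments."""
    counts = [[0.0] * num_roads for _ in range(num_roads)]
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            counts[src][dst] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        # Roads with no observed outgoing transition keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def matmul(a, b):
    """Dense product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mix_hop(matrix, hop_weights):
    """Weighted sum w_1 * P^1 + ... + w_K * P^K (hypothetical mixing rule)."""
    n = len(matrix)
    mixed = [[0.0] * n for _ in range(n)]
    power = matrix
    for idx, weight in enumerate(hop_weights):
        if idx > 0:
            power = matmul(power, matrix)  # advance from P^idx to P^(idx+1)
        for i in range(n):
            for j in range(n):
                mixed[i][j] += weight * power[i][j]
    return mixed


# Toy trajectories over four road segments, indexed 0..3.
trajectories = [[0, 1, 2], [0, 1, 3], [1, 2, 3]]
P = transition_matrix(trajectories, num_roads=4)
M = mix_hop(P, hop_weights=[0.7, 0.3])
```

In this toy example, road 0 only ever transitions directly to road 1, yet the mixed matrix also gives it nonzero weight toward roads 2 and 3 through the two-hop term. That is the kind of trajectory-driven, beyond-neighbor relation a mix-hop graph convolution could propagate over.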
