Dual-branch Spatial-Temporal Self-supervised Representation for Enhanced Road Network Learning

Qinghong Guo, Yu Wang, Ji Cao, Tongya Zheng, Junshu Dai, Bingde Hu, Shunyu Liu, Canghong Jin

TL;DR

DST presents a dual-branch spatial-temporal self-supervised framework for road network representation learning, addressing spatial heterogeneity and temporal dynamics by combining a spatial branch (mix-hop transition weighting and semantic hypergraph with MI-based contrastive learning) and a temporal branch (causal Transformer with a two-task dynamic loss). The spatial views capture high-order road relationships, while the temporal branch learns 24-hour travel dynamics, and the final representations are fused for downstream tasks. Across three real-city datasets and three tasks—road speed inference, travel time estimation, and trajectory destination prediction—DST achieves state-of-the-art performance and demonstrates strong zero-shot transfer capability. By integrating trajectory-driven dynamics with semantic road relations, DST yields robust, transferable road representations with practical impact for smart city applications.

Abstract

Road network representation learning (RNRL) has attracted increasing attention from both researchers and practitioners as various spatiotemporal tasks emerge. Recent advanced methods leverage Graph Neural Networks (GNNs) and contrastive learning to characterize the spatial structure of road segments in a self-supervised paradigm. However, the spatial heterogeneity and temporal dynamics of road networks pose severe challenges to the neighborhood smoothing mechanism of self-supervised GNNs. To address these issues, we propose a $\textbf{D}$ual-branch $\textbf{S}$patial-$\textbf{T}$emporal self-supervised representation framework for enhanced road representations, termed DST. On the one hand, DST designs a mix-hop transition matrix for graph convolution to incorporate dynamic relations of roads from trajectories. In addition, DST contrasts road representations of the vanilla road network against those of the hypergraph in a spatial self-supervised way. The hypergraph is newly built from three types of hyperedges to capture long-range relations. On the other hand, DST performs next token prediction as the temporal self-supervised task on sequences of traffic dynamics using a causal Transformer, which is further regularized by differentiating the traffic modes of weekdays from those of weekends. Extensive experiments against state-of-the-art methods verify the superiority of our proposed framework. Moreover, the comprehensive spatiotemporal modeling enables DST to excel in zero-shot learning scenarios.
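
To make the mix-hop transition idea more concrete, the following is a minimal sketch, not the paper's actual formulation. It assumes that transitions are counted between consecutive road segments in each trajectory, that the counts are row-normalized, and that multi-hop relations are combined as a fixed weighted sum of matrix powers. The function names, hop weights, and toy trajectories are all hypothetical.

```python
import numpy as np

def transition_matrix(trajectories, num_roads):
    """Row-normalized counts of consecutive road-to-road moves (assumed construction)."""
    counts = np.zeros((num_roads, num_roads), dtype=float)
    for traj in trajectories:
        for src, dst in zip(traj[:-1], traj[1:]):
            counts[src, dst] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Roads with no observed departures keep an all-zero row instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def mix_hop(transition, hop_weights):
    """Weighted sum of the 1-hop, 2-hop, ... transition matrices (hypothetical mixing rule)."""
    mixed = np.zeros_like(transition)
    power = np.eye(transition.shape[0])
    for weight in hop_weights:
        power = power @ transition  # advance by one more hop
        mixed += weight * power
    return mixed

# Toy data: three trajectories over four road segments, indexed 0..3.
trajectories = [[0, 1, 2], [0, 1, 3], [1, 2, 3]]
T = transition_matrix(trajectories, num_roads=4)
M = mix_hop(T, hop_weights=[0.6, 0.4])
print(np.round(M, 3))
```

In this toy example, road 0 never transitions directly to road 2, yet the mixed matrix assigns the pair a nonzero weight through the 2-hop term. This is the kind of higher-order, trajectory-driven relation that a plain adjacency matrix would miss.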

Paper Structure

This paper contains 46 sections, 16 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: An illustrative example of spatial heterogeneity and temporal dynamics. (a) Distant roads with similar configurations can be connected by a travel trajectory, whereas nearby roads may not necessarily share similar characteristics. (b) The traffic patterns of roads are characterized not only by road types but also by temporal dynamics.
  • Figure 2: An overview of the proposed DST framework. The high-order relationships are modeled via mix-hop transition matrix weighting and multi-view graph contrastive learning. The temporal traffic dynamics are integrated by the Transformer with two task-specific updates. The co-enhanced representations from both branches jointly power downstream tasks.
  • Figure 3: Ablation study on Beijing dataset.
  • Figure 4: Parameter sensitivity on destination prediction.
  • Figure 5: Case study of the highest traffic road in Beijing.
  • ...and 6 more figures

Theorems & Definitions (4)

  • Definition 1: Road Network
  • Definition 2: Road Hypergraph
  • Definition 3: Trajectory
  • Definition 4: Traffic Dynamics