Region-Point Joint Representation for Effective Trajectory Similarity Learning

Hao Long, Silin Zhou, Lisi Chen, Shuo Shang

TL;DR

The paper tackles the limitation of region-only representations in trajectory similarity by introducing RePo, a multimodal framework that jointly learns region-wise (structural via Node2Vec and visual via ResNet) and point-wise (locality, correlation, continuity) representations. These cues are fused through cross-modal attention and a [CLS]-based embedding, trained with a supervised contrastive loss that aligns positives with ground-truth similarity while hard negatives are mined in embedding space, yielding robust, discriminative embeddings. The approach demonstrates strong empirical gains across multiple real-world datasets and similarity metrics, along with comprehensive ablations validating the contribution of each component. Overall, RePo offers scalable, high-precision trajectory retrieval and ranking by effectively combining regional context with fine-grained motion patterns in a unified representation.

Abstract

Recent learning-based methods have reduced the computational complexity of traditional trajectory similarity computation, but state-of-the-art (SOTA) methods still fail to leverage the comprehensive spectrum of trajectory information for similarity modeling. To tackle this problem, we propose RePo, a novel method that jointly encodes Region-wise and Point-wise features to capture both spatial context and fine-grained moving patterns. For region-wise representation, the GPS trajectories are first mapped to grid sequences; spatial context is then captured by structural features, and semantic context is enriched by visual features. For point-wise representation, three lightweight expert networks extract local, correlation, and continuous movement patterns from dense GPS sequences. Then, a router network adaptively fuses the learned point-wise features, which are subsequently combined with region-wise features using cross-attention to produce the final trajectory embedding. To train RePo, we adopt a contrastive loss with hard negative samples to provide similarity ranking supervision. Experimental results show that RePo achieves an average accuracy improvement of 22.2% over SOTA baselines across all evaluation metrics.
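
As a rough illustration of the router-based fusion mentioned in the abstract, the sketch below mixes three expert feature vectors using softmax gate weights. The softmax gating, the function names (`softmax`, `route_and_fuse`), and the toy numbers are all assumptions made for illustration; the abstract only states that a router network adaptively fuses the point-wise features, so this should not be read as the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_and_fuse(expert_outputs, router_scores):
    """Fuse expert feature vectors with softmax gate weights (assumed gating).

    expert_outputs: list of equal-length feature vectors, one per expert
    router_scores:  one raw score per expert (hypothetical router logits)
    """
    if len(expert_outputs) != len(router_scores):
        raise ValueError("need one router score per expert")
    weights = softmax(router_scores)
    dim = len(expert_outputs[0])
    fused = [0.0] * dim
    # Weighted sum of the expert vectors, dimension by dimension.
    for w, vec in zip(weights, expert_outputs):
        for i in range(dim):
            fused[i] += w * vec[i]
    return fused, weights

# Toy example: three experts (local, correlation, continuity), 2-d features.
experts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = route_and_fuse(experts, [0.0, 0.0, 0.0])
```

With equal router scores, each expert contributes equally; in a trained model the scores would presumably depend on the input trajectory, so the mix would vary per sample.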

Paper Structure

This paper contains 16 sections, 23 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Comparison of trajectory similarity under grid (region) trajectory and GPS (point) trajectory.
  • Figure 2: An overview of our RePo method.
  • Figure 3: Three types of encodings in Point-wise encoder.
  • Figure 4: Trajectory visualization results on the Porto dataset. We compare the top-2 retrievals produced by RePo, NeuTraj, and SIMformer. Significant differences in visualization results are highlighted with red circles.
  • Figure 5: Model training time for one epoch under batch size 128 and model inference time of one trajectory.
  • ...and 1 more figure

Theorems & Definitions (2)

  • Definition 1: GPS Trajectory
  • Definition 2: Grid Trajectory
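
The two definitions are only listed by name here, so the following is a minimal sketch of how a GPS trajectory might be mapped to a grid trajectory, in the spirit of the abstract's "mapped to grid sequences". The uniform square cells, the row-major cell IDs, the collapsing of consecutive duplicate cells, and the name `gps_to_grid` with its parameters are assumptions for illustration, not the paper's definitions.

```python
def gps_to_grid(trajectory, min_lon, min_lat, cell_size, n_cols):
    """Map (lon, lat) points to grid cell IDs.

    Assumes a uniform grid anchored at (min_lon, min_lat) with square cells
    of side `cell_size` and `n_cols` columns; IDs are row-major.
    Consecutive points in the same cell are collapsed (an assumption).
    """
    cells = []
    for lon, lat in trajectory:
        col = int((lon - min_lon) // cell_size)
        row = int((lat - min_lat) // cell_size)
        cell_id = row * n_cols + col
        if not cells or cells[-1] != cell_id:
            cells.append(cell_id)
    return cells

# Toy GPS trajectory on a 10-column grid with unit-sized cells.
gps_trajectory = [(0.2, 0.3), (0.8, 0.4), (1.5, 0.6), (1.7, 1.2)]
grid_trajectory = gps_to_grid(gps_trajectory, 0.0, 0.0, 1.0, 10)
print(grid_trajectory)  # [0, 1, 11]
```

The first two points fall in the same cell and collapse into one entry, which shows how a grid trajectory discards the fine-grained movement that the point-wise branch is meant to recover.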