Region-Point Joint Representation for Effective Trajectory Similarity Learning
Hao Long, Silin Zhou, Lisi Chen, Shuo Shang
TL;DR
The paper addresses the limitation of region-only representations in trajectory similarity learning by introducing RePo, a multimodal framework that jointly learns region-wise representations (structural via Node2Vec, visual via ResNet) and point-wise representations (locality, correlation, continuity). The two are fused through cross-modal attention and a [CLS]-based embedding. Training uses a supervised contrastive loss that aligns positives with ground-truth similarity, with hard negatives mined in embedding space, to produce robust, discriminative embeddings. Across multiple real-world datasets and similarity metrics, RePo reports an average accuracy improvement of 22.2% over SOTA baselines, and ablations validate the contribution of each component. Overall, RePo supports accurate trajectory retrieval and ranking by combining regional context with fine-grained motion patterns in a unified representation.
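To make the training signal concrete, the following is a minimal sketch of a contrastive objective with hard negatives mined in embedding space. It assumes an InfoNCE-style loss over cosine similarities; the summary does not give the exact formulation used by RePo, so the function names (`cosine`, `hardest_negatives`, `info_nce`) and the `temperature` parameter are illustrative assumptions, not the authors' implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def hardest_negatives(anchor, candidates, k):
    """Assumed mining rule: pick the k candidates most similar to the anchor."""
    ranked = sorted(candidates, key=lambda c: cosine(anchor, c), reverse=True)
    return ranked[:k]

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's stated loss):
    -log( exp(s_pos / t) / sum_j exp(s_j / t) ), computed stably."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]
```

In this sketch, a negative that lies close to the anchor contributes a larger term to the denominator and therefore a larger loss, which is why mining hard negatives gives a stronger ranking signal than sampling negatives at random.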
Abstract
Recent learning-based methods have reduced the computational complexity of traditional trajectory similarity computation, but state-of-the-art (SOTA) methods still fail to leverage the comprehensive spectrum of trajectory information for similarity modeling. To tackle this problem, we propose \textbf{RePo}, a novel method that jointly encodes \textbf{Re}gion-wise and \textbf{Po}int-wise features to capture both spatial context and fine-grained moving patterns. For region-wise representation, the GPS trajectories are first mapped to grid sequences; spatial context is then captured by structural features, and semantic context is enriched by visual features. For point-wise representation, three lightweight expert networks extract local, correlation, and continuous movement patterns from dense GPS sequences. Then, a router network adaptively fuses the learned point-wise features, which are subsequently combined with region-wise features using cross-attention to produce the final trajectory embedding. To train RePo, we adopt a contrastive loss with hard negative samples to provide similarity ranking supervision. Experimental results show that RePo achieves an average accuracy improvement of 22.2\% over SOTA baselines across all evaluation metrics.
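The first region-wise step, mapping a GPS trajectory to a grid sequence, can be sketched as follows. This is an illustration under stated assumptions: a uniform grid in degrees, row-major cell IDs, and collapsing of consecutive duplicate cells. None of these choices is specified in the abstract, and the names `to_grid_sequence`, `cell_deg`, and `n_cols` are hypothetical.

```python
import math

def to_grid_sequence(points, min_lon, min_lat, cell_deg, n_cols):
    """Map (lon, lat) points to row-major grid-cell IDs.

    Assumptions (not from the paper): square cells of `cell_deg` degrees,
    a grid origin at (min_lon, min_lat), and consecutive repeats of the
    same cell collapsed into one entry.
    """
    sequence = []
    for lon, lat in points:
        col = math.floor((lon - min_lon) / cell_deg)
        row = math.floor((lat - min_lat) / cell_deg)
        cell_id = row * n_cols + col
        # Skip the point if it falls in the same cell as the previous one.
        if not sequence or sequence[-1] != cell_id:
            sequence.append(cell_id)
    return sequence

trajectory = [(0.25, 0.25), (0.75, 0.25), (0.8, 0.3), (1.25, 0.75)]
print(to_grid_sequence(trajectory, 0.0, 0.0, 0.5, 4))  # → [0, 1, 6]
```

The resulting cell IDs are the kind of discrete tokens on which structural features (for example, node embeddings over a cell-adjacency graph) could be learned, while the raw dense GPS points remain available to the point-wise branch.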
