Scalable Trajectory-User Linking with Dual-Stream Representation Networks
Hao Zhang, Wei Chen, Xingyu Zhao, Jianpeng Qi, Guiyuan Jiang, Yanwei Yu
TL;DR
ScaleTUL tackles large-scale trajectory-user linking with a dual-stream trajectory encoder that captures long-term dependencies via a Structured State Space Model and short-term patterns via a Bi-LSTM, then fuses the two. It uses a spatio-temporal augmentation strategy to create two views and trains with a supervised contrastive objective that aligns trajectories of the same user while separating those of different users, followed by a cosine-based linking loss that maps trajectories to users. Training proceeds in two stages: a projection and alignment phase, then a final TUL predictor. The model achieves robust performance on city-level and nationwide datasets with improved efficiency and scalability, and empirical results show that it outperforms state-of-the-art baselines on check-in mobility data, confirming its effectiveness for large-scale TUL tasks.
Abstract
Trajectory-user linking (TUL) aims to match anonymous trajectories to the users most likely to have generated them, which benefits a wide range of real-world spatio-temporal applications. However, existing TUL methods suffer from high model complexity and learn trajectory representations poorly, which makes them ineffective on large-scale user trajectory data. In this work, we propose ScaleTUL, a novel $\underline{Scal}$abl$\underline{e}$ Trajectory-User Linking model with dual-stream representation networks for the large-scale $\underline{TUL}$ problem. Specifically, ScaleTUL generates two views using temporal and spatial augmentations and applies a supervised contrastive learning framework to effectively capture the irregularities of trajectories. In each view, a dual-stream trajectory encoder, consisting of a long-term encoder and a short-term encoder, learns unified trajectory representations that fuse different spatio-temporal dependencies. A TUL layer then associates trajectories with their corresponding users in the representation space, and the model is trained in two stages. Experimental results on check-in mobility datasets from three real-world cities and the U.S. nationwide demonstrate the superiority of ScaleTUL over state-of-the-art baselines for large-scale TUL tasks.
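To make the supervised contrastive step more concrete, the following is a minimal sketch of a generic supervised contrastive loss computed over two augmented views, where trajectories from the same user act as positives. It is not ScaleTUL's actual implementation: the abstract does not give the loss formula, so the function name `supervised_contrastive_loss`, the `temperature` value, and the use of plain NumPy arrays in place of encoder outputs are all assumptions made for illustration.

```python
import numpy as np

def supervised_contrastive_loss(z1, z2, user_ids, temperature=0.1):
    """Generic supervised contrastive loss over two views (illustrative only).

    z1, z2   : (n, d) arrays, embeddings of the same n trajectories under
               two augmentations (hypothetical encoder outputs).
    user_ids : (n,) integer array, the user who generated each trajectory.
    """
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via unit vectors
    labels = np.concatenate([user_ids, user_ids])

    sim = z @ z.T / temperature
    self_mask = np.eye(len(labels), dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # an anchor never contrasts with itself

    # Numerically stable log-softmax over all other samples in the batch.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))

    # Positives: any other sample (either view) from the same user.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_log_prob = np.where(pos_mask, log_prob, 0.0)
    per_anchor = -pos_log_prob.sum(axis=1) / pos_mask.sum(axis=1)
    return per_anchor.mean()

# Toy usage: four trajectories from two users, embedded in 2-D.
view1 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
view2 = view1.copy()
loss_matched = supervised_contrastive_loss(view1, view2, np.array([0, 0, 1, 1]))
loss_mismatched = supervised_contrastive_loss(view1, view2, np.array([0, 1, 0, 1]))
```

In this toy setup the loss is lower when user labels agree with the embedding clusters than when they do not, which is the behavior the contrastive objective rewards. Because every trajectory appears in both views, each anchor always has at least one positive, so the per-anchor average is well defined.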
