Blurred Encoding for Trajectory Representation Learning
Silin Zhou, Yao Chen, Shuo Shang, Lisi Chen, Bingsheng He, Ryosuke Shibasaki
TL;DR
The paper tackles the loss of fine-grained spatial-temporal detail in trajectory representation learning (TRL) by replacing grid and road abstractions with hierarchical patch trajectories produced by blurred encoding. BLUE builds a pyramid encoder-decoder in which Transformers operate on multi-level patch trajectories, using attention-based pooling and cross-attention-based up-resolution to fuse information across levels. Spatial-temporal embeddings enrich GPS points before patching, and a trajectory reconstruction objective with mean squared error (MSE) loss drives learning, yielding robust, generalizable representations that do not rely on city-specific grids or road maps. On the Porto and Chengdu datasets, across travel time estimation, most similar trajectory search, and trajectory classification, BLUE consistently outperforms eight SOTA baselines, with notable gains in accuracy and transferability, while remaining efficient.
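Blurred encoding, as the paper describes it, gradually reduces the precision of GPS coordinates so that nearby points fall into the same patch. The Python sketch below illustrates the idea under assumptions that are not taken from the paper: precision is reduced by truncating coordinates to a given number of decimal places, a patch is a maximal run of consecutive points sharing the same blurred coordinates, and the level set (4, 3, 2) and the helper names `blurred_cells`, `patchify`, and `hierarchical_patches` are hypothetical. Timestamps and the spatial-temporal embeddings are omitted.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees
Cell = Tuple[int, int]       # blurred coordinates as integers


def blurred_cells(points: List[Point], finest: int, decimals: int) -> List[Cell]:
    """Blur each point to `decimals` decimal places by truncation (an assumption).

    Coordinates are truncated once at the `finest` precision as integers, and
    coarser levels are derived by integer floor division, so every coarse cell
    is a union of whole fine cells (the hierarchy nests by construction).
    """
    if decimals > finest:
        raise ValueError("decimals must not exceed the finest precision")
    scale = 10 ** finest
    shrink = 10 ** (finest - decimals)
    cells: List[Cell] = []
    for lon, lat in points:
        fine_lon, fine_lat = math.floor(lon * scale), math.floor(lat * scale)
        cells.append((fine_lon // shrink, fine_lat // shrink))
    return cells


def patchify(cells: List[Cell]) -> List[List[int]]:
    """Group maximal runs of consecutive points sharing a cell into patches.

    Each patch is returned as a list of indices into the original trajectory.
    """
    patches: List[List[int]] = []
    for i, cell in enumerate(cells):
        if i == 0 or cell != cells[i - 1]:
            patches.append([])  # a new cell starts a new patch
        patches[-1].append(i)
    return patches


def hierarchical_patches(
    points: List[Point], levels: Tuple[int, ...] = (4, 3, 2)
) -> Dict[int, List[List[int]]]:
    """Map each precision level (decimal places) to its patch partition."""
    finest = max(levels)
    return {d: patchify(blurred_cells(points, finest, d)) for d in levels}


if __name__ == "__main__":
    traj = [(104.07123, 30.65512), (104.07127, 30.65518),
            (104.07161, 30.65541), (104.07481, 30.65902)]
    for level, patches in hierarchical_patches(traj).items():
        print(level, patches)
    # 4 [[0, 1], [2], [3]]
    # 3 [[0, 1, 2], [3]]
    # 2 [[0, 1, 2, 3]]
```

Truncating once at the finest precision and deriving coarser cells by integer division guarantees that each coarse patch is a union of consecutive finer patches; rounding each level independently would not, because a cell at one precision can straddle two cells at the next.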
Abstract
Trajectory representation learning (TRL) maps trajectories to vector embeddings and facilitates tasks such as trajectory classification and similarity search. State-of-the-art (SOTA) TRL methods transform raw GPS trajectories into grid or road trajectories to capture high-level travel semantics, i.e., regions and roads. However, they lose fine-grained spatial-temporal details because multiple GPS points are grouped into a single grid cell or road segment. To tackle this problem, we propose the BLUrred Encoding method, dubbed BLUE, which gradually reduces the precision of GPS coordinates to create hierarchical patches with multiple levels. The low-level patches are small and preserve fine-grained spatial-temporal details, while the high-level patches are large and capture overall travel patterns. To let the different patch levels complement one another, BLUE is an encoder-decoder model with a pyramid structure. At each patch level, a Transformer learns the trajectory embedding for that level; in the encoder, pooling prepares the inputs for the next higher level, and in the decoder, up-resolution provides guidance to the next lower level. BLUE is trained on the trajectory reconstruction task with the MSE loss. We compare BLUE with 8 SOTA TRL methods on 3 downstream tasks. The results show that BLUE consistently achieves higher accuracy than all baselines, outperforming the best-performing baselines by an average of 30.90%. Our code is available at https://github.com/slzhou-xy/BLUE.
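The cross-level operations of the pyramid can be sketched in plain Python. This is a simplified illustration, not BLUE's implementation: `attention_pool` stands in for the encoder's attention-based pooling (a single query vector, standing in for a learned one, attends over the point embeddings inside each patch), `cross_attention_upres` stands in for the decoder's cross-attention-based up-resolution (each lower-level embedding attends over the higher-level ones, and the result is added back as a residual, which is an assumption), and `mse` is the reconstruction loss. All function names are hypothetical, and the projection matrices, multiple heads, and Transformer layers a real model would use are omitted.

```python
import math
from typing import List

Vec = List[float]


def softmax(scores: Vec) -> Vec:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Scaled dot-product attention for a single query."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def attention_pool(level: List[Vec], patches: List[List[int]], query: Vec) -> List[Vec]:
    """Encoder side: summarize each patch's member embeddings into one vector.

    Keys and values are the member embeddings themselves (no projections).
    """
    out: List[Vec] = []
    for patch in patches:
        members = [level[i] for i in patch]
        out.append(attend(query, members, members))
    return out


def cross_attention_upres(low: List[Vec], high: List[Vec]) -> List[Vec]:
    """Decoder side: each low-level embedding queries the high-level ones,
    and the attended context is added back as a residual (an assumption)."""
    result: List[Vec] = []
    for q in low:
        ctx = attend(q, high, high)
        result.append([a + b for a, b in zip(q, ctx)])
    return result


def mse(pred: List[Vec], target: List[Vec]) -> float:
    """Mean squared error over all coordinates (the reconstruction loss)."""
    diffs = [(p - t) ** 2 for pv, tv in zip(pred, target) for p, t in zip(pv, tv)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    points = [[1.0, 0.0], [3.0, 2.0], [0.0, 1.0]]
    pooled = attention_pool(points, [[0, 1], [2]], query=[0.0, 0.0])
    print(pooled)  # zero query -> uniform weights -> [[2.0, 1.0], [0.0, 1.0]]
    restored = cross_attention_upres(points, pooled)
    print(mse(restored, points))
```

With an all-zero query every member gets the same weight, so pooling reduces to the patch mean; a trained query would instead emphasize the points most informative for the higher level.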
