PLMTrajRec: A Scalable and Generalizable Trajectory Recovery Method with Pre-trained Language Models
Tonglong Wei, Yan Lin, Youfang Lin, Shengnan Guo, Jilin Hu, Haitao Yuan, Gao Cong, Huaiyu Wan
TL;DR
The paper tackles recovering dense, map-matched trajectories from sparse observations caused by device malfunctions or network instability. It introduces PLMTrajRec, a framework built on a pre-trained language model (PLM) that combines dual trajectory prompts (an explicit prompt guided by sampling interval and movement features, IF-guided, and an implicit prompt guided by area flow, AF-guided), an interval-aware trajectory embedder, and a LoRA-fine-tuned PLM encoder to predict road segments and moving ratios. Key contributions include interval unification, road-condition modeling via area flow and a road condition passing mechanism, and a multi-task loss with joint training across sampling intervals. Evaluations on the Chengdu and Porto datasets, including zero-shot interval scenarios, show strong scalability and generalization. Because the model can be fine-tuned with only limited dense data, it is suited to real-world sparse-data settings in urban planning, traffic management, and location-based services.
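To make the idea of interval unification concrete, the following minimal sketch places the observed points of a sparse trajectory onto a time grid with a single target interval and leaves the unobserved slots empty for a model to fill. The function name `unify_interval`, the `(t_seconds, value)` point format, and the use of `None` as a placeholder for missing points are assumptions made for illustration, not the paper's implementation.

```python
def unify_interval(points, target_s):
    """Place observed points on a grid with a fixed target interval.

    points:   list of (t_seconds, value), sorted by time (hypothetical format).
    target_s: the unified sampling interval in seconds.
    Returns a list with one slot per target time step; unobserved
    slots hold None and stand in for the points to be recovered.
    """
    t0, t_end = points[0][0], points[-1][0]
    n_slots = (t_end - t0) // target_s + 1
    grid = [None] * n_slots
    for t, value in points:
        grid[(t - t0) // target_s] = value
    return grid


# A trajectory observed every 60 s, unified to a 15 s interval.
sparse = [(0, "a"), (60, "b"), (120, "c")]
unified = unify_interval(sparse, 15)
```

Under these assumptions, trajectories sampled every 60 s and every 120 s both map onto the same 15 s grid and differ only in how many slots are empty, which is one way a single model could handle several sampling intervals.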
Abstract
Spatiotemporal trajectory data is crucial for various applications. However, issues such as device malfunctions and network instability often cause sparse trajectories, leading to lost detailed movement information. Recovering the missing points in sparse trajectories to restore the detailed information is thus essential. Despite recent progress, several challenges remain. First, the lack of large-scale dense trajectory data makes it difficult to train a trajectory recovery model from scratch. Second, the varying spatiotemporal correlations in sparse trajectories make it hard to generalize recovery across different sampling intervals. Third, the lack of location information complicates the extraction of road conditions for missing points. To address these challenges, we propose a novel trajectory recovery model called PLMTrajRec. It leverages the scalability of a pre-trained language model (PLM) and can be fine-tuned with only a limited set of dense trajectories. To handle different sampling intervals in sparse trajectories, we first convert each trajectory's sampling interval and movement features into natural language representations, allowing the PLM to recognize its interval. We then introduce a trajectory encoder to unify trajectories of varying intervals into a single interval and capture their spatiotemporal relationships. To obtain road conditions for missing points, we propose an area flow-guided implicit trajectory prompt, which models road conditions by collecting traffic flows in each region. We also introduce a road condition passing mechanism that uses observed points' road conditions to infer those of the missing points. Experiments on two public trajectory datasets with three sampling intervals each demonstrate the effectiveness, scalability, and generalization ability of PLMTrajRec.
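Two of the ideas above can be sketched in a few lines: rendering a trajectory's sampling interval and movement features as natural language, and collecting traffic flow per region. This is a rough sketch under stated assumptions. The prompt wording, the function names `build_explicit_prompt` and `area_flow`, the `(t_seconds, x_m, y_m)` point format in a local metric frame, and the square grid are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def build_explicit_prompt(points, interval_s):
    """Describe sampling interval and simple movement features as text.

    points: list of (t_seconds, x_m, y_m), sorted by time (assumed format).
    """
    dist = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(points, points[1:])
    )
    duration = points[-1][0] - points[0][0]
    speed = dist / duration if duration > 0 else 0.0
    return (
        f"Sparse trajectory sampled every {interval_s} seconds, "
        f"{len(points)} observed points, travelled {dist:.0f} m "
        f"at an average speed of {speed:.1f} m/s."
    )


def area_flow(trajectories, cell_m):
    """Count observed points per square grid cell of side cell_m metres."""
    flow = Counter()
    for traj in trajectories:
        for _, x, y in traj:
            flow[(int(x // cell_m), int(y // cell_m))] += 1
    return flow


traj = [(0, 0, 0), (60, 300, 400), (120, 600, 800)]
prompt = build_explicit_prompt(traj, 60)
flow = area_flow([traj], 500)
```

In this sketch the text prompt would be tokenized and fed to the PLM alongside the trajectory embedding, while the per-cell counts are only a stand-in for the area flow signal that the implicit prompt summarizes.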
