PLMTrajRec: A Scalable and Generalizable Trajectory Recovery Method with Pre-trained Language Models

Tonglong Wei, Yan Lin, Youfang Lin, Shengnan Guo, Jilin Hu, Haitao Yuan, Gao Cong, Huaiyu Wan

TL;DR

The paper tackles the recovery of dense, map-matched trajectories from sparse observations caused by device or network failures. It introduces PLMTrajRec, a framework built on a pre-trained language model (PLM) with three parts: dual trajectory prompts (an IF-guided explicit prompt built from the sampling interval and movement features, and an AF-guided implicit prompt built from area flow), an interval-aware trajectory embedder, and a LoRA-fine-tuned PLM encoder that predicts road segments and moving ratios. Key contributions include unifying trajectories of different sampling intervals, modeling road conditions through area flow and a road condition passing mechanism, and a multi-task loss with joint training across sampling intervals. Evaluations on the Chengdu and Porto datasets, including zero-shot sampling intervals, show strong scalability and generalization. Because the model can be fine-tuned with only limited dense data, it is relevant to urban planning, traffic management, and location-based services where trajectories are sparse.
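The multi-task objective mentioned above can be sketched as a classification term for the road segment plus a regression term for the moving ratio. This is only an illustrative sketch: the choice of cross-entropy and squared error, the function name `multitask_loss`, and the balancing weight `ratio_weight` are assumptions, not the paper's stated formulation.

```python
import math


def multitask_loss(segment_probs, true_segment, pred_ratio, true_ratio,
                   ratio_weight=1.0):
    """Loss for one recovered point (hypothetical formulation).

    segment_probs: dict mapping road-segment id -> predicted probability.
    true_segment:  id of the ground-truth road segment.
    pred_ratio, true_ratio: moving ratios, expected in [0, 1].
    ratio_weight:  assumed weight balancing the two terms.
    """
    eps = 1e-12  # guards against log(0) when the true segment gets no mass
    # Classification term: negative log-likelihood of the true segment.
    segment_loss = -math.log(segment_probs.get(true_segment, 0.0) + eps)
    # Regression term: squared error on the moving ratio.
    ratio_loss = (pred_ratio - true_ratio) ** 2
    return segment_loss + ratio_weight * ratio_loss


loss = multitask_loss({"e1": 0.5, "e2": 0.5}, "e1", 0.8, 0.5, ratio_weight=2.0)
```

In a real training loop the per-point losses would be averaged over all recovered points and over trajectories from every sampling interval trained jointly.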

Abstract

Spatiotemporal trajectory data is crucial for various applications. However, issues such as device malfunctions and network instability often cause sparse trajectories, leading to lost detailed movement information. Recovering the missing points in sparse trajectories to restore the detailed information is thus essential. Despite recent progress, several challenges remain. First, the lack of large-scale dense trajectory data makes it difficult to train a trajectory recovery model from scratch. Second, the varying spatiotemporal correlations in sparse trajectories make it hard to generalize recovery across different sampling intervals. Third, the lack of location information complicates the extraction of road conditions for missing points. To address these challenges, we propose a novel trajectory recovery model called PLMTrajRec. It leverages the scalability of a pre-trained language model (PLM) and can be fine-tuned with only a limited set of dense trajectories. To handle different sampling intervals in sparse trajectories, we first convert each trajectory's sampling interval and movement features into natural language representations, allowing the PLM to recognize its interval. We then introduce a trajectory encoder to unify trajectories of varying intervals into a single interval and capture their spatiotemporal relationships. To obtain road conditions for missing points, we propose an area flow-guided implicit trajectory prompt, which models road conditions by collecting traffic flows in each region. We also introduce a road condition passing mechanism that uses observed points' road conditions to infer those of the missing points. Experiments on two public trajectory datasets with three sampling intervals each demonstrate the effectiveness, scalability, and generalization ability of PLMTrajRec.
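As a rough illustration of "collecting traffic flows in each region", the sketch below counts observed GPS points per grid cell. The uniform grid, the `cell_size` parameter, and the function name `area_flow` are assumptions made for this example; the paper's actual region partition and flow definition may differ.

```python
from collections import Counter


def area_flow(points, min_lat, min_lng, cell_size):
    """Count observed points per grid cell (a simple stand-in for area flow).

    points:    iterable of (lat, lng) pairs.
    min_lat, min_lng: south-west corner of the study area (assumed known).
    cell_size: side length of a square cell, in degrees (assumed).
    """
    flow = Counter()
    for lat, lng in points:
        # Integer cell indices measured from the south-west corner.
        row = int((lat - min_lat) // cell_size)
        col = int((lng - min_lng) // cell_size)
        flow[(row, col)] += 1
    return flow


flow = area_flow(
    [(30.2, 104.1), (30.3, 104.2), (30.7, 104.1)],
    min_lat=30.0, min_lng=104.0, cell_size=0.5,
)
```

A cell with a high count can then be read as a busy region, which is the kind of signal a road-condition feature would draw on.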

Paper Structure

This paper contains 44 sections, 19 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: Impact of road conditions on route selection and movement patterns.
  • Figure 2: An illustration of a map-matched trajectory, road segment $e$, and moving ratio $r$.
  • Figure 3: The framework of PLMTrajRec, consisting of Dual Trajectory Prompts, an Interval-aware Trajectory Embedder, and a PLM-based Trajectory Encoder.
  • Figure 4: An illustration of the trajectory preprocessing layer.
  • Figure 5: Hyperparameter analysis of the number of reference tokens $K$ of trajectory feature transformation.
  • ...and 3 more figures

Theorems & Definitions (3)

  • Definition 1: Trajectory
  • Definition 2: Road Network
  • Definition 3: Map-matched Trajectory
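
The relationship between a map-matched point, its road segment $e$, and its moving ratio $r$ can be sketched with a small data model. This is a minimal sketch under stated assumptions: segments are treated as straight lines, the ratio is measured from the segment's start, and the names `RoadSegment`, `MapMatchedPoint`, and `to_coordinates` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoadSegment:
    segment_id: str
    start: tuple  # (lat, lng) of the segment's start node
    end: tuple    # (lat, lng) of the segment's end node


@dataclass(frozen=True)
class MapMatchedPoint:
    segment_id: str   # road segment e the point lies on
    ratio: float      # moving ratio r in [0, 1], assumed measured from start
    timestamp: float  # seconds since the start of the trajectory


def to_coordinates(point, segments):
    """Interpolate a (lat, lng) position along a straight-line segment."""
    seg = segments[point.segment_id]
    lat = seg.start[0] + point.ratio * (seg.end[0] - seg.start[0])
    lng = seg.start[1] + point.ratio * (seg.end[1] - seg.start[1])
    return (lat, lng)


segments = {"e1": RoadSegment("e1", start=(0.0, 0.0), end=(2.0, 4.0))}
position = to_coordinates(MapMatchedPoint("e1", ratio=0.5, timestamp=15.0), segments)
```

Under these assumptions, recovering a missing point amounts to predicting the pair (segment, ratio), from which a position on the road network follows directly.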