Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models

Zhengxing Lan; Hongbo Li; Lingshan Liu; Bo Fan; Yisheng Lv; Yilong Ren; Zhiyong Cui

TL;DR

This work investigates leveraging pre-trained Large Language Models (LLMs) for autonomous-vehicle trajectory prediction without explicit prompt engineering. By introducing sparse context joint encoding, a lane-aware Mamba module, and a multi-modal Laplace decoder, Traj-LLM enables LLMs to capture high-level scene knowledge and interactions for multi-trajectory forecasting. The approach achieves state-of-the-art results on nuScenes, with strong few-shot performance and efficient inference, while ablations confirm the importance of both the LLM-based high-level modeling and lane-focused guidance. Overall, Traj-LLM presents a universal, adaptable framework that expands the role of LLMs in motion forecasting beyond prompting, enabling robust, multi-modal predictions in complex driving scenes.

Abstract

Predicting the future trajectories of dynamic traffic actors is a cornerstone task in autonomous driving. Although existing efforts have delivered impressive performance improvements, a gap persists in scene cognition and in the understanding of complex traffic semantics. This paper proposes Traj-LLM, the first work to investigate the potential of using Large Language Models (LLMs) without explicit prompt engineering to generate future motion from agents' past/observed trajectories and scene semantics. Traj-LLM starts with sparse context joint encoding to dissect the agent and scene features into a form that LLMs understand. On this basis, we explore LLMs' powerful comprehension abilities to capture a spectrum of high-level scene knowledge and interactive information. To emulate the human-like lane-focus cognitive function and enhance Traj-LLM's scene comprehension, we introduce lane-aware probability learning powered by the Mamba module. Finally, a multi-modal Laplace decoder is designed to achieve scene-compliant multi-modal predictions. Extensive experiments demonstrate that Traj-LLM, fortified by LLMs' strong prior knowledge and understanding, together with lane-aware probability learning, outperforms state-of-the-art methods across evaluation metrics. Moreover, the few-shot analysis further substantiates Traj-LLM's performance: with just 50% of the dataset, it outperforms the majority of benchmarks that rely on the complete data. This study explores equipping the trajectory prediction task with the advanced capabilities inherent in LLMs, furnishing a more universal and adaptable solution for forecasting agent motion.
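
The abstract names a multi-modal Laplace decoder but does not spell out its loss. As a rough illustration only, the sketch below shows one common way such a decoder can be scored: each of K candidate trajectories carries a per-coordinate Laplace location and scale, the candidate closest to the ground truth is selected (winner-takes-all), and its Laplace negative log-likelihood is computed. The function names, the winner-takes-all selection, and the independent per-coordinate Laplace assumption are all hypothetical choices made here, not details taken from the paper.

```python
import math

def laplace_nll(pred, scale, target):
    """Mean Laplace negative log-likelihood over all (x, y) coordinates.

    pred, scale, target: lists of (x, y) tuples, one per future time step.
    Assumes independent Laplace distributions per coordinate (an assumption).
    """
    total, count = 0.0, 0
    for (px, py), (bx, by), (tx, ty) in zip(pred, scale, target):
        for mu, b, y in ((px, bx, tx), (py, by, ty)):
            # -log Laplace(y | mu, b) = log(2b) + |y - mu| / b
            total += math.log(2.0 * b) + abs(y - mu) / b
            count += 1
    return total / count

def best_mode_loss(modes, scales, target):
    """Hypothetical winner-takes-all loss over K candidate trajectories.

    Picks the mode with the lowest average displacement to the target and
    returns (index of that mode, its Laplace NLL).
    """
    def average_displacement(traj):
        return sum(math.dist(p, t) for p, t in zip(traj, target)) / len(target)

    best = min(range(len(modes)), key=lambda k: average_displacement(modes[k]))
    return best, laplace_nll(modes[best], scales[best], target)

# Two candidate modes over two future steps; the first matches the target.
target = [(1.0, 0.0), (2.0, 0.0)]
modes = [[(1.0, 0.0), (2.0, 0.0)], [(0.0, 1.0), (0.0, 2.0)]]
scales = [[(0.5, 0.5), (0.5, 0.5)], [(0.5, 0.5), (0.5, 0.5)]]
best, loss = best_mode_loss(modes, scales, target)
```

Scoring only the best mode is one way to keep the remaining modes free to cover other plausible futures; whether Traj-LLM trains its decoder this way is not stated in this summary.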

Paper Structure

This paper contains 19 sections, 16 equations, 9 figures, 7 tables, and 1 algorithm.

Introduction
Related Work
Trajectory Prediction
Large Language Models
Problem Formulation
Proposed Model
Sparse Context Joint Encoding
High-level Interaction Modeling
Lane-aware Probability Learning
Multi-modal Laplace Decoder
Experiment
Experimental Setup
Results and Computational Performance
Ablation Studies
Few-shot Study
...and 4 more sections

Figures (9)

Figure 1: Framework of Traj-LLM.
Figure 2: The overview of Pre-trained LLMs.
Figure 3: The proposed Mamba layer for lane-aware probability learning.
Figure 4: Comparison of Traj-LLM with baseline models across three key metrics: trainable parameters, inference speed, and $\text{MR}_5$. The size of the circles in the figure corresponds to the number of trainable parameters in each model.
Figure 5: Comparison of Traj-LLM with baseline models across three key metrics: trainable parameters, inference speed, and $\text{MR}_{10}$. The size of the circles in the figure corresponds to the number of trainable parameters in each model.
...and 4 more figures
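
The captions for Figures 4 and 5 refer to $\text{MR}_5$ and $\text{MR}_{10}$ without defining them. The sketch below assumes the miss-rate definition commonly used with nuScenes: a sample counts as a miss when, among the top k predicted trajectories, even the best one has a maximum pointwise L2 error above 2 meters. The function name, the 2 m default, and the assumption that modes are already sorted by confidence are illustrative and not taken from the paper.

```python
import math

def miss_rate_k(predictions, ground_truths, k, threshold=2.0):
    """Fraction of samples whose best top-k trajectory still misses.

    predictions: per sample, a list of candidate trajectories (lists of
        (x, y) tuples), assumed sorted by descending confidence.
    ground_truths: per sample, one trajectory as a list of (x, y) tuples.
    threshold: miss threshold in meters (2.0 is an assumed default).
    """
    misses = 0
    for modes, gt in zip(predictions, ground_truths):
        # Maximum pointwise L2 error of each of the top-k modes; keep the smallest.
        best = min(
            max(math.dist(p, g) for p, g in zip(traj, gt))
            for traj in modes[:k]
        )
        if best > threshold:
            misses += 1
    return misses / len(ground_truths)

# Sample 1: top mode is 3 m off, second mode is 0.5 m off.
# Sample 2: top mode matches the ground truth exactly.
gt = [(0.0, 0.0), (1.0, 0.0)]
sample_1 = [[(0.0, 3.0), (1.0, 3.0)], [(0.0, 0.5), (1.0, 0.5)]]
sample_2 = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 5.0), (1.0, 5.0)]]
mr_1 = miss_rate_k([sample_1, sample_2], [gt, gt], k=1)
mr_2 = miss_rate_k([sample_1, sample_2], [gt, gt], k=2)
```

Under this definition a lower value is better, and the miss rate can only stay the same or fall as k grows, since more candidates are allowed per sample.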
