CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting
Haicheng Liao, Hanlin Kong, Bonan Wang, Chengyue Wang, Wang Ye, Zhengbing He, Chengzhong Xu, Zhenning Li
TL;DR
CoT-Drive tackles robust, real-time motion forecasting for autonomous driving (AD) by transferring the reasoning strengths of large language models (LLMs) to lightweight edge models through chain-of-thought (CoT) prompting and a teacher-student distillation pipeline. It introduces two scene description datasets, Highway-Text and Urban-Text, for training compact language models (LMs) to generate semantic scene annotations, and it employs a four-module encoder-decoder with multimodal fusion and uncertainty modeling to predict multimodal trajectories. Across five real-world datasets, CoT-Drive outperforms state-of-the-art baselines while keeping latency low enough for edge devices, which supports the use of LLM-inspired scene understanding in AD. The work offers a scalable path toward explainable, generalizable motion forecasting on resource-constrained platforms by combining prompt-engineered linguistic reasoning with efficient edge inference.
Abstract
Accurate motion forecasting is crucial for safe autonomous driving (AD). This study proposes CoT-Drive, a novel approach that enhances motion forecasting by leveraging large language models (LLMs) and a chain-of-thought (CoT) prompting method. We introduce a teacher-student knowledge distillation strategy that transfers the advanced scene understanding capabilities of LLMs to lightweight language models (LMs), so that CoT-Drive runs in real time on edge devices while retaining comprehensive scene understanding and generalization capabilities. By applying CoT prompting to LLMs without additional training, CoT-Drive generates semantic annotations that significantly improve the understanding of complex traffic environments, thereby boosting the accuracy and robustness of predictions. Additionally, we present two new scene description datasets, Highway-Text and Urban-Text, designed for fine-tuning lightweight LMs to generate context-specific semantic annotations. Comprehensive evaluations on five real-world datasets demonstrate that CoT-Drive outperforms existing models, highlighting its effectiveness and efficiency in handling complex traffic scenarios. Overall, this study is the first to consider the practical application of LLMs in this field. It pioneers the training and use of a lightweight LLM surrogate for motion forecasting, setting a new benchmark and showcasing the potential of integrating LLMs into AD systems.
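The teacher-student idea described above can be illustrated with a minimal sketch: a CoT prompt is built for each scene, a teacher LLM answers it, and the resulting (prompt, annotation) pairs become fine-tuning data for a lightweight LM. Everything named here is an assumption made for illustration and is not taken from the paper: the `Agent` record and its fields, the wording and number of the reasoning steps in `COT_STEPS`, the helper names, and the choice to feed the student the same prompt as the teacher. The teacher is passed in as a plain callable so that no particular LLM API is implied.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    """Hypothetical per-agent scene record; field names are illustrative."""
    agent_id: str
    kind: str          # e.g. "car", "truck", "pedestrian"
    speed_mps: float   # current speed in metres per second
    lane: str          # free-text lane description


# Hypothetical reasoning steps; the paper's actual prompt wording may differ.
COT_STEPS = [
    "Summarize the scene and the agents in it.",
    "Describe how the agents interact with the target agent.",
    "Assess the risks these interactions create.",
    "State the most likely maneuver of the target agent.",
]


def build_cot_prompt(target_id: str, agents: List[Agent]) -> str:
    """Serialize a scene and append numbered chain-of-thought steps."""
    lines = [f"Target agent: {target_id}", "Agents:"]
    for a in agents:
        lines.append(f"- {a.agent_id}: {a.kind}, {a.speed_mps:.1f} m/s, {a.lane}")
    lines.append("Reason step by step:")
    for i, step in enumerate(COT_STEPS, start=1):
        lines.append(f"Step {i}: {step}")
    return "\n".join(lines)


def make_distillation_pairs(
    scenes: List[Tuple[str, List[Agent]]],
    teacher: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Query the teacher once per scene and keep (input, target) text pairs.

    The pairs are the supervised data a lightweight LM would be fine-tuned on.
    """
    pairs = []
    for target_id, agents in scenes:
        prompt = build_cot_prompt(target_id, agents)
        pairs.append({"input": prompt, "target": teacher(prompt)})
    return pairs
```

A stub teacher is enough to exercise the sketch, for example `make_distillation_pairs([("a1", [Agent("a1", "car", 12.5, "left lane")])], lambda p: "placeholder annotation")`. In a real pipeline the callable would wrap an LLM call, and the returned pairs would be written out as a text dataset for fine-tuning.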
