ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving

Xueyi Liu; Zuodong Zhong; Yuxin Guo; Yun-Fu Liu; Zhiguo Su; Qichao Zhang; Junli Wang; Yinfeng Gao; Yupeng Zheng; Qiao Lin; Huiyong Chen; Dongbin Zhao

TL;DR

ReasonPlan tackles robust closed-loop autonomous driving with multimodal large language models (MLLMs) by combining a self-supervised Next Scene Prediction (NSP) task with a supervised Decision Chain-of-Thought (DeCoT) process. It introduces the Planning-oriented Decision Reasoning (PDR) dataset of 210k samples and a two-stage training strategy that fuses vision-language representations with planning in a closed-loop setting. On Bench2Drive, it outperforms the mainstream end-to-end imitation learning method by 19% in L2 and 16.1 in driving score (DS), and it shows strong zero-shot generalization on the unseen DOS benchmark, indicating better robustness and interpretability than imitation-learning baselines. The work shows how explicit reasoning and scene forecasting can connect high-level cognition to low-level control in autonomous driving systems.

Abstract

Owing to their powerful vision-language reasoning and generalization abilities, multimodal large language models (MLLMs) have garnered significant attention in end-to-end (E2E) autonomous driving. However, their application to closed-loop systems remains underexplored, and current MLLM-based methods have not shown clear superiority over mainstream E2E imitation learning approaches. In this work, we propose ReasonPlan, a novel MLLM fine-tuning framework designed for closed-loop driving through holistic reasoning, with a self-supervised Next Scene Prediction task and a supervised Decision Chain-of-Thought process. This dual mechanism encourages the model to align visual representations with actionable driving context while promoting interpretable and causally grounded decision making. We curate a planning-oriented decision reasoning dataset, PDR, comprising 210k diverse and high-quality samples. Our method outperforms the mainstream E2E imitation learning method by a large margin on the Bench2Drive benchmark: 19% in L2 and 16.1 in driving score. Furthermore, ReasonPlan demonstrates strong zero-shot generalization on the unseen DOS benchmark, highlighting its adaptability to unseen corner cases. Code and dataset will be available at https://github.com/Liuxueyi/ReasonPlan.
