Solving Motion Planning Tasks with a Scalable Generative Model
Yihan Hu, Siqi Chai, Zhening Yang, Jingyu Qian, Kun Li, Wenxin Shao, Haichao Zhang, Wei Xu, Qiang Liu
TL;DR
GUMP introduces a scalable, unified generative world model for autonomous driving that learns driving scene dynamics to enable data generation, closed-loop simulation, planning evaluation, and online RL. It couples a simple key-value tokenizer with a Multimodal Causal Transformer and prediction chunking to support long-horizon generation and efficient inference via partial-AR decoding. The model achieves state-of-the-art results on Waymo Sim Agents and nuPlan planning benchmarks while enabling a flexible online training framework, positioning GUMP as a foundation model for motion planning tasks. The work demonstrates significant scalability and broad applicability, with identified future improvements including quantization, vectorized maps, and sensor integration.
Abstract
As autonomous driving systems are deployed to millions of vehicles, there is a pressing need to improve their scalability and safety and to reduce engineering cost. A realistic, scalable, and practical simulator of the driving world is highly desirable. In this paper, we present an efficient solution based on a generative model that learns the dynamics of driving scenes. With this model, we can not only simulate diverse futures of a given driving scenario but also generate a variety of driving scenarios conditioned on various prompts. Our design allows the model to operate in both full-autoregressive and partial-autoregressive modes, significantly improving inference and training speed without sacrificing generative capability. This efficiency makes it well suited for use as an online reactive environment for reinforcement learning, an evaluator for planning policies, and a high-fidelity simulator for testing. We evaluate our model on two real-world datasets: the Waymo motion dataset and the nuPlan dataset. On the simulation realism and scene generation benchmark, our model achieves state-of-the-art performance, and on the planning benchmarks, our planner outperforms prior art. We conclude that the proposed generative model may serve as a foundation for a variety of motion planning tasks, including data generation, simulation, planning, and online training. Source code is publicly available at https://github.com/HorizonRobotics/GUMP/
