DAP: A Discrete-token Autoregressive Planner for Autonomous Driving

Bowen Ye, Bin Zhang, Hang Zhao

TL;DR

DAP addresses sparse supervision and limited scalability in autonomous driving planning by casting motion planning as discrete-token autoregression. It jointly forecasts BEV semantics and ego trajectory tokens with a decoder-only Transformer that uses sparse MoE, and it augments imitation learning with offline SAC-BC fine-tuning to improve safety and comfort while preserving the behavior cloning (BC) prior. With a compact 160M-parameter model, the approach achieves state-of-the-art open-loop results and competitive closed-loop performance on NAVSIM, which the authors attribute to dense spatiotemporal supervision and a lightweight post-tuning stage. The work suggests that discrete-token autoregression, coupled with world-modeling signals and RL refinement, can yield scalable, robust planning for autonomous driving.

Abstract

Achieving sustained performance improvements as data and model budgets scale remains a pivotal yet unresolved challenge in autonomous driving. While autoregressive models have exhibited promising data-scaling efficiency in planning tasks, predicting ego trajectories alone suffers from sparse supervision and only weakly constrains how scene evolution should shape ego motion. Therefore, we introduce DAP, a discrete-token autoregressive planner that jointly forecasts BEV semantics and ego trajectories, thereby enforcing comprehensive representation learning and allowing predicted dynamics to directly condition ego motion. In addition, we incorporate reinforcement-learning-based fine-tuning, which preserves supervised behavior cloning priors while injecting reward-guided improvements. Despite a compact 160M-parameter budget, DAP achieves state-of-the-art performance on open-loop metrics and delivers competitive closed-loop results on the NAVSIM benchmark. Overall, the fully discrete-token autoregressive formulation, operating on both rasterized BEV and ego actions, provides a compact yet scalable planning paradigm for autonomous driving.

Paper Structure

This paper contains 25 sections, 19 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Overall architecture of DAP. Historical multi-modal inputs are tokenized (VQVAE for BEV, $\kappa$--$a$ discretization for ego actions), then a decoder-only autoregressive Transformer with sparse MoE jointly predicts future BEV and ego trajectory tokens. The joint forecasting provides dense, time-aligned signals that couple scene evolution with motion generation.
  • Figure 2: The necessity of RL: sub-optimal trajectories 2 and 3 have nearly identical BC losses, but trajectory 3 leads to a collision and therefore incurs a higher SAC loss.
  • Figure 3: Two-stage training: (I) supervised pretraining with cross-entropy losses on trajectory and environment tokens, and (II) offline SAC-BC fine-tuning that augments the policy with reward-driven adaptation while retaining behavior consistency.
  • Figure 4: Qualitative comparison of jointly forecasted BEV semantics and ego trajectories (bottom rows) against ground truth (top rows).
  • Figure 5: Qualitative trajectory visualizations in representative scenarios, where red denotes the ground-truth trajectory and blue indicates the planned trajectory.
  • ...and 1 more figure
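
The Figure 1 caption mentions a $\kappa$--$a$ discretization that turns ego actions into tokens. The listing does not spell out the scheme, so the following is only a minimal sketch of one plausible form: it assumes $\kappa$ denotes path curvature and $a$ longitudinal acceleration, and it bins each value on a uniform grid before flattening the pair into a single token id. The class name `KappaATokenizer`, the value ranges, and the bin counts are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KappaATokenizer:
    """Uniform-grid tokenizer for (kappa, a) pairs.

    Assumption: kappa is curvature (1/m) and a is acceleration (m/s^2).
    Ranges and bin counts below are illustrative placeholders only.
    """
    kappa_min: float = -0.2
    kappa_max: float = 0.2
    a_min: float = -4.0
    a_max: float = 4.0
    n_kappa: int = 32
    n_a: int = 32

    @property
    def vocab_size(self) -> int:
        # One token id per (kappa bin, a bin) cell.
        return self.n_kappa * self.n_a

    @staticmethod
    def _bin(x: float, lo: float, hi: float, n: int) -> int:
        # Clamp to the supported range, then map to a bin index in [0, n - 1].
        x = min(max(x, lo), hi)
        idx = int((x - lo) / (hi - lo) * n)
        return min(idx, n - 1)

    @staticmethod
    def _center(idx: int, lo: float, hi: float, n: int) -> float:
        # Representative value of a bin: its midpoint.
        return lo + (idx + 0.5) * (hi - lo) / n

    def encode(self, kappa: float, a: float) -> int:
        k = self._bin(kappa, self.kappa_min, self.kappa_max, self.n_kappa)
        j = self._bin(a, self.a_min, self.a_max, self.n_a)
        return k * self.n_a + j  # row-major flattening of the 2D grid

    def decode(self, token: int) -> tuple[float, float]:
        k, j = divmod(token, self.n_a)
        kappa = self._center(k, self.kappa_min, self.kappa_max, self.n_kappa)
        a = self._center(j, self.a_min, self.a_max, self.n_a)
        return kappa, a


# Usage: tokenize a short sequence of (kappa, a) actions, then recover
# the bin-center values that a planner would roll out.
tokenizer = KappaATokenizer()
actions = [(0.0, 0.0), (0.05, -1.0), (-0.1, 2.5)]
tokens = [tokenizer.encode(kappa, a) for kappa, a in actions]
recovered = [tokenizer.decode(t) for t in tokens]
```

A discrete vocabulary of this kind lets ego actions share a single next-token cross-entropy objective with the BEV tokens, at the cost of a quantization error bounded by half a bin width in each dimension.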