DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving

Ziying Song, Lin Liu, Hongyu Pan, Bencheng Liao, Mingzhe Guo, Lei Yang, Yongchang Zhang, Shaoqing Xu, Caiyan Jia, Yadan Luo

TL;DR

DIVER tackles the lack of action diversity in end-to-end autonomous driving by integrating reinforcement learning with diffusion-based trajectory generation. A Policy-Aware Diffusion Generator (PADG) conditions on maps, agents, and reference trajectories to produce diverse, feasible multi-mode trajectories, while GRPO-based rewards enforce safety and diversity. The approach outperforms state-of-the-art methods on Bench2Drive, NAVSIM, and nuScenes, demonstrating stronger diversity (Div.) and safety (PDMS, NC, DAC) across closed- and open-loop benchmarks and under robustness tests. This hybrid IL-RL framework advances practical, robust planning in complex driving, with a dedicated Diversity Metric to quantify multi-modal behavior.
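To make the idea of quantifying multi-modal behavior concrete, the sketch below scores a set of predicted modes by their mean pairwise waypoint distance. This is an illustrative stand-in, not the paper's Diversity metric, whose definition is not given in this summary; the function name `mean_pairwise_distance` and the toy trajectories are assumptions.

```python
from itertools import combinations
from math import dist


def mean_pairwise_distance(trajectories):
    """Average, over all pairs of modes, of the mean per-waypoint Euclidean distance.

    Illustrative spread measure only; NOT the Diversity metric defined in the paper.
    Each trajectory is a list of (x, y) waypoints of equal length.
    """
    pairs = list(combinations(trajectories, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        total += sum(dist(p, q) for p, q in zip(a, b)) / len(a)
    return total / len(pairs)


# Hypothetical two-waypoint trajectories.
collapsed = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 0.0)]]  # identical modes
spread = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 2.0)]]     # modes diverge

collapsed_score = mean_pairwise_distance(collapsed)
spread_score = mean_pairwise_distance(spread)
```

Under this measure, modes that cluster on one path score zero regardless of how close they are to the ground truth, which is the kind of behavior an L2-to-GT error cannot reveal.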

Abstract

Most end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, often leading to conservative and homogeneous behaviors that limit generalization in complex real-world scenarios. In this work, we propose DIVER, an end-to-end driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories. At the core of DIVER lies a reinforced diffusion-based generation mechanism. First, the model conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory, alleviating the limitations of imitation learning that arise from relying solely on single expert demonstrations. Second, reinforcement learning is employed to guide the diffusion process, where reward-based supervision enforces safety and diversity constraints on the generated trajectories, thereby enhancing their practicality and generalization capability. Furthermore, to address the limitations of L2-based open-loop metrics in capturing trajectory diversity, we propose a novel Diversity metric to evaluate the diversity of multi-mode predictions. Extensive experiments on the closed-loop NAVSIM and Bench2Drive benchmarks, as well as the open-loop nuScenes dataset, demonstrate that DIVER significantly improves trajectory diversity, effectively addressing the mode collapse problem inherent in imitation learning.
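The TL;DR attributes the reward-based supervision to GRPO. As a rough illustration of the group-relative step that GRPO is known for, the sketch below normalizes each sampled trajectory's reward against the other samples drawn for the same scene. The reward values are hypothetical placeholders; the paper's actual safety and diversity reward terms, and how the advantages enter the diffusion training loss, are not specified in this summary.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampling group (GRPO-style advantage).

    Each advantage is (reward - group mean) / (group std + eps), so samples are
    ranked relative to their siblings rather than against a learned value function.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scalar rewards for four trajectories sampled for the same scene.
rewards = [1.0, 0.5, 0.0, 0.5]
advantages = group_relative_advantages(rewards)
```

Trajectories that score above the group mean receive positive advantages and are reinforced, while below-average ones are discouraged; a group with identical rewards yields all-zero advantages.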

Paper Structure

This paper contains 32 sections, 22 equations, 8 figures, 16 tables, 2 algorithms.

Figures (8)

  • Figure 1: (a) Imitation-based Single-Mode Trajectory Planning [uniad, jiang2023vad, TransFuser, xu2024m2da, ST_P3] predicts deterministic trajectories but lacks action diversity, leading to potential safety risks. (b) Imitation-based Multi-Mode Trajectory Planning [sun2024sparsedrive, chen2024vadv2, liao2024diffusiondrive] fails to address the diversity loss in imitation learning end-to-end autonomous driving, leading to mode collapse. The generated multi-mode trajectories overly depend on a single GT trajectory, ultimately clustering around it. (c) The proposed DIVER framework adopts reinforced diffusion for multi-mode trajectory generation, encouraging the ego-vehicle to produce diverse driving behaviors instead of rigidly following a single expert.
  • Figure 2: Imitation learning-based multi-mode trajectory paradigm. Most IL-based multi-mode E2E-AD methods rely on L1 loss for training and L2 distance for evaluation, which emphasizes matching a single GT trajectory rather than modeling diversity. This misalignment limits the generation of truly diverse behaviors. Even with diffusion-based frameworks [liao2024diffusiondrive], such imitation-driven objectives constrain their capacity to capture multi-mode driving patterns.
  • Figure 3: The overall architecture of DIVER. As a multi-mode trajectory E2E-AD framework, DIVER first encodes multi-view images into feature maps to extract scene representations through a perception module. It then predicts the motion of surrounding agents and performs planning via a conditional diffusion model guided by reinforcement learning to generate diverse multi-mode trajectories. Our approach effectively addresses the inherent mode collapse in imitation learning, enabling the generation of safe and diverse behaviors for complex driving scenarios.
  • Figure 4: Illustration of the Policy-Aware Diffusion Generator. Taking the predicted trajectory and the GT trajectory as inputs, PADG reconstructs diverse multi-mode trajectories from noise through a conditional denoising process, guided by map and agent context.
  • Figure 5: Impact of the Number of Reference GTs on Closed-Loop Performance (Bench2Drive). A value of 0 indicates no reference GTs (only the GT trajectory is used). A value of 1 corresponds to using one reference GT in addition to the GT, and so on for higher values.
  • ...and 3 more figures