CorrectAD: A Self-Correcting Agentic System to Improve End-to-end Planning in Autonomous Driving

Enhui Ma, Lijun Zhou, Tao Tang, Jiahuan Zhang, Junpeng Jiang, Zhan Zhang, Dong Han, Kun Zhan, Xueyang Zhang, XianPeng Lang, Haiyang Sun, Xia Zhou, Di Lin, Kaicheng Yu

TL;DR

This work tackles robustness gaps in end-to-end autonomous driving caused by long-tail failure cases. It presents CorrectAD, a self-correcting agentic system that couples PM-Agent, which analyzes failure cases, with DriveSora, a BEV-conditioned, diffusion-based video generator, to augment training data and strengthen planners. Evaluated on nuScenes and a large in-house dataset, CorrectAD substantially reduces L2 error and collision rates and remains model-agnostic, applying to different E2E planners. The approach shows that targeted, multimodal data generation guided by failure analysis can provide sustainable, scalable improvements to end-to-end planning in autonomous driving.

Abstract

End-to-end planning methods are the de facto standard in current autonomous driving systems, yet the robustness of these data-driven approaches suffers from the notorious long-tail problem (i.e., rare but safety-critical failure cases). In this work, we explore whether recent diffusion-based video generation methods (a.k.a. world models), paired with structured 3D layouts, can enable a fully automated pipeline that self-corrects such failure cases. We first introduce an agent that plays the role of a product manager, dubbed PM-Agent, which formulates data requirements for collecting data similar to the failure cases. We then use a generative model that simulates both data collection and annotation. However, existing generative models struggle to generate high-fidelity data conditioned on 3D layouts. To address this, we propose DriveSora, which generates spatiotemporally consistent videos aligned with the 3D annotations requested by PM-Agent. We integrate these components into our self-correcting agentic system, CorrectAD. Importantly, our pipeline is end-to-end and model-agnostic, and can be applied to improve any end-to-end planner. Evaluated on both nuScenes and a more challenging in-house dataset across multiple end-to-end planners, CorrectAD corrects 62.5% and 49.8% of failure cases and reduces collision rates by 39% and 27%, respectively.
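The pipeline described in the abstract (find failures, formulate requirements, generate data, retrain, evaluate) can be illustrated as a model-agnostic loop. This is a toy sketch under stated assumptions, not the authors' implementation: scenes are 1-D numbers, the planner is a nearest-neighbor lookup, and `world`, `pm_agent`, `generate`, the L2 threshold, and the data are all hypothetical stand-ins. The `correction_rate` function shows one plausible reading of "corrects X% of failure cases": the fraction of previously failing samples that pass after retraining.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    x: float  # toy 1-D stand-in for a driving scene
    y: float  # toy stand-in for the ground-truth plan


def world(x: float) -> float:
    # Hypothetical ground truth: a rare regime (x >= 5) needs a very different plan.
    return 3.0 if x >= 5.0 else 0.0


@dataclass
class NearestNeighborPlanner:
    data: List[Sample]

    def predict(self, x: float) -> float:
        # Copy the plan of the closest training scene.
        return min(self.data, key=lambda s: abs(s.x - x)).y

    def train(self, data: List[Sample]) -> "NearestNeighborPlanner":
        # Any planner with train/predict works here: the loop is model-agnostic.
        return NearestNeighborPlanner(list(data))


def is_failure(planner, s: Sample, threshold: float = 1.0) -> bool:
    # Hypothetical failure criterion: error above a fixed threshold.
    return abs(planner.predict(s.x) - s.y) > threshold


def find_failures(planner, val: List[Sample]) -> List[Sample]:
    return [s for s in val if is_failure(planner, s)]


def correction_rate(failures: List[Sample], planner) -> float:
    if not failures:
        return 0.0
    return sum(not is_failure(planner, s) for s in failures) / len(failures)


def pm_agent(failure: Sample) -> dict:
    # Hypothetical stand-in for PM-Agent: request data similar to the failure.
    return {"center": failure.x, "spread": 0.2, "count": 3}


def generate(req: dict) -> List[Sample]:
    # Hypothetical stand-in for DriveSora: evenly spaced scenes around the
    # requested center, labeled by construction (generation also "annotates").
    c, d, n = req["center"], req["spread"], req["count"]
    xs = [c - d + 2 * d * i / (n - 1) for i in range(n)]
    return [Sample(x, world(x)) for x in xs]


def self_correct_once(planner, train, val) -> Tuple[object, List[Sample], float]:
    failures = find_failures(planner, val)                  # 1. find failure cases
    reqs = [pm_agent(f) for f in failures]                  # 2a. formulate requirements
    synthetic = [s for r in reqs for s in generate(r)]      # 2b. generate training data
    new_planner = planner.train(train + synthetic)          # 3. update the model
    return new_planner, failures, correction_rate(failures, new_planner)  # 4. evaluate


train = [Sample(x, world(x)) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
val = [Sample(x, world(x)) for x in [0.5, 2.5, 5.5, 6.0]]
new_planner, failures, rate = self_correct_once(NearestNeighborPlanner(train), train, val)
print(len(failures), rate)  # prints: 2 1.0
```

In this toy setting the two rare-regime validation scenes fail at first because no training scene lies near them; the generated look-alikes fill that gap. The real system replaces each stub with PM-Agent, DriveSora, and an actual E2E planner.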

Paper Structure

This paper contains 22 sections, 12 equations, 21 figures, 11 tables.

Figures (21)

  • Figure 1: (a) The workflow of one model iteration consists of four steps: finding failure cases, preparing training data, updating the model, and evaluating before iterating again. The key issue is how to prepare targeted training data that corrects the failure cases. (b) The previous paradigm was retrieval-based, i.e., it retrieved similar data from the existing dataset and auto-labeled them, which severely limits the diversity of the training data. (c) Our proposed agentic system, CorrectAD, instead custom-generates the data. We first propose PM-Agent, which plays the role of a product manager, to formulate data requirements by analyzing failure cases. We then propose a generative model, DriveSora, which plays the role of a data department, to generate high-fidelity training data aligned with the data requirements requested by PM-Agent. Our approach outperforms previous methods in L2 error and collision rate (Col.) for end-to-end planning models.
  • Figure 2: The framework of PM-Agent. Given a failure case $\hat{x}^{\text{fail}}$, PM-Agent first classifies the failure cause as $h^\text{class}$, then writes a detailed failure description $h^\text{desc}$. Based on $h^\text{desc}$, PM-Agent generates specific requirements $q$. It then formulates multimodal requirements $\hat{r}$ (including bird's-eye-view layouts and scene captions) similar to the failure case, which are passed to the downstream generative model.
  • Figure 3: The framework of DriveSora, which generates high-quality, diverse new training data.
  • Figure 4: Visualization of two examples before and after self-correction on the nuScenes validation set. (a) Two hard examples from the validation set: "a low-visibility night" and "bypass in dense traffic flow". (b) Our framework fixes these examples.
  • Figure 5: Visualization of two examples before and after self-correction on our in-house validation set. Results are rendered via a proprietary closed-loop simulator based on Gaussian splatting.
  • ...and 16 more figures
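The PM-Agent data flow in Figure 2 ($\hat{x}^{\text{fail}} \to h^\text{class} \to h^\text{desc} \to q \to \hat{r}$) can be sketched as a chain of typed stages. This is an illustrative sketch only: the keyword classifier, the text templates, the box format, and the fixed layout shifts are hypothetical placeholders for whatever the actual agent does, and the category names are merely borrowed from the Figure 4 examples.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[str, float, float]  # hypothetical BEV box: (class, x in m, y in m)


@dataclass
class FailureCase:
    """Stand-in for the failure case x^fail."""
    scene_id: str
    caption: str
    boxes: List[Box]


@dataclass
class Requirement:
    """Stand-in for a multimodal requirement r: BEV layout plus scene caption."""
    bev_layout: List[Box]
    caption: str
    request: str  # the textual requirement q this was derived from


def classify(x: FailureCase) -> str:
    """h^class: a hypothetical keyword rule in place of the agent's classifier."""
    text = x.caption.lower()
    if "night" in text:
        return "low_visibility"
    if "dense traffic" in text:
        return "dense_traffic"
    return "other"


def describe(x: FailureCase, h_class: str) -> str:
    """h^desc: a templated failure description."""
    return f"[{h_class}] planner failed in scene {x.scene_id}: {x.caption}"


def make_request(h_desc: str) -> str:
    """q: a textual data requirement derived from h^desc."""
    return f"Need more scenes similar to: {h_desc}"


def formulate(x: FailureCase, q: str, shift: float) -> Requirement:
    """r: a layout similar to the failure, every box shifted by `shift` m along x."""
    layout = [(cls, bx + shift, by) for cls, bx, by in x.boxes]
    return Requirement(layout, x.caption, q)


def pm_agent(x: FailureCase, shifts: Sequence[float] = (-1.0, 1.0)) -> List[Requirement]:
    h_class = classify(x)
    h_desc = describe(x, h_class)
    q = make_request(h_desc)
    # One multimodal requirement per layout variation; each would go to the generator.
    return [formulate(x, q, s) for s in shifts]


fail = FailureCase("val-001", "Unprotected turn on a low-visibility night",
                   [("car", 10.0, 2.0), ("pedestrian", 5.0, -1.5)])
reqs = pm_agent(fail)
print(len(reqs), reqs[0].bev_layout[0])  # prints: 2 ('car', 9.0, 2.0)
```

Keeping each stage's output as an explicit value mirrors the figure: the intermediate $h^\text{class}$, $h^\text{desc}$, and $q$ remain inspectable, and only $\hat{r}$ crosses the boundary to the generative model.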