DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving

Haisheng Su, Wei Wu, Junchi Yan

TL;DR

End-to-end autonomous driving models often rely on dense scene representations, which hurt both efficiency and planning performance. This work introduces DiFSD, an ego-centric fully sparse paradigm that combines sparse perception, ego-centric hierarchical interaction, and iterative motion planning with two-level uncertainty denoising to improve planning safety and speed. Key innovations include intention-guided geometric attention to select the CIPV/CIPS, joint motion prediction for the interactive agents and the ego-vehicle, and stage-wise end-to-end training with perception, interaction, motion, and planning losses. Results on nuScenes and Bench2Drive show lower L2 error and collision rate together with faster inference, suggesting that fully sparse ego-centric end-to-end driving is practical.
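
For readers unfamiliar with the two planning metrics named above, the following is a minimal sketch of how they are commonly computed. It is an illustration under stated assumptions, not the paper's evaluation code: the function names, the (x, y) waypoint format, and the averaging over all waypoints are hypothetical choices, and benchmark protocols differ in how they aggregate over the planning horizon.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) waypoint in metres; assumed format


def l2_error(planned: Sequence[Point], ground_truth: Sequence[Point]) -> float:
    """Mean Euclidean distance between matched planned and reference waypoints."""
    if not planned or len(planned) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    distances = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(planned, ground_truth)
    ]
    return sum(distances) / len(distances)


def collision_rate(collided_flags: Iterable[bool]) -> float:
    """Fraction of evaluated samples whose planned trajectory hit an obstacle."""
    flags = list(collided_flags)
    if not flags:
        raise ValueError("at least one sample is required")
    return sum(1 for flag in flags if flag) / len(flags)
```

Lower is better for both quantities. How a collision is detected (for example, box overlap against annotated agents) is left outside this sketch.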

Abstract

Current end-to-end autonomous driving methods unify modular designs for various tasks (e.g., perception, prediction, and planning). Although they are optimized in a planning-oriented spirit within a fully differentiable framework, existing end-to-end driving systems without ego-centric designs still suffer from unsatisfactory performance and inferior efficiency, owing to rasterized scene representation learning and redundant information transmission. In this paper, we revisit human driving behavior and propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD consists mainly of sparse perception, hierarchical interaction, and an iterative motion planner. The sparse perception module performs detection, tracking, and online mapping based on a sparse representation of the driving scene. The hierarchical interaction module selects the Closest In-Path Vehicle / Stationary (CIPV / CIPS) from coarse to fine, benefiting from an additional geometric prior. In the iterative motion planner, both the selected interactive agents and the ego-vehicle are considered for joint motion prediction, and the output multi-modal ego-trajectories are optimized iteratively. In addition, both position-level motion diffusion and trajectory-level planning denoising are introduced for uncertainty modeling, which improves the training stability and convergence of the whole framework. Extensive experiments on the nuScenes and Bench2Drive datasets demonstrate the superior planning performance and efficiency of DiFSD.
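
To make the CIPV / CIPS notion concrete, here is a minimal geometric sketch of picking the closest in-path agent in two steps (a coarse range filter, then a fine corridor filter). This is a hand-written heuristic for illustration only: DiFSD performs the selection with learned, intention-guided attention over sparse queries, which this sketch does not reproduce. The `Agent` fields, the straight-ahead corridor, and the `max_range` and `half_width` defaults are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Agent:
    name: str
    x: float      # longitudinal offset in the ego frame, metres (forward is positive)
    y: float      # lateral offset in the ego frame, metres (left is positive)
    moving: bool  # True for a vehicle candidate (CIPV), False for a stationary one (CIPS)


def closest_in_path(
    agents: Sequence[Agent],
    moving: bool,
    max_range: float = 50.0,   # assumed coarse look-ahead distance
    half_width: float = 1.5,   # assumed half-width of the ego corridor
) -> Optional[Agent]:
    """Return the nearest agent of the requested kind inside a straight ego corridor."""
    # Coarse step: keep agents of the requested kind that are ahead and within range.
    coarse = [a for a in agents if a.moving == moving and 0.0 < a.x <= max_range]
    # Fine step: keep only those whose lateral offset falls inside the corridor.
    in_path = [a for a in coarse if abs(a.y) <= half_width]
    # The closest in-path agent has the smallest longitudinal offset.
    return min(in_path, key=lambda a: a.x, default=None)
```

A straight corridor only matches a go-straight command; handling left or right turns would require a corridor that follows the intended path, which is the role the paper assigns to the driving-command-conditioned ego queries.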

Paper Structure

This paper contains 25 sections, 4 equations, 12 figures, 11 tables.

Figures (12)

  • Figure 1: The comparison of different end-to-end paradigms. (a) The dense BEV-Centric paradigm. (b) The sparse Query-Centric paradigm. (c) The proposed fully sparse Ego-Centric paradigm.
  • Figure 2: Overview of our proposed framework. DiFSD first extracts multi-scale image features from multi-view images using an off-the-shelf visual encoder, then perceives both dynamic and static elements in a sparse manner. The Ego-Env hierarchical interaction module selects the interactive queries from coarse to fine using three different driving commands of ego queries, and the selected queries are then used by the joint motion planner through iterative refinement. An additional geometric prior is introduced for high-quality query ranking through intention-guided attention. In addition, both position-level agent diffusion and trajectory-level ego-vehicle denoising are conducted for uncertainty modeling of the end-to-end driving system.
  • Figure 3: Illustration of the dual interaction layer in the hierarchical interaction module and planning optimization layer in the motion planner module.
  • Figure 4: Details of the interactive score fusion process in the geometric attended selection.
  • Figure 5: Qualitative results of DiFSD. DiFSD outputs planning results based on hierarchical interaction and joint motion of sparse interactive agents without considering other irrelevant objects. We omit the map selection results for clarity of road structure details.
  • ...and 7 more figures