DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving
Haisheng Su, Wei Wu, Junchi Yan
TL;DR
End-to-end autonomous driving models often rely on dense, rasterized scene representations, which hurt both planning performance and efficiency. This work introduces DiFSD, an ego-centric fully sparse paradigm that combines sparse perception, ego-centric hierarchical interaction, and iterative motion planning with two-level uncertainty denoising to improve planning safety and speed. Key innovations include intention-guided geometric attention to select the Closest In-Path Vehicle / Stationary (CIPV / CIPS), joint motion prediction for the selected interactive agents and the ego vehicle, and stage-wise end-to-end training with perception, interaction, motion, and planning losses. Empirical results on nuScenes and Bench2Drive show substantial reductions in L2 error and collision rate, along with major speedups, demonstrating the practicality and scalability of fully sparse ego-centric end-to-end driving.
Abstract
Current end-to-end autonomous driving methods unify modular designs for various tasks (e.g., perception, prediction, and planning). Although they are optimized in a planning-oriented spirit within a fully differentiable framework, existing end-to-end driving systems without ego-centric designs still suffer from unsatisfactory performance and inferior efficiency, owing to rasterized scene representation learning and redundant information transmission. In this paper, we revisit human driving behavior and propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction, and an iterative motion planner. The sparse perception module performs detection, tracking, and online mapping based on a sparse representation of the driving scene. The hierarchical interaction module selects the Closest In-Path Vehicle / Stationary (CIPV / CIPS) from coarse to fine, benefiting from an additional geometric prior. In the iterative motion planner, both the selected interactive agents and the ego vehicle are considered for joint motion prediction, and the output multi-modal ego trajectories are optimized iteratively. In addition, both position-level motion diffusion and trajectory-level planning denoising are introduced for uncertainty modeling, which improves the training stability and convergence of the whole framework. Extensive experiments on the nuScenes and Bench2Drive datasets demonstrate the superior planning performance and efficiency of DiFSD.
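
To make the CIPV / CIPS notion concrete, the following is a minimal geometric sketch of "closest in-path" selection. It is not the paper's intention-guided geometric attention, which is learned; it is a hand-written stand-in that treats an agent as in-path when it lies within a fixed lateral margin of a candidate ego path and returns the one reached first along that path. The function name `closest_in_path`, the `half_width` parameter, and its default of 1.5 m are assumptions made for illustration.

```python
import numpy as np


def closest_in_path(path_xy, agents_xy, half_width=1.5):
    """Return the index of the nearest in-path agent, or None.

    path_xy:    (N, 2) polyline of a candidate ego path, N >= 2 (hypothetical input).
    agents_xy:  (M, 2) agent centre positions in the same frame.
    half_width: assumed lateral margin in metres that counts as "in path".
    """
    path_xy = np.asarray(path_xy, dtype=float)
    agents_xy = np.asarray(agents_xy, dtype=float)
    if len(agents_xy) == 0:
        return None

    seg_start = path_xy[:-1]
    seg_vec = path_xy[1:] - seg_start
    seg_len = np.linalg.norm(seg_vec, axis=1)
    # Arc length along the path at the start of each segment.
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)[:-1]])

    best_idx, best_s = None, np.inf
    for i, p in enumerate(agents_xy):
        # Project the agent onto every segment, clamped to the segment ends.
        t = np.einsum("ij,ij->i", p - seg_start, seg_vec)
        t = np.clip(t / np.maximum(seg_len ** 2, 1e-9), 0.0, 1.0)
        foot = seg_start + t[:, None] * seg_vec
        dist = np.linalg.norm(p - foot, axis=1)

        k = int(np.argmin(dist))            # segment closest to the agent
        s = cum_len[k] + t[k] * seg_len[k]  # longitudinal distance along the path
        if dist[k] <= half_width and s < best_s:
            best_idx, best_s = i, s
    return best_idx


# Straight path along +x; agent 1 is too far to the side, agent 2 is reached first.
path = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
agents = [(15.0, 0.5), (5.0, 3.0), (8.0, -1.0)]
print(closest_in_path(path, agents))
```

A rule like this only captures the geometric prior. In the coarse-to-fine scheme described above, such a prior would presumably guide a learned selection rather than replace it.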
