SparseDriveV2: Scoring is All You Need for End-to-End Autonomous Driving

Wenchao Sun, Xuewu Lin, Keyu Chen, Zixiang Pei, Xiang Li, Yining Shi, Sifa Zheng

Abstract

End-to-end multi-modal planning has been widely adopted to model the uncertainty of driving behavior, typically by scoring candidate trajectories and selecting the optimal one. Existing approaches generally fall into two categories: scoring a large static trajectory vocabulary, or scoring a small set of dynamically generated proposals. Static vocabularies often suffer from coarse discretization of the action space, whereas dynamic proposals provide finer-grained precision and have shown stronger empirical performance on existing benchmarks. However, it remains unclear whether dynamic generation is fundamentally necessary, or whether static vocabularies can achieve comparable performance once they are dense enough to cover the action space. In this work, we begin with a systematic scaling study of Hydra-MDP, a representative scoring-based method, and find that performance consistently improves as trajectory anchors become denser, without saturating before computational constraints are reached. Motivated by this observation, we propose SparseDriveV2, which pushes the performance boundary of scoring-based planning through two complementary innovations: (1) a scalable vocabulary representation with a factorized structure that decomposes trajectories into geometric paths and velocity profiles, enabling combinatorial coverage of the action space, and (2) a scalable scoring strategy that performs coarse factorized scoring over paths and velocity profiles, followed by fine-grained scoring on a small set of composed trajectories. By combining these two techniques, SparseDriveV2 achieves 92.0 PDMS and 90.1 EPDMS on NAVSIM. On Bench2Drive it achieves 89.15 Driving Score and 70.00 Success Rate, using a lightweight ResNet-34 backbone. Code and model are released at https://github.com/swc-17/SparseDriveV2.
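The following is a minimal Python sketch of the factorized representation. It shows how a trajectory can be composed from a geometric path and a velocity profile, and how a combinatorial vocabulary can be built from the two sets. It rests on several assumptions:

* A path is a 2D polyline.
* A velocity profile is a list of speeds sampled at a fixed, hypothetical time step `dt`.
* Each waypoint is placed at the arc length reached by accumulating `v * dt` along the path.

All function names are illustrative, not taken from the SparseDriveV2 code base, and the paper's actual composition may differ.

```python
import bisect
import math
from itertools import product

# Assumed representation: a point is (x, y); a path is a polyline of points.
Point = tuple[float, float]


def cumulative_arc_length(path: list[Point]) -> list[float]:
    """Arc length from the start of the polyline to each vertex."""
    s = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        s.append(s[-1] + math.hypot(x1 - x0, y1 - y0))
    return s


def point_at_arc_length(path: list[Point], s_table: list[float], s: float) -> Point:
    """Linearly interpolate the point at arc length s, clamped to the path ends."""
    if s <= 0.0:
        return path[0]
    if s >= s_table[-1]:
        return path[-1]
    # Last vertex whose arc length is <= s; the next vertex lies strictly beyond s.
    i = bisect.bisect_right(s_table, s) - 1
    a = (s - s_table[i]) / (s_table[i + 1] - s_table[i])
    (x0, y0), (x1, y1) = path[i], path[i + 1]
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))


def compose(path: list[Point], speeds: list[float], dt: float) -> list[Point]:
    """Place one waypoint per time step by advancing speed * dt along the path."""
    s_table = cumulative_arc_length(path)
    s, trajectory = 0.0, []
    for v in speeds:
        s += v * dt
        trajectory.append(point_at_arc_length(path, s_table, s))
    return trajectory


def build_vocabulary(paths, velocity_profiles, dt):
    """Every (path, velocity profile) pair yields one trajectory: P * V in total."""
    return [compose(p, v, dt) for p, v in product(paths, velocity_profiles)]


# Usage: 2 paths x 3 velocity profiles -> 6 composed trajectories.
paths = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0)]]
velocity_profiles = [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0], [0.0, 0.0, 0.0]]
vocab = build_vocabulary(paths, velocity_profiles, dt=1.0)
```

In this sketch, $P$ paths and $V$ velocity profiles yield $P \times V$ trajectories, while only $P + V$ components are stored. This illustrates the combinatorial coverage the factorized structure provides.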

Paper Structure

This paper contains 25 sections, 33 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Overview of the SparseDriveV2 framework. SparseDriveV2 factorizes (a) a spatiotemporal trajectory into (b) a geometric path and a velocity profile, and reconstructs the trajectory by (c) composing the two components. This representation enables (d) a super-dense trajectory vocabulary constructed from a compact set of paths and velocity profiles. Conditioned on (e) scene features, the scalable scoring strategy first performs (f) coarse factorized scoring over paths and velocity profiles to select the top-$k$ candidates of each, followed by (g) fine-grained scoring over the composed trajectories to produce the final planning decision.
  • Figure 2: SparseDriveV2 produces smoother trajectories than the baseline method in sharp-turning scenarios.
  • Figure 3: SparseDriveV2 achieves higher traffic efficiency, while the baseline method remains stationary.
  • Figure 4: SparseDriveV2 is better aligned with the expert trajectory in terms of high-level intent, enabled by geometric path modeling.
  • Figure 5: Failure cases: SparseDriveV2 generates trajectories with incorrect navigation decisions in certain scenarios, possibly due to insufficient navigation information.
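The coarse-to-fine selection in Figure 1 (f, g) can be sketched as follows. This minimal Python sketch covers only the selection logic, under these assumptions:

* The coarse path and velocity scores are given as plain lists. In SparseDriveV2 they are conditioned on scene features.
* `fine_score` is a hypothetical callable standing in for the fine-grained scorer of a composed (path, velocity) pair.
* `k_path` and `k_vel` are illustrative parameter names, not names from the released code.

```python
import heapq
from typing import Callable, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> list[int]:
    """Indices of the k highest scores, best first."""
    return heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])


def coarse_to_fine_select(
    path_scores: Sequence[float],
    velocity_scores: Sequence[float],
    fine_score: Callable[[int, int], float],  # hypothetical fine-grained scorer
    k_path: int,
    k_vel: int,
) -> tuple[int, int]:
    """Return the (path index, velocity index) pair chosen by two-stage scoring.

    Coarse stage: paths and velocity profiles are ranked independently, so the
    cost is one score per component rather than one per composed trajectory.
    Fine stage: only the k_path * k_vel composed candidates are scored jointly.
    """
    top_paths = top_k_indices(path_scores, k_path)
    top_vels = top_k_indices(velocity_scores, k_vel)
    candidates = [(p, v) for p in top_paths for v in top_vels]
    return max(candidates, key=lambda pv: fine_score(*pv))


# Usage with toy numbers: 4 paths x 3 velocity profiles = 12 possible
# trajectories, but only 2 x 2 = 4 of them reach the fine stage.
path_scores = [0.1, 0.9, 0.5, 0.3]
velocity_scores = [0.2, 0.7, 0.6]
fine_table = {(1, 1): 0.4, (1, 2): 0.8, (2, 1): 0.6, (2, 2): 0.1}
best = coarse_to_fine_select(
    path_scores, velocity_scores, lambda p, v: fine_table[(p, v)], k_path=2, k_vel=2
)
```

Here `best` is `(1, 2)`. The lookup in `fine_table` would raise `KeyError` for any pair outside the retained top-k. The example therefore also shows that pruned pairs never reach the fine scorer. Under this design, a coarse-stage mistake can only be corrected if the correct path and velocity profile both survive into the retained set.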