SafeDrive: Fine-Grained Safety Reasoning for End-to-End Driving in a Sparse World

Jungho Kim, Jiyong Oh, Seunghoon Yu, Hongjae Shin, Donghyuk Kwak, Jun Won Choi

TL;DR

This work proposes SafeDrive, an E2E planning framework designed to perform explicit and interpretable safety reasoning through a trajectory-conditioned Sparse World Model, which achieves state-of-the-art performance on both open-loop and closed-loop benchmarks.

Abstract

The end-to-end (E2E) paradigm, which maps sensor inputs directly to driving decisions, has recently attracted significant attention due to its unified modeling capability and scalability. However, ensuring safety in this unified framework remains one of the most critical challenges. In this work, we propose SafeDrive, an E2E planning framework designed to perform explicit and interpretable safety reasoning through a trajectory-conditioned Sparse World Model. SafeDrive comprises two complementary networks: the Sparse World Network (SWNet) and the Fine-grained Reasoning Network (FRNet). SWNet constructs trajectory-conditioned sparse worlds that simulate the future behaviors of critical dynamic agents and road entities, providing interaction-centric representations for downstream reasoning. FRNet then evaluates agent-specific collision risks and temporal adherence to drivable regions, enabling precise identification of safety-critical events across future timesteps. SafeDrive achieves state-of-the-art performance on both open-loop and closed-loop benchmarks. On NAVSIM, it records a PDMS of 91.6 and an EPDMS of 87.5, with only 61 collisions out of 12,146 scenarios (0.5%). On Bench2Drive, SafeDrive attains a 66.8% driving score.
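As a quick sanity check on the collision figure quoted above, the reported rate can be recomputed from the two counts given in the abstract. This is only arithmetic on the stated numbers; the variable names are illustrative and not taken from the paper.

```python
# Counts quoted in the abstract for the NAVSIM evaluation.
collisions = 61
scenarios = 12_146

# Fraction of scenarios that ended in a collision, as a percentage
# rounded to one decimal place (the precision used in the abstract).
collision_rate = collisions / scenarios
collision_rate_pct = round(100 * collision_rate, 1)

print(f"collision rate: {collision_rate_pct}%")
```

The rounded value matches the 0.5% stated in the abstract.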

Paper Structure (43 sections, 12 equations, 9 figures, 12 tables)

Figures (9)

  • Figure 1: Comparison of end-to-end planning paradigms and the SafeDrive framework. (a) Dense world models provide limited modeling of instance-centric interactions, whereas sparse world models capture them effectively. (b) Scene-level safety evaluation is coarse, while fine-grained evaluation identifies the specific agents and timestamps associated with potential risks. (c) SafeDrive leverages a sparse world model and fine-grained safety reasoning to generate safe trajectories.
  • Figure 2: Overall architecture of SafeDrive. ProposalNet evaluates the scene-level safety of anchor trajectories using BEV features and selects safety-aware candidates. SWNet constructs trajectory-conditioned Sparse Worlds by simulating the future behaviors of dynamic agents and road entities. FRNet performs fine-grained safety reasoning by estimating Pair-wise No at-fault Collision (PwNC) scores for individual agents and Time-wise Drivable Area Compliance (TwDAC) scores for individual future timesteps, enabling interpretable and temporally grounded safety assessment.
  • Figure 3: Qualitative comparison with other SOTA models on the NAVSIM test set. The expert trajectory (GT) is shown in purple, the predicted trajectories of each model are shown in blue, and the log-replayed future trajectories of surrounding agents are shown in black.
  • Figure 4: Visualization of the PwNC-based reasoning process. (a) Current scene images. (b) Predicted PwNC values are used to color-shade the logged future positions of surrounding vehicles. (c) NC-score visualization of all candidate trajectories for GT, ProposalNet, and FRNet. Green for surrounding-vehicle boxes and candidate trajectories indicates higher safety, while red denotes elevated risk.
  • Figure 5: Visualization of reasoning process. The figure compares two forms of fine-grained safety reasoning, PwNC in (a) and TwDAC in (b). The bottom-left panel visualizes predicted fine-grained safety scores across Sparse Worlds using red–green shading, with (a) visualizing PwNC scores for the future boxes of surrounding agents and (b) visualizing TwDAC scores for the future ego boxes. The bottom-right panel presents the corresponding trajectory-level scores, showing NC scores for (a) and DAC scores for (b), each generated by ProposalNet, FRNet, and the ground truth.
  • ...and 4 more figures
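
The captions above describe fine-grained scores (PwNC per surrounding agent, TwDAC per future timestep) alongside trajectory-level NC and DAC scores. The following is a minimal sketch of how such fine-grained scores could be collapsed into trajectory-level scores and used to pick a candidate. The worst-case (minimum) aggregation, the product used for ranking, the function names, and the example values are all assumptions made for illustration; the captions do not specify how SafeDrive performs this step.

```python
from typing import Dict, List, Tuple

# One candidate = (PwNC scores, one per agent; TwDAC scores, one per timestep).
Candidate = Tuple[List[float], List[float]]


def trajectory_scores(pw_nc: List[float], tw_dac: List[float]) -> Tuple[float, float]:
    """Collapse fine-grained scores into trajectory-level (NC, DAC).

    Assumption: a trajectory is only as safe as its riskiest agent and its
    riskiest timestep, so both are aggregated with a minimum.
    An empty list (no agents or no timesteps) is treated as fully safe.
    """
    nc = min(pw_nc, default=1.0)
    dac = min(tw_dac, default=1.0)
    return nc, dac


def riskiest_index(scores: List[float]) -> int:
    """Index of the lowest score, i.e. the agent or timestep driving the risk."""
    return min(range(len(scores)), key=scores.__getitem__)


def select_safest(candidates: Dict[str, Candidate]) -> str:
    """Pick the candidate with the highest NC * DAC (hypothetical ranking rule)."""
    def safety(name: str) -> float:
        nc, dac = trajectory_scores(*candidates[name])
        return nc * dac
    return max(candidates, key=safety)


# Made-up scores in [0, 1]; higher means safer.
candidates: Dict[str, Candidate] = {
    "keep_lane": ([0.95, 0.90, 0.99], [0.99, 0.98, 0.97, 0.96]),
    "swerve_left": ([0.99, 0.20, 0.99], [0.99, 0.95, 0.60, 0.40]),
}

best = select_safest(candidates)
risky_agent = riskiest_index(candidates["swerve_left"][0])
print(best, risky_agent)
```

Keeping the per-agent and per-timestep scores around, rather than only the aggregate, is what allows a rejected trajectory to be traced back to a specific agent or moment, which is the kind of interpretability the captions emphasize.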