Exploring the Causality of End-to-End Autonomous Driving

Jiankun Li; Hao Li; Jiangjiang Liu; Zhikang Zou; Xiaoqing Ye; Fan Wang; Jizhou Huang; Hua Wu; Haifeng Wang

TL;DR

This work tackles the opacity of end-to-end autonomous driving by introducing DriveInsight, a causal debugging framework that combines counterfactual interventions with response visualizations to identify which factors most influence final actions. The authors present a modular architecture with modality-specific encoders, multi-modal fusion, and a planning decoder that predicts ego trajectories, and they validate the approach in a closed-loop CARLA environment against open-source baselines. Key contributions include a comprehensive qualitative and quantitative analysis of causal factors, prompt-editing and visual analysis methods, and a competitive end-to-end baseline that offers interpretability without hand-crafted post-processing. This framework advances interpretability and reliability in end-to-end driving, providing a practical toolkit for diagnosing causal factors and guiding robust model design in realistic simulation settings.

Abstract

Deep learning-based models are widely deployed in autonomous driving, especially the end-to-end solutions that are attracting growing attention. However, the black-box nature of these models raises concerns about their trustworthiness and safety, and how to debug their causality has become a pressing concern. Despite some existing research on the explainability of autonomous driving, there is currently no systematic solution to help researchers debug and identify the key factors that lead to the final predicted action of an end-to-end autonomous driving system. In this work, we propose a comprehensive approach to explore and analyze the causality of end-to-end autonomous driving. First, we validate the essential information that the final planning depends on by using controlled variables and counterfactual interventions for qualitative analysis. Then, we quantitatively assess the factors influencing model decisions by visualizing and statistically analyzing the responses of key model inputs. Finally, based on this comprehensive study of the multi-factorial end-to-end autonomous driving system, we develop a strong baseline and a tool for exploring causality in the closed-loop simulator CARLA. The baseline leverages the essential input sources to obtain a well-designed model with highly competitive capabilities. As far as we know, our work is the first to unveil the mystery of end-to-end autonomous driving and turn the black box into a white one. Thorough closed-loop experiments demonstrate that our method can be applied to end-to-end autonomous driving solutions for causality debugging. Code will be available at https://github.com/bdvisl/DriveInsight.
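
The counterfactual-intervention step mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: `toy_planner`, the input names, and the zero-out intervention are all hypothetical stand-ins chosen for illustration. The idea is only to remove one input at a time and measure how far the predicted trajectory moves from the unedited prediction.

```python
from typing import Callable, Dict, List, Tuple

Trajectory = List[Tuple[float, float]]
Inputs = Dict[str, List[float]]


def toy_planner(inputs: Inputs) -> Trajectory:
    """Hypothetical stand-in for a planner (NOT the DriveInsight model).

    Waypoints head toward the target point; a red light (value > 0.5) stops the car.
    The "command" input is deliberately ignored by this toy model.
    """
    tx, ty = inputs["target_point"]
    speed = 0.0 if inputs["traffic_light"][0] > 0.5 else 1.0
    return [(speed * tx * k / 4, speed * ty * k / 4) for k in range(1, 5)]


def mean_displacement(a: Trajectory, b: Trajectory) -> float:
    """Average Euclidean distance between matching waypoints."""
    total = sum(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
                for (ax, ay), (bx, by) in zip(a, b))
    return total / len(a)


def counterfactual_effects(planner: Callable[[Inputs], Trajectory],
                           inputs: Inputs) -> Dict[str, float]:
    """Zero out each input in turn and report the shift in the planned trajectory."""
    baseline = planner(inputs)
    effects = {}
    for name, value in inputs.items():
        edited = dict(inputs)                 # shallow copy; only one key is replaced
        edited[name] = [0.0] * len(value)     # assumed intervention: zero the input
        effects[name] = mean_displacement(baseline, planner(edited))
    return effects


# Assumed example scene: target 8 m ahead, green light, some command value.
scene = {"target_point": [0.0, 8.0], "traffic_light": [0.0], "command": [1.0]}
effects = counterfactual_effects(toy_planner, scene)
print(effects)
```

In this toy scene only the target point changes the plan (a mean shift of 5.0), while the unused command input has no effect; a ranking of this kind is what separates inputs the planner relies on from inputs it ignores.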

Paper Structure (21 sections, 2 equations, 12 figures, 2 tables)

Introduction
Related Works
End-to-End Autonomous Driving
Causality of End-to-End Autonomous Driving
Method
Modality-Specific Encoders
Fusion Encoders
Planning Decoders
Experiments
Experiment Setup
Ablation Study and Analysis
Effectiveness of Prompts of Planning Decoder
Prompt Editing
Visual Analysis of Component Responses
Visual Analysis of Activation Map
...and 6 more sections

Figures (12)

Figure 1: Overall architecture of our DriveInsight framework. The LiDAR point cloud and multi-view images are processed separately by their respective encoders. The resulting features are then fed sequentially into the multi-modal fusion and temporal fusion modules to obtain the fused BEV features. In addition, we transform other sensor information, including traffic signs, target points, command, routing, etc., into environmental token embeddings. Together with the BEV tokens, all of these tokens are sent into the transformer decoder to predict future trajectories.
Figure 2: Visualizations of different simulation time steps. The last column shows the visualization of the point cloud and component information. The green curves represent routing, the red dots indicate the target point, the dark blue lines represent the vectorized map, and the light blue rectangles indicate obstacles. Simulation time steps $t_0$, $t_1$, and $t_2$ correspond to the three sampling moments in Figure 3 and Figure 4 (represented as blue, orange, and green in the figures), respectively.
Figure 3: Visualizations of the gradients w.r.t. different tokens by simulation time steps. The gradients in the x and y directions are represented by $G_x$ and $G_y$, respectively. The horizontal axis represents the time elapsed along the current route. We sampled three representative moments, denoted as $t_1$, $t_2$, and $t_3$, indicated in the graph by blue, orange, and green vertical lines, respectively.
Figure 4: Visualizations of the activation w.r.t. different attention heads at different simulation time steps. The three colors in the histogram correspond to the sampling time points in Figure 3. The red line represents the average response value of different components over the observation time interval.
Figure 5: Visualization of the activation map
...and 7 more figures
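
The per-token gradients $G_x$ and $G_y$ described in the Figure 3 caption can be approximated as in the sketch below. This is an illustration under stated assumptions, not the paper's code: `toy_decoder` and the token names are hypothetical, and central finite differences are used in place of automatic differentiation so that the example needs no deep learning framework.

```python
from typing import Callable, Dict, List, Tuple

Tokens = Dict[str, List[float]]


def toy_decoder(tokens: Tokens) -> Tuple[float, float]:
    """Hypothetical stand-in mapping token embeddings to one (x, y) waypoint."""
    x = 2.0 * tokens["routing"][0] + 0.5 * tokens["command"][0]
    y = 3.0 * tokens["target_point"][0] - 1.0 * tokens["routing"][1]
    return x, y


def token_gradients(decoder: Callable[[Tokens], Tuple[float, float]],
                    tokens: Tokens,
                    eps: float = 1e-4) -> Dict[str, Tuple[float, float]]:
    """Return (G_x, G_y) per token: summed absolute central-difference gradients
    of the predicted x and y with respect to each embedding dimension."""
    grads = {}
    for name, emb in tokens.items():
        gx = gy = 0.0
        for i in range(len(emb)):
            # Copy all embeddings, then perturb a single dimension up and down.
            plus = {k: list(v) for k, v in tokens.items()}
            minus = {k: list(v) for k, v in tokens.items()}
            plus[name][i] += eps
            minus[name][i] -= eps
            xp, yp = decoder(plus)
            xm, ym = decoder(minus)
            gx += abs(xp - xm) / (2 * eps)
            gy += abs(yp - ym) / (2 * eps)
        grads[name] = (gx, gy)
    return grads


# Assumed token embeddings for one simulation time step.
tokens = {"routing": [1.0, 0.5], "command": [1.0], "target_point": [4.0]}
grads = token_gradients(toy_decoder, tokens)
for name, (gx, gy) in grads.items():
    print(f"{name}: G_x={gx:.3f}, G_y={gy:.3f}")
```

Repeating this computation at every simulation time step and plotting the values per token would give curves of the kind the caption describes, with larger values marking tokens that the predicted waypoint is more sensitive to.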
