CausalVAD: De-confounding End-to-End Autonomous Driving via Causal Intervention

Jiacheng Tang, Zhiyuan Zhou, Zhuolin He, Jia Zhang, Kai Zhang, Jian Pu

Abstract

Planning-oriented end-to-end driving models show great promise, yet they fundamentally learn statistical correlations instead of true causal relationships. This vulnerability leads to causal confusion, where models exploit dataset biases as shortcuts, critically harming their reliability and safety in complex scenarios. To address this, we introduce CausalVAD, a de-confounding training framework that leverages causal intervention. At its core, we design the sparse causal intervention scheme (SCIS), a lightweight, plug-and-play module that instantiates backdoor adjustment theory in neural networks. SCIS constructs a dictionary of prototypes representing latent driving contexts and then uses this dictionary to intervene on the model's sparse vectorized queries. This intervention blocks the spurious associations induced by confounders, removing spurious factors from the representations passed to downstream tasks. Extensive experiments on benchmarks such as nuScenes show that CausalVAD achieves state-of-the-art planning accuracy and safety. Furthermore, our method demonstrates superior robustness against both data bias and noisy scenarios configured to induce causal confusion.
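The backdoor adjustment that SCIS builds on replaces the observational $P(Y|S) = \sum_z P(Y|S,z)P(z|S)$ with the interventional $P(Y|\text{do}(S)) = \sum_z P(Y|S,z)P(z)$. The sketch below is illustrative and is not the authors' code. It first evaluates both formulas exactly on a toy discrete model with made-up probabilities. It then shows one common neural approximation of the same sum: a prototype dictionary standing in for $Z$, with a prior-weighted attention read-out added back to the queries. The function names (`observational`, `interventional`, `prototype_intervention`), the k-means-style dictionary, the frequency prior, and additive fusion are all assumptions.

```python
import numpy as np

# --- Exact backdoor adjustment on a toy discrete model (hypothetical numbers) ---
# S: binary scene cue, Z: binary latent context (confounder), Y: binary decision.
p_z = np.array([0.5, 0.5])                    # P(Z), consistent with P(S=0)=P(S=1)=0.5
p_z_given_s = np.array([[0.9, 0.1],           # P(Z | S=0)
                        [0.1, 0.9]])          # P(Z | S=1)
p_y_given_sz = np.array([[0.1, 0.8],          # P(Y=1 | S=0, Z=z)
                         [0.2, 0.9]])         # P(Y=1 | S=1, Z=z)

def observational(s):
    """P(Y=1 | S=s) = sum_z P(Y=1 | s, z) P(z | s); includes the backdoor path via Z."""
    return float(p_y_given_sz[s] @ p_z_given_s[s])

def interventional(s):
    """P(Y=1 | do(S=s)) = sum_z P(Y=1 | s, z) P(z); backdoor adjustment."""
    return float(p_y_given_sz[s] @ p_z)

for s in (0, 1):
    # Observational gap is about 0.66; interventional gap is 0.10, the effect of S alone.
    print(s, round(observational(s), 2), round(interventional(s), 2))

# --- Hypothetical neural approximation with a confounder dictionary ---
def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prototype_intervention(queries, prototypes, prior):
    """Illustrative SCIS-style step (assumed form, not the paper's exact module).

    queries:    (N, d) sparse vectorized queries
    prototypes: (K, d) confounder dictionary, e.g. cluster centres of training features
    prior:      (K,)   estimate of P(z), e.g. cluster frequencies
    """
    d = queries.shape[-1]
    attn = softmax(queries @ prototypes.T / np.sqrt(d))  # (N, K) relevance of each context
    context = (attn * prior) @ prototypes                 # (N, d) prior-weighted expected context
    return queries + context                              # fusion by addition (assumption)

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))
z_dict = rng.normal(size=(4, 8))
prior = np.array([0.4, 0.3, 0.2, 0.1])
print(prototype_intervention(q, z_dict, prior).shape)  # (3, 8)
```

In the toy model, conditioning on $S$ mostly measures $Z$, because the two are strongly correlated. Averaging over the marginal $P(z)$ recovers the 0.1 effect that $S$ has within each fixed context. The neural version stands in for that average when $Z$ is latent and only a learned dictionary is available.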

Paper Structure (42 sections, 14 equations, 7 figures, 12 tables, 1 algorithm)

Figures (7)

  • Figure 1: The problem of spurious correlation. (Left) Standard end-to-end models learn the observational correlation $P(Y|S)$, making them vulnerable to latent confounders $Z$ (e.g., scene context) that create a spurious backdoor path ($S \leftarrow Z \rightarrow Y$). (Middle) VLM-based approaches suffer from the same confounding and introduce hallucinations. (Right) Our CausalVAD performs a causal intervention $P(Y|\text{do}(S))$ via backdoor adjustment, severing the spurious link to learn the true causal effect for robust and trustworthy decision-making.
  • Figure 2: The overall architecture of CausalVAD. Our method performs precise, multi-stage causal interventions at critical information hubs within the VAD pipeline. (1) Perception stage (bottom left): The perception de-confounding module (PDM) operates on the classification logits ($Y_o, Y_m$). A dual-branch structure adjusts the direct classification score $L$ against a bias score derived from the confounder dictionaries ($\{\mathcal{Z}_o\}, \{\mathcal{Z}_m\}$), outputting de-confounded logits. (2) Prediction and planning stages (bottom right): The interaction de-confounding module (IDM) removes spurious factors from queries before fusion in downstream tasks. IDM utilizes cross-attention to estimate the spurious component predictable from the context. This component is scaled by a gating unit and subtracted from the original query to block spurious associations.
  • Figure 3: The structural causal model (SCM) of VAD. Sub-figure (a) illustrates the causal flow between key representations. Sub-figures (b)-(d) on the right highlight several key confounders $Z$ that introduce backdoor paths.
  • Figure 4: The backdoor adjustment principle [pearl2009causality]. A confounder $Z$ opens a spurious backdoor path $S \leftarrow Z \rightarrow Y$. Applying the do-operator, i.e., $P(Y|\text{do}(S))$, severs this path, isolating the pure causal effect $S \rightarrow Y$.
  • Figure 5: t-SNE visualization of the final ego-query embeddings from the nuScenes validation set. (Left) The representation space of the baseline VAD-tiny is entangled. (Right) CausalVAD disentangles different navigation intents (straight, left, right) into clearly separable clusters, demonstrating its ability to mitigate dataset bias.
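The Figure 2 caption describes two de-confounding modules without giving their exact equations. Here is one possible reading, written as a NumPy sketch. The perception de-confounding module (PDM) computes a bias score from a prior-weighted read-out of a confounder dictionary and subtracts it from the direct logits $L$. The interaction de-confounding module (IDM) uses cross-attention over context tokens to estimate the spurious part of a query, scales it with a sigmoid gate, and subtracts it. Several details are assumptions rather than anything the caption states: the subtraction with weight `alpha`, the concatenation-based gate, all weight shapes, and the names `pdm` and `idm`.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pdm(features, logits, dictionary, prior, w_bias, alpha=1.0):
    """Hypothetical perception de-confounding step (dual branch: direct vs bias).
    features: (N, d); logits: (N, C) direct score L; dictionary: (K, d) prototypes
    prior: (K,) estimate of P(z); w_bias: (d, C) weights of the assumed bias branch."""
    d = features.shape[-1]
    attn = softmax(features @ dictionary.T / np.sqrt(d))  # (N, K) match to each context
    z_hat = (attn * prior) @ dictionary                   # (N, d) prior-weighted context
    bias = z_hat @ w_bias                                 # (N, C) bias score
    return logits - alpha * bias                          # de-confounded logits

def idm(query, context, w_q, w_k, w_v, w_g, b_g):
    """Hypothetical interaction de-confounding step.
    query: (N, d); context: (M, d); w_q, w_k, w_v: (d, d); w_g: (2d, d); b_g: (d,)."""
    d = query.shape[-1]
    attn = softmax((query @ w_q) @ (context @ w_k).T / np.sqrt(d))  # (N, M) cross-attention
    spurious = attn @ (context @ w_v)                # (N, d) part predictable from context
    gate = sigmoid(np.concatenate([query, spurious], axis=-1) @ w_g + b_g)  # (N, d) in (0, 1)
    return query - gate * spurious                   # subtract the gated spurious component

rng = np.random.default_rng(0)
N, M, K, C, d = 5, 7, 4, 3, 8
feat = rng.normal(size=(N, d))
logits = rng.normal(size=(N, C))
dictionary = rng.normal(size=(K, d))
prior = np.full(K, 1.0 / K)
out_logits = pdm(feat, logits, dictionary, prior, rng.normal(size=(d, C)))
ctx = rng.normal(size=(M, d))
w = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
out_q = idm(feat, ctx, *w, rng.normal(size=(2 * d, d)) * 0.1, np.zeros(d))
print(out_logits.shape, out_q.shape)  # (5, 3) (5, 8)
```

In this reading the gate lets the model keep context information that is genuinely causal. When the gate saturates near zero, IDM reduces to the identity on the query. When it saturates near one, the full context-predictable component is removed.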