Generating Causal Explanations of Vehicular Agent Behavioural Interactions with Learnt Reward Profiles

Rhys Howard, Nick Hawes, Lars Kunze

TL;DR

This work tackles explainability for vehicular agent interactions by learning a weighted reward profile and applying twin-world counterfactual inference within a structural causal model (SCM) to attribute actions to their causes. The method combines one-shot reward learning, similar in spirit to inverse reinforcement learning, with a modular SCM-based architecture to generate interpretable explanations of behaviour and interactions. Quantitative results on highD and qualitative results on exiD and inD demonstrate competitive performance and richer, more transparent explanations than prior reward-based approaches, while also highlighting limitations tied to the expressiveness of the reward metrics and to data availability. Overall, the approach advances transparent, causally grounded analysis of autonomous-vehicle decision-making in interaction-rich traffic scenarios, with practical implications for safety and accountability.

Abstract

Transparency and explainability are important features that responsible autonomous vehicles should possess, particularly when interacting with humans, and causal reasoning offers a strong basis to provide these qualities. However, even if one assumes agents act to maximise some concept of reward, it is difficult to make accurate causal inferences of agent planning without capturing what is of importance to the agent. Thus our work aims to learn a weighting of reward metrics for agents such that explanations for agent interactions can be causally inferred. We validate our approach quantitatively and qualitatively across three real-world driving datasets, demonstrating a functional improvement over previous methods and competitive performance across evaluation metrics.

Paper Structure

This paper contains 28 sections, 12 equations, and 5 figures.

Figures (5)

  • Figure 1: Illustrations of the proposed method to generate causal explanations for vehicular agent behavioural interactions with learnt reward profiles.
  • Figure 2: SCM architecture of the causal autonomous system for vehicles.
  • Figure 3: Quantitative Results
  • Figure 4: Illustration of twin-world analysis of driving scenes. $\mathcal{W}$ denotes the planned behaviour under the original world state at the time the affected action $a_A$ was taken. Meanwhile $\mathcal{W}^{\neg C}$ denotes the planned behaviour under the counterfactual world state in which the causing action $a_C$ was not taken, at the same time as before. The magenta vehicle indicates the affected agent, the cyan vehicle the causing agent, and green vehicles the background agents.
  • Figure 5: Reward Profiles for Each Scenario
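
The twin-world comparison described in the Figure 4 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the metric names (`speed`, `safety`), the candidate actions, and all numeric values are hypothetical, and planning is reduced to picking the candidate action with the highest weighted reward. The sketch assumes a causing action $a_C$ is counted as a cause of the affected action $a_A$ when the affected agent's planned action differs between $\mathcal{W}$ and $\mathcal{W}^{\neg C}$.

```python
from typing import Dict

# Hypothetical types: a reward profile maps metric names to weights, and each
# candidate action maps metric names to the values it would achieve.
RewardProfile = Dict[str, float]
Candidates = Dict[str, Dict[str, float]]


def weighted_reward(metrics: Dict[str, float], profile: RewardProfile) -> float:
    """Weighted sum of reward metrics under a (learnt) reward profile."""
    return sum(profile[name] * value for name, value in metrics.items())


def plan(candidates: Candidates, profile: RewardProfile) -> str:
    """Pick the candidate action with the highest weighted reward."""
    return max(candidates, key=lambda a: weighted_reward(candidates[a], profile))


def is_cause(world: Candidates, world_not_c: Candidates,
             profile: RewardProfile) -> bool:
    """Assumed criterion: a_C is a cause if the plan changes without it."""
    return plan(world, profile) != plan(world_not_c, profile)


# Illustrative reward profile for the affected agent (made-up weights).
profile = {"speed": 0.3, "safety": 0.7}

# W: the causing agent braked ahead, so keeping the lane scores poorly.
world = {
    "keep_lane": {"speed": 0.2, "safety": 0.3},
    "change_lane": {"speed": 0.8, "safety": 0.7},
}

# W^{not C}: the causing agent did not brake, so keeping the lane is attractive.
world_not_c = {
    "keep_lane": {"speed": 0.9, "safety": 0.9},
    "change_lane": {"speed": 0.8, "safety": 0.7},
}

factual_action = plan(world, profile)
counterfactual_action = plan(world_not_c, profile)
caused = is_cause(world, world_not_c, profile)
```

With these made-up values, the planned action differs between the two worlds, so the sketch attributes the affected agent's lane change to the causing agent's braking. Different weights in the reward profile can flip this outcome, which is why the weighting has to reflect what matters to the agent.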