Generating Causal Explanations of Vehicular Agent Behavioural Interactions with Learnt Reward Profiles
Rhys Howard, Nick Hawes, Lars Kunze
TL;DR
This work tackles explainability for vehicular agent interactions by learning a weighted reward profile and applying twin-world counterfactual inference within a structural causal model (SCM) to causally attribute actions. The method integrates a one-shot reward-learning step, similar in spirit to inverse reinforcement learning, with a modular SCM-based architecture to generate interpretable explanations of behaviour and interactions. Quantitative results on highD and qualitative results on exiD and inD demonstrate competitive performance and richer, more transparent explanations than prior reward-based approaches, while also highlighting limitations tied to the expressiveness of the reward metrics and to data availability. Overall, the approach advances transparent, causally grounded analysis of autonomous-vehicle decision-making in interaction-rich traffic scenarios, with practical implications for safety and accountability.
Abstract
Transparency and explainability are important features that responsible autonomous vehicles should possess, particularly when interacting with humans, and causal reasoning offers a strong basis to provide these qualities. However, even if one assumes agents act to maximise some concept of reward, it is difficult to make accurate causal inferences about agent planning without capturing what is of importance to the agent. Thus, our work aims to learn a weighting of reward metrics for agents such that explanations for agent interactions can be causally inferred. We validate our approach quantitatively and qualitatively across three real-world driving datasets, demonstrating a functional improvement over previous methods and competitive performance across evaluation metrics.
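To make the core idea concrete, the following is a minimal Python sketch of how a weighted reward profile could support a counterfactual, "but-for" style attribution. It is an illustration under stated assumptions, not the implementation used in this work: the metric names (speed, headway, comfort), the weights, the candidate actions, and every numeric value are hypothetical, and the two hand-written outcome tables stand in for the factual and counterfactual worlds that a structural causal model would otherwise produce.

```python
from typing import Dict

Metrics = Dict[str, float]          # metric name -> normalised value
Outcomes = Dict[str, Metrics]       # ego action -> resulting metrics


def weighted_reward(metrics: Metrics, weights: Metrics) -> float:
    """Linear reward: weighted sum of metric values (an assumed form)."""
    return sum(weights[name] * metrics[name] for name in weights)


def best_action(outcomes: Outcomes, weights: Metrics) -> str:
    """Action a reward-maximising agent would pick under these weights."""
    return max(outcomes, key=lambda a: weighted_reward(outcomes[a], weights))


def is_cause(factual: Outcomes, counterfactual: Outcomes, weights: Metrics) -> bool:
    """But-for test: did changing the other agent's action change
    the ego agent's reward-maximising action?"""
    return best_action(factual, weights) != best_action(counterfactual, weights)


# Hypothetical reward profile for the ego agent.
weights = {"speed": 0.5, "headway": 0.4, "comfort": 0.1}

# Factual world: the lead vehicle brakes.
factual = {
    "keep_lane":   {"speed": 0.3, "headway": 0.2, "comfort": 0.9},
    "change_lane": {"speed": 0.8, "headway": 0.8, "comfort": 0.5},
}

# Counterfactual world: the lead vehicle maintains its speed.
counterfactual = {
    "keep_lane":   {"speed": 0.8, "headway": 0.8, "comfort": 0.9},
    "change_lane": {"speed": 0.8, "headway": 0.8, "comfort": 0.5},
}

if __name__ == "__main__":
    print("factual choice:", best_action(factual, weights))
    print("counterfactual choice:", best_action(counterfactual, weights))
    print("lead braking is a cause:", is_cause(factual, counterfactual, weights))
```

In this toy setting the ego agent's preferred action differs between the two worlds, so the lead vehicle's braking would be reported as a cause of the lane change. With a different weighting, for example one dominated by comfort, the same pair of worlds could yield no attribution, which is why the weighting of reward metrics matters for the resulting explanation.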
