Explainable deep learning improves human mental models of self-driving cars
Eoin M. Kenny, Akshay Dharmavaram, Sang Uk Lee, Tung Phan-Minh, Shreyas Rajesh, Yunqing Hu, Laura Major, Momchil S. Tomov, Julie A. Shah
TL;DR
This work tackles the opacity of deep neural planners in autonomous driving by introducing the Concept-Wrapper Network (CW-Net), which grounds planner reasoning in human-interpretable concepts to produce causally faithful explanations. CW-Net preserves driving performance while providing concept-based explanations that improve human drivers' mental models and situational awareness in both real-world and simulated settings. In semi-naturalistic on-road tests and large online studies using SAGAT (the Situation Awareness Global Assessment Technique), explanations improved prediction and understanding in surprising scenarios without harming performance in routine ones, suggesting practical utility and potential regulatory relevance. The approach augments a DriveIRL-based black-box planner with a concept classifier and a new reward module, and is trained on large-scale driving data. The evaluation examines robustness and generalization, and the work is made accessible for reproducibility.
Abstract
Self-driving cars increasingly rely on deep neural networks to achieve human-like driving. The opacity of such black-box planners makes it challenging for the human behind the wheel to accurately anticipate when they will fail, with potentially catastrophic consequences. While research into interpreting these systems has surged, most of it is confined to simulations or toy setups due to the difficulty of real-world deployment, leaving the practical utility of such techniques unknown. Here, we introduce the Concept-Wrapper Network (CW-Net), a method for explaining the behavior of machine-learning-based planners by grounding their reasoning in human-interpretable concepts. We deploy CW-Net on a real self-driving car and show that the resulting explanations improve the human driver's mental model of the car, allowing them to better predict its behavior. To our knowledge, this is the first demonstration that explainable deep learning integrated into self-driving cars can be both understandable and useful in a realistic deployment setting. CW-Net achieves this level of intelligibility with explanations that are causally faithful, and it does so without sacrificing driving performance. Overall, our study establishes a general pathway to interpretability for autonomous agents by way of concept-based explanations, which could help make them more transparent and safer.
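To make the idea of grounding a planner's choice in concepts more concrete, the following is a minimal sketch, not the CW-Net implementation. It assumes that each candidate trajectory comes with a feature vector from a frozen planner, that a linear classifier with a sigmoid maps those features to concept activations, and that the reward is a weighted sum of those activations alone. The concept names, the function names, and the linear form are all hypothetical and chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical concept names, used only for illustration.
CONCEPTS = ["stopped vehicle ahead", "pedestrian nearby", "clear road"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def concept_scores(features: Sequence[float],
                   weights: Sequence[Sequence[float]]) -> List[float]:
    """Map planner features to one activation in (0, 1) per concept.

    Assumption: a linear layer followed by a sigmoid stands in for the
    concept classifier.
    """
    return [sigmoid(sum(w * f for w, f in zip(row, features)))
            for row in weights]


def concept_reward(scores: Sequence[float],
                   reward_weights: Sequence[float]) -> float:
    """Score a trajectory using only its concept activations.

    Because nothing else feeds the reward, the concepts shown to the
    driver are the same quantities that determine the choice.
    """
    return sum(r * s for r, s in zip(reward_weights, scores))


def plan_and_explain(candidate_features: Sequence[Sequence[float]],
                     weights: Sequence[Sequence[float]],
                     reward_weights: Sequence[float]) -> Tuple[int, str]:
    """Return the index of the best candidate and its most active concept."""
    scored = [concept_scores(f, weights) for f in candidate_features]
    rewards = [concept_reward(s, reward_weights) for s in scored]
    best = max(range(len(rewards)), key=rewards.__getitem__)
    top = max(range(len(CONCEPTS)), key=scored[best].__getitem__)
    return best, CONCEPTS[top]
```

In this sketch the explanation is simply the most active concept for the chosen trajectory. Routing the reward through the concept activations is one simple way to keep an explanation tied to the quantities that produced the decision; how CW-Net achieves causal faithfulness in practice is described in the paper itself.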
