On the Road to Clarity: Exploring Explainable AI for World Models in a Driver Assistance System
Mohamed Roshdi, Julian Petzold, Mostafa Wahby, Hussein Ebrahim, Mladen Berekovic, Heiko Hamann
TL;DR
The paper addresses explainability for deep world models used in autonomous driving, where safety-critical decisions demand transparency. It develops four XAI strategies and applies them to a ConvVAE–LSTM pedestrian-perception predictor trained on CARLA data: feature-map visualization, latent-space interpretation, a Renormalization Group (RG)-inspired interpretable autoencoder backbone, and analysis of LSTM dynamics with Layer-wise Relevance Propagation (LRP). Key contributions include a visualization pipeline for ConvVAE internals, a latent-grid mapping that links latent changes to decoded features, an RG-based transparent backbone that projects inputs onto interpretable latent components, and an LSTM explainability framework validated against driver attention metrics (e.g., mean NSS $=0.53$). The work aims at more interpretable and safer driver assistance systems by providing end-to-end XAI tools for world models in urban perception tasks, with a view toward certification and real-world deployment.
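To make the reported attention metric concrete, the following is a minimal sketch of how an NSS score can be computed. It assumes that NSS refers to Normalized Scanpath Saliency, the common saliency-evaluation metric: the map is z-scored and then averaged over the fixated pixels. The function name `nss`, the array layout, and the zero-variance fallback are illustrative assumptions; this section does not describe the authors' own evaluation code.

```python
import numpy as np


def nss(saliency, fixations):
    """Mean z-scored saliency at fixated pixels (hypothetical helper).

    saliency:  2D array of relevance or saliency values.
    fixations: 2D boolean array of the same shape, True where a driver fixated.
    """
    saliency = np.asarray(saliency, dtype=float)
    fixations = np.asarray(fixations, dtype=bool)
    if saliency.shape != fixations.shape:
        raise ValueError("saliency and fixations must have the same shape")
    if not fixations.any():
        raise ValueError("at least one fixation is required")

    std = saliency.std()
    if std == 0.0:
        # Assumption: a constant map carries no information, so score it as 0.
        return 0.0

    # Normalize the map to zero mean and unit variance, then average at fixations.
    z = (saliency - saliency.mean()) / std
    return float(z[fixations].mean())


if __name__ == "__main__":
    relevance = np.array([[0.0, 0.0], [0.0, 1.0]])
    gaze = np.array([[False, False], [False, True]])
    print(nss(relevance, gaze))
```

Under this definition, a score above zero means the map assigns above-average values to the fixated locations, and a score near zero corresponds to chance-level agreement.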
Abstract
In Autonomous Driving (AD), transparency and safety are paramount, as mistakes are costly. However, neural networks used in AD systems are generally considered black boxes. As a countermeasure, explainable AI (XAI) offers methods such as feature relevance estimation and dimensionality reduction. Coarse-graining techniques can also help reduce dimensionality and find interpretable global patterns. A specific coarse-graining method is the Renormalization Group (RG) from statistical physics. It has previously been applied to Restricted Boltzmann Machines (RBMs) to interpret unsupervised learning. We refine this technique by building a transparent backbone model for convolutional variational autoencoders (VAEs) that allows mapping latent values to input features and performs comparably to trained black-box VAEs. Moreover, we propose a custom feature map visualization technique to analyze the internal convolutional layers of the VAE and to explain internal causes of poor reconstruction that may lead to dangerous traffic scenarios in AD applications. As a second key contribution, we propose explanation and evaluation techniques for the internal dynamics and feature relevance of prediction networks. We test a long short-term memory (LSTM) network in the computer vision domain to evaluate the predictability and, in future applications, potentially the safety of prediction models. We showcase our methods by analyzing a VAE-LSTM world model that predicts pedestrian perception in an urban traffic situation.
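As a rough illustration of what coarse graining means for image-like data, the sketch below averages non-overlapping pixel blocks, analogous to a block-spin step in the Renormalization Group picture. This is a generic textbook-style example under stated assumptions, not the transparent backbone proposed in the paper; the function name `coarse_grain` and the choice of block averaging are hypothetical.

```python
import numpy as np


def coarse_grain(image, block=2):
    """Average non-overlapping block x block patches of a 2D array.

    Illustrative assumption: simple block averaging stands in for one
    coarse-graining step; the paper's RG-inspired model is not reproduced here.
    """
    image = np.asarray(image, dtype=float)
    height, width = image.shape
    if height % block or width % block:
        raise ValueError("image dimensions must be divisible by the block size")

    # Split each axis into (number of blocks, block size), then average
    # over the two within-block axes.
    patches = image.reshape(height // block, block, width // block, block)
    return patches.mean(axis=(1, 3))


if __name__ == "__main__":
    frame = np.arange(16).reshape(4, 4)
    print(coarse_grain(frame))  # one averaged value per 2x2 block
```

Each application reduces the number of values by a factor of `block ** 2` while keeping large-scale structure, which is the sense in which coarse graining can expose global patterns.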
