Explainable Scene Understanding with Qualitative Representations and Graph Neural Networks
Nassim Belmecheri, Arnaud Gotlieb, Nadjib Lazaar, Helge Spieker
TL;DR
This work tackles explainable scene understanding for automated driving by combining Qualitative Explainable Graphs (QXGs) with Graph Neural Networks (GNNs). It introduces a novel GNN architecture that processes entire QXGs to identify relevant objects, addressing the limitations of prior approaches that relied on single relation chains. The method uses a specialized ROI formulation, a carefully crafted QXG feature set, and a balanced loss function to handle severe class imbalance, achieving superior performance over traditional baselines on nuScenes with DriveLM relevance labels. The results demonstrate that qualitative representations can be effectively integrated with deep learning to produce explainable, context-aware driving decisions, with potential extensions to action explanation and cross-dataset robustness.
Abstract
This paper investigates the integration of graph neural networks (GNNs) with Qualitative Explainable Graphs (QXGs) for scene understanding in automated driving. Scene understanding is the basis for any further reactive or proactive decision-making. Scene understanding and related reasoning is inherently an explanation task: why is another traffic participant doing something, what or who caused their actions? While previous work demonstrated QXGs' effectiveness using shallow machine learning models, these approaches were limited to analysing single relation chains between object pairs, disregarding the broader scene context. We propose a novel GNN architecture that processes entire graph structures to identify relevant objects in traffic scenes. We evaluate our method on the nuScenes dataset enriched with DriveLM's human-annotated relevance labels. Experimental results show that our GNN-based approach achieves superior performance compared to baseline methods. The model effectively handles the inherent class imbalance in relevant object identification tasks while considering the complete spatial-temporal relationships between all objects in the scene. Our work demonstrates the potential of combining qualitative representations with deep learning approaches for explainable scene understanding in autonomous driving systems.
