What Did I Learn? Operational Competence Assessment for AI-Based Trajectory Planners
Michiel Braat, Maren Buermann, Marijke van Weperen, Jan-Pieter Paardekooper
TL;DR
The paper tackles the challenge of ensuring trustworthy AI for automated driving by estimating when an AI-based trajectory planner is operating in contexts it has not been adequately trained for. It introduces a knowledge-graph-based framework to describe driving data, compute a metric for how well a scene is represented in the training data (coverage) and a scene difficulty metric (complexity), and combine them into a competence score: $\mathrm{Competence}(s) = \mathrm{Coverage}(s) \cdot (1 - \mathrm{Complexity}(s))$. Using the NuPlan dataset, it constructs scene graphs, defines sub-scene patterns, and evaluates how coverage and complexity relate to planner performance, finding a meaningful but not strong alignment between competence and trajectory quality. The approach yields insight into dataset composition and provides a practical, explainable way to gauge operational risk in ML-driven AD systems, with potential to guide data curation and model deployment. Overall, the work contributes a principled, KG-based method to monitor competence, describe driving data, and anticipate when a trajectory planner’s output is trustworthy for a given context.
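To make the competence formula concrete, here is a minimal Python sketch. It assumes that both coverage and complexity are already normalized to the range [0, 1]; the summary does not state this explicitly, and the function name `competence` is chosen here for illustration only.

```python
def competence(coverage: float, complexity: float) -> float:
    """Competence(s) = Coverage(s) * (1 - Complexity(s)).

    Assumption (not stated in the summary): both inputs lie in [0, 1].
    """
    for name, value in (("coverage", coverage), ("complexity", complexity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return coverage * (1.0 - complexity)


# A scene with full coverage and zero complexity gets the maximum score.
best_case = competence(coverage=1.0, complexity=0.0)

# The same coverage yields a lower score when the scene is more complex.
easy_scene = competence(coverage=0.5, complexity=0.0)
hard_scene = competence(coverage=0.5, complexity=0.5)
```

Under this form, a scene that is only half covered scores lower when it is complex than when it is simple, so a complex scene needs more coverage to reach the same competence.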
Abstract
Automated driving functions increasingly rely on machine learning for tasks such as perception and trajectory planning, requiring large, relevant datasets. The performance of these algorithms depends on how closely the training data matches the task. To ensure reliable functioning, it is crucial to know what is included in the dataset to assess the trained model's operational risk. We aim to enhance the safe use of machine learning in automated driving by developing a method to recognize situations that an automated vehicle has not been sufficiently trained on. This method also improves explainability by describing the dataset at a human-understandable level. We propose modeling driving data as knowledge graphs, representing driving scenes with entities and their relationships. These graphs are queried for specific sub-scene configurations to check their occurrence in the dataset. We estimate a vehicle's competence in a driving scene by considering the coverage and complexity of sub-scene configurations in the training set. Higher-complexity scenes require greater coverage for high competence. We apply this method to the NuPlan dataset, modeling it with knowledge graphs and analyzing the coverage of specific driving scenes. This approach helps monitor the competence of machine learning models trained on the dataset, which is essential for trustworthy AI to be deployed in automated driving.
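The idea of querying scene graphs for sub-scene configurations can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: scenes are reduced to sets of (subject, relation, object) triples over entity types, a pattern matches when all of its triples occur in a scene, and "coverage" is taken as the fraction of training scenes containing the pattern. The paper's actual graph schema, query mechanism, and coverage definition may differ, and the entity and relation names below are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

# Illustrative representation: a scene graph as a set of typed triples.
Triple = Tuple[str, str, str]
SceneGraph = FrozenSet[Triple]


def contains_pattern(scene: SceneGraph, pattern: SceneGraph) -> bool:
    """True if every triple of the sub-scene pattern occurs in the scene.

    Simplification: triples are matched by exact label, so no variable
    binding between distinct entities of the same type is performed.
    """
    return pattern <= scene


def pattern_coverage(scenes: Iterable[SceneGraph], pattern: SceneGraph) -> float:
    """Fraction of scenes that contain the pattern (assumed coverage proxy)."""
    scene_list = list(scenes)
    if not scene_list:
        return 0.0
    hits = sum(contains_pattern(scene, pattern) for scene in scene_list)
    return hits / len(scene_list)


# Hypothetical toy training set of four scenes.
training_scenes = [
    frozenset({("ego", "approaches", "intersection"),
               ("pedestrian", "on", "crosswalk")}),
    frozenset({("ego", "follows", "vehicle")}),
    frozenset({("ego", "approaches", "intersection"),
               ("pedestrian", "on", "crosswalk"),
               ("ego", "follows", "vehicle")}),
    frozenset({("ego", "approaches", "intersection")}),
]

# Sub-scene pattern: ego nears an intersection while a pedestrian crosses.
pedestrian_crossing = frozenset({
    ("ego", "approaches", "intersection"),
    ("pedestrian", "on", "crosswalk"),
})

coverage = pattern_coverage(training_scenes, pedestrian_crossing)
```

In this toy data the pattern occurs in two of the four scenes. A real knowledge graph would typically be queried with a graph query language and would distinguish individual entities, which this label-level subset check deliberately ignores.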
