Collision Probability Distribution Estimation via Temporal Difference Learning
Thomas Steinecker, Thorsten Luettel, Mirko Maehlisch
TL;DR
CollisionPro is a temporal-difference learning framework that estimates the full time-resolved distribution of collision probability rather than a single risk value. It defines a vector of cumulative probabilities $p_{t \rightarrow t+i}$ for $i=1,\dots,N_H$ and trains a multi-head neural network on TD$(\lambda)$ targets $\mathcal{G}^{\lambda}_{t \rightarrow t+i}$ that bootstrap across horizons. The loss enforces probabilistic validity and monotonicity, and prioritized sampling handles rare events. Evaluation in CARLA shows high sample efficiency, with reliable predictions for unseen collisions after observing fewer than $10^3$ collisions. The approach contributes to explainable AI because its outputs can be interpreted as time-resolved risk measures, and it can be integrated as a safety supervisor or as a component of RL agents. Source code is publicly available.
Abstract
We introduce CollisionPro, a pioneering framework that estimates cumulative collision probability distributions using temporal difference learning. It is tailored to applications in robotics, with a particular emphasis on autonomous driving. This approach addresses the demand for explainable artificial intelligence (XAI) and seeks to overcome limitations imposed by model-based approaches and conservative constraints. We formulate our framework within the context of reinforcement learning to pave the way for safety-aware agents. Nevertheless, we assert that our approach could also prove beneficial in other contexts, such as safety alert systems or analysis. We examine the framework comprehensively in a realistic autonomous driving simulator, illustrating its high sample efficiency and its reliable prediction capabilities for previously unseen collision events. The source code is publicly available.
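To make the idea of bootstrapping across horizons concrete, the following minimal sketch computes one-step TD targets for the cumulative probabilities $p_{t \rightarrow t+i}$, $i=1,\dots,N_H$. It covers only the one-step case and is an illustration under stated assumptions, not the released CollisionPro implementation: the function names `one_step_td_targets` and `is_valid_cumulative` are hypothetical, and the recursion used (a collision sets every horizon to 1, otherwise horizon $i$ at time $t$ reuses the estimate for horizon $i-1$ at time $t+1$) is the natural one-step form implied by the definition of a cumulative probability.

```python
from typing import List, Sequence


def one_step_td_targets(next_probs: Sequence[float], collided: bool) -> List[float]:
    """One-step TD targets for p_{t -> t+i}, i = 1..N_H (illustrative sketch).

    next_probs[j] is the current estimate of p_{t+1 -> t+1+(j+1)}, i.e. the
    N_H heads evaluated at the successor state (assumed interface).
    collided is True if the transition t -> t+1 ended in a collision.
    """
    n_h = len(next_probs)
    if collided:
        # A collision at t+1 lies inside every horizon i >= 1.
        return [1.0] * n_h
    # No collision in the first step: horizon 1 is collision-free, and
    # horizon i reduces to horizon i-1 as seen from the successor state.
    return [0.0] + list(next_probs[: n_h - 1])


def is_valid_cumulative(probs: Sequence[float], tol: float = 1e-9) -> bool:
    """Check the two properties the loss is said to enforce:
    values in [0, 1] and non-decreasing over the horizon index."""
    in_range = all(-tol <= p <= 1.0 + tol for p in probs)
    monotone = all(b >= a - tol for a, b in zip(probs, probs[1:]))
    return in_range and monotone


if __name__ == "__main__":
    successor_estimates = [0.1, 0.3, 0.6]  # made-up values for N_H = 3
    print(one_step_td_targets(successor_estimates, collided=False))
    print(one_step_td_targets(successor_estimates, collided=True))
    print(is_valid_cumulative(successor_estimates))
```

Shifting the successor estimates by one horizon preserves monotonicity, so valid estimates at $t+1$ yield valid targets at $t$. A TD$(\lambda)$ variant would blend such targets over several lookahead steps.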
