Collision Probability Distribution Estimation via Temporal Difference Learning

Thomas Steinecker, Thorsten Luettel, Mirko Maehlisch

TL;DR

CollisionPro presents a temporal-difference (TD) learning framework that estimates the full time-resolved distribution of collision probability rather than a single risk value. It defines a vector of cumulative probabilities $p_{t \rightarrow t+i}$ for $i=1,\dots,N_H$, trained with TD$(\lambda)$ targets $\mathcal{G}^{\lambda}_{t \rightarrow t+i}$ and a multi-head neural network that bootstraps across horizons. The loss enforces probabilistic validity and monotonicity, and rare collision events are handled with prioritized sampling. Evaluation in CARLA shows high sample efficiency, with reliable predictions for unseen collisions after fewer than $10^3$ collisions have been observed. Because the outputs can be read as time-resolved risk measures, the approach contributes to explainable AI, and it can be integrated as a safety supervisor or as a component of RL agents. Source code is publicly available.
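To make the TD$(\lambda)$ targets concrete, the following Python sketch computes $\mathcal{G}^{\lambda}_{t \rightarrow t+i}$ for a single state under assumptions that are ours rather than the paper's: a collision ends the episode, the per-step signal $r_{coll}$ is 0 or 1, there is no discounting, and each $n$-step target bootstraps from the shorter-horizon estimate $p_{t+n \rightarrow t+i}$ of the state reached after $n$ steps. The loss is likewise a hypothetical stand-in: a squared error to the targets plus quadratic penalties for outputs outside $[0, 1]$ and for violations of $p_{t \rightarrow t+h} \le p_{t \rightarrow t+h+1}$; the paper's exact loss and weighting may differ. The names `n_step_target`, `lambda_target`, `collision_loss`, `w_valid` and `w_mono` are illustrative only.

```python
from typing import List, Sequence


def n_step_target(coll: Sequence[int], p: Sequence[Sequence[float]],
                  t: int, i: int, n: int) -> float:
    """n-step target for p_{t -> t+i}, with 1 <= n <= i.

    coll[k] is 1 if a collision occurs at step k (assumed to end the episode).
    p[k][h] is the current estimate of p_{k -> k+h}; p[k][0] = 0 by definition.
    The trajectory must reach step t+n unless a collision ends it earlier.
    """
    for k in range(t + 1, t + n + 1):
        if coll[k]:
            return 1.0  # a collision within n steps settles the target
    # No collision yet: bootstrap from the shorter horizon p_{t+n -> t+i}.
    return p[t + n][i - n]


def lambda_target(coll: Sequence[int], p: Sequence[Sequence[float]],
                  t: int, i: int, lam: float) -> float:
    """Undiscounted lambda-return that mixes the n-step targets, n = 1..i."""
    g = 0.0
    for n in range(1, i):
        g += (1.0 - lam) * lam ** (n - 1) * n_step_target(coll, p, t, i, n)
    # The remaining weight lam^(i-1) goes to the Monte Carlo target (n = i).
    return g + lam ** (i - 1) * n_step_target(coll, p, t, i, i)


def collision_loss(pred: List[float], target: List[float],
                   w_valid: float = 1.0, w_mono: float = 1.0) -> float:
    """Squared error to the targets plus hypothetical validity/monotonicity penalties."""
    n = len(pred)
    td = sum((q - g) ** 2 for q, g in zip(pred, target)) / n
    # Penalize outputs outside [0, 1].
    valid = sum(max(0.0, -q) ** 2 + max(0.0, q - 1.0) ** 2 for q in pred)
    # Penalize decreases across horizons: p_{t->t+h} must not exceed p_{t->t+h+1}.
    mono = sum(max(0.0, pred[h] - pred[h + 1]) ** 2 for h in range(n - 1))
    return td + w_valid * valid + w_mono * mono


# Hypothetical trajectory: collision at step 2; estimates for horizons 0..2.
coll = [0, 0, 1]
p = [[0.0, 0.1, 0.2], [0.0, 0.3, 0.4], [0.0, 0.0, 0.0]]
print(lambda_target(coll, p, t=0, i=2, lam=0.5))  # 0.5 * 0.3 + 0.5 * 1.0
```

With $\lambda = 0$ the target for head $i$ bootstraps one step from the next state's head $i-1$, so the first head learns only from $r_{coll}$; with $\lambda = 1$ it reduces to the Monte Carlo indicator of a collision within $i$ steps.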

Abstract

We introduce CollisionPro, a pioneering framework designed to estimate cumulative collision probability distributions using temporal difference learning, specifically tailored to applications in robotics, with a particular emphasis on autonomous driving. This approach addresses the demand for explainable artificial intelligence (XAI) and seeks to overcome limitations imposed by model-based approaches and conservative constraints. We formulate our framework within the context of reinforcement learning to pave the way for safety-aware agents. Nevertheless, we believe our approach could also prove beneficial in other contexts, such as a safety alert system or for analytical purposes. A comprehensive examination of our framework is conducted using a realistic autonomous driving simulator, illustrating its high sample efficiency and reliable prediction capabilities for previously unseen collision events. The source code is publicly available.

Paper Structure (11 sections, 12 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: Visualization of the concept of CollisionPro. Given a scenario, a feature vector is extracted and passed to a deep neural network (DNN) trained via temporal difference (TD) learning. The output is a vector of cumulative collision probabilities up to the specified time horizon, e.g. $T_H = 5\,\mathrm{s}$. As the figure shows, this risk assessment provides probabilistic information about both the overall collision probability within the horizon ($\sim0.5$) and the time at which a collision is most likely (2 to 4 s). The figure also shows the learning process (red arrows), which is based on the principle of TD learning: while the first estimator ($p_{0 \rightarrow 1}$) learns purely from collision and non-collision events/signals ($r_{coll}$), the subsequent estimators ($p_{0 \rightarrow 2}$ to $p_{0 \rightarrow 5}$) can learn from all previous estimators (bootstrapping). A fundamental aspect of our approach is treating these probabilities as equivalent to the value function.
  • Figure 2: The relationship between long-term and short-term predictions (similar to lefevre2014survey). For short-term predictions, the current kinematics and dynamics are crucial for risk assessment, whereas the intentions of all dynamic agents become increasingly important for long-term predictions.
  • Figure 3: The network architecture and pipeline for learning the cumulative collision probability distribution. Input: a stack of bird's-eye semantic views, converted to greyscale, from three consecutive time steps, giving an input dimension of $192 \times 192 \times 3$. Three time steps were chosen because of hardware limitations; they allow the dynamics of all entities to be captured, although including further past time steps would presumably produce better predictions by providing a richer representation of behavioral patterns and intentions. Architecture: the network consists of three parts. The encoder is a sequence of convolutional neural network (CNN) layers (K $3 \times 3$ denotes the kernel size) whose output is further processed by a fully connected block. Together, these two components form the common backbone for the individual sub-networks (heads), each of which takes the latent representation from the backbone and the output of its predecessor head. The network is illustrated for $H=3$ heads/estimators. The first SkipBlock in a sequence of SkipBlocks, drawn with a solid line at the bottom right, transforms its inputs to the desired dimension; subsequent blocks are drawn with dashed lines.
  • Figure 4: Collision characteristics: the collision probability distribution in the last $T_{H}=2.0\,\mathrm{s}$ before a collision. Left: the mean collision characteristic over 50 collision scenarios. Center: an example scenario in which the collision was predicted early and with a high degree of confidence. Right: an example scenario in which the collision was predicted with confidence only shortly beforehand (approximately $1\,\mathrm{s}$).
  • Figure 5: Error plots over $50$ epochs with $6 \cdot 10^3$ samples. Top: mean error $\mathcal{E}_{\text{acc}}$ over all heads (see the section on performance measures). Center: mean error $\mathcal{E}_{\text{pes}}$ over all heads. Bottom: accuracy error $\mathcal{E}_{\text{acc}}$ for each individual head, where head $i$ corresponds to $p_{t \rightarrow t + i}$.
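Figure 1 reads two quantities off the output vector: the overall collision probability and the time at which a collision is most likely. Because $p_{t \rightarrow t+i}$ is cumulative, the probability that the first collision falls in step $i$ is the increment $p_{t \rightarrow t+i} - p_{t \rightarrow t+i-1}$, with $p_{t \rightarrow t} = 0$. The Python sketch below assumes that each head covers a fixed step length `dt` and uses made-up example values; `interval_probabilities` and `summarize` are hypothetical helper names, not part of the published code.

```python
from typing import List, Tuple


def interval_probabilities(cum: List[float]) -> List[float]:
    """Per-step probabilities of the first collision from cumulative p_{t->t+i}, i = 1..N_H."""
    out, prev = [], 0.0  # p_{t->t} = 0
    for c in cum:
        out.append(c - prev)  # P(first collision in step i) = p_{t->t+i} - p_{t->t+i-1}
        prev = c
    return out


def summarize(cum: List[float], dt: float) -> Tuple[float, float, float]:
    """Return (collision probability within the horizon, start and end in seconds of
    the most likely collision interval), assuming each head covers dt seconds."""
    inc = interval_probabilities(cum)
    k = max(range(len(inc)), key=inc.__getitem__)  # first index on ties
    return cum[-1], k * dt, (k + 1) * dt


# Hypothetical outputs of five heads with dt = 1 s (T_H = 5 s as in Figure 1).
cum = [0.0, 0.05, 0.3, 0.45, 0.5]
print(summarize(cum, dt=1.0))  # (0.5, 2.0, 3.0)
```

The increments are only valid probabilities if the outputs are monotone in the horizon, which is the property the monotonicity term of the loss is meant to secure.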