Latent Uncertainty Representations for Video-based Driver Action and Intention Recognition

Koen Vellenga, H. Joe Steinhauer, Jonas Andersson, Anders Sjögren

TL;DR

This work addresses the need for reliable uncertainty estimation in video-based driver action and intention recognition under resource constraints. It introduces Latent Uncertainty Representations (LUR) and a repulsively trained variant (RLUR), which add transformation layers to latent features to obtain multiple uncertainty-aware predictions without altering the core architecture. Across four datasets, LUR demonstrates competitive in-distribution accuracy and strong uncertainty-based OOD detection, often matching top last-layer probabilistic methods while being more efficient. The findings suggest that latent-space transformations offer a practical, scalable route to robust uncertainty estimation in real-time driving applications, with potential extensions to deeper transformations and alternative uncertainty measures.

Abstract

Deep neural networks (DNNs) are increasingly applied to safety-critical tasks in resource-constrained environments, such as video-based driver action and intention recognition. While last-layer probabilistic deep learning (LL-PDL) methods can detect out-of-distribution (OOD) instances, their performance varies. As an alternative to last-layer approaches, we propose extending pre-trained DNNs with transformation layers that produce multiple latent representations for estimating uncertainty. We evaluate our latent uncertainty representation (LUR) and repulsively trained LUR (RLUR) approaches against eight PDL methods across four video-based driver action and intention recognition datasets, comparing classification performance, calibration, and uncertainty-based OOD detection. We also contribute 28,000 frame-level action labels and 1,194 video-level intention labels for the NuScenes dataset. Our results show that LUR and RLUR achieve in-distribution classification performance comparable to that of other LL-PDL approaches. For uncertainty-based OOD detection, LUR matches the top-performing PDL methods while being more efficient to train and easier to tune than approaches that require Markov chain Monte Carlo sampling or repulsive training procedures.
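The LUR idea described in the abstract and in Figure 1 can be illustrated with a minimal NumPy sketch of a single-pass prediction. It assumes linear transformation layers, a linear softmax classifier $\theta_{LL}$ shared by $z$ and every $z^{(i)}_{trans}$, and predictive entropy of the averaged class probabilities as the uncertainty score. The function name `lur_predict`, the layer shapes, and the entropy choice are hypothetical, not the authors' implementation.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def lur_predict(z, trans_weights, trans_biases, w_ll, b_ll):
    """Hypothetical single-pass LUR prediction (illustrative, not the authors' code).

    z:             (batch, d) latent features from a pre-trained encoder
    trans_weights: K matrices of shape (d, d), one per assumed linear transformation layer
    trans_biases:  K bias vectors of shape (d,)
    w_ll, b_ll:    shared last-layer classifier theta_LL, shapes (d, C) and (C,)
    """
    # Original latent z plus K transformed latents z_trans^(i)
    latents = [z] + [z @ w + b for w, b in zip(trans_weights, trans_biases)]
    # Every latent passes through the same classification layer theta_LL
    probs = np.stack([softmax(h @ w_ll + b_ll) for h in latents])  # (K + 1, batch, C)
    mean_probs = probs.mean(axis=0)  # (batch, C)
    # Assumed uncertainty score: entropy of the averaged class probabilities
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
    return mean_probs, entropy


# Usage on random data: 3 samples, 16-d latents, 5 classes, 4 transformation layers
rng = np.random.default_rng(0)
d, n_classes, k, batch = 16, 5, 4, 3
z = rng.normal(size=(batch, d))
trans_weights = [np.eye(d) + 0.1 * rng.normal(size=(d, d)) for _ in range(k)]
trans_biases = [np.zeros(d) for _ in range(k)]
w_ll, b_ll = rng.normal(size=(d, n_classes)), np.zeros(n_classes)
mean_probs, entropy = lur_predict(z, trans_weights, trans_biases, w_ll, b_ll)
print(mean_probs.shape, entropy.shape)  # (3, 5) (3,)
```

In this sketch the encoder output $z$ is computed once and reused by all K transformation layers, which mirrors the single-inference framing of Figure 1: extra predictions cost only a few small matrix products rather than repeated forward passes through the encoder.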

Paper Structure

This paper contains 35 sections, 2 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: Schematic overview of the deterministic single-inference latent uncertainty representation. The raw input data $x$ is encoded into a latent representation $z$. This representation is then processed by the transformation layer(s) to produce the additional latent representation(s) $z^{(i)}_{trans}$. Afterward, both $z$ and $z^{(i)}_{trans}$ are processed by the same classification layer $\theta_{LL}$, which produces $\hat{y}$ and $\hat{y}_i$, respectively.
  • Figure 2: Comparison between last layer and LUR approaches in terms of in-distribution F1-score (top row) and OOD min PR-AUC performance (bottom row) across different transformation layer types. Whiskers indicate two standard errors of the mean computed over five random seeds.
  • Figure 3: In-distribution classification performance (top row) and OOD min detection performance (bottom row) for different kernel types and numbers of transformation layers in the RLUR approach for a single random seed.
  • Figure 4: Urban lane change examples.
  • Figure 5: Schematic overview of a driving scene denoting the inconsistency in available frames, and the annotations.
  • ...and 3 more figures
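Figure 2 reports OOD detection as PR-AUC, with whiskers of two standard errors of the mean over five random seeds. The sketch below shows one way to compute both quantities. It assumes OOD samples are the positive class, a higher uncertainty score means "more likely OOD", and average precision as the PR-AUC estimator; the helper names `average_precision` and `mean_and_two_se` and the synthetic data are hypothetical.

```python
import numpy as np


def average_precision(is_ood, uncertainty):
    """Step-wise area under the precision-recall curve (average precision).

    is_ood:      1 for OOD samples (assumed positive class), 0 for in-distribution
    uncertainty: per-sample score; higher is assumed to mean "more likely OOD"
    Ties in the score are broken by input order rather than grouped.
    """
    order = np.argsort(-np.asarray(uncertainty, dtype=float), kind="stable")
    hits = np.asarray(is_ood, dtype=float)[order]
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision at each rank where a new OOD sample is retrieved
    return float((precision * hits).sum() / hits.sum())


def mean_and_two_se(values):
    """Mean and a whisker of two standard errors of the mean (sample std, ddof=1)."""
    values = np.asarray(values, dtype=float)
    return float(values.mean()), float(2 * values.std(ddof=1) / np.sqrt(len(values)))


# Usage on synthetic data for five hypothetical random seeds
aps = []
for seed in range(5):
    rng = np.random.default_rng(seed)
    is_ood = rng.integers(0, 2, size=200)
    uncertainty = rng.normal(size=200) + is_ood  # OOD samples score higher on average
    aps.append(average_precision(is_ood, uncertainty))
mean_ap, whisker = mean_and_two_se(aps)
print(f"PR-AUC {mean_ap:.3f} +/- {whisker:.3f}")
```

Average precision is only one estimator of the area under the precision-recall curve; a trapezoidal estimate would give slightly different numbers, so values from this sketch are not directly comparable to the paper's tables.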