Latent Uncertainty Representations for Video-based Driver Action and Intention Recognition
Koen Vellenga, H. Joe Steinhauer, Jonas Andersson, Anders Sjögren
TL;DR
This work addresses the need for reliable uncertainty estimation in video-based driver action and intention recognition under resource constraints. It introduces Latent Uncertainty Representations (LUR) and a repulsively trained variant (RLUR), which add transformation layers on top of the latent features of a pre-trained network to obtain multiple uncertainty-aware predictions without altering the core architecture. Across four datasets, LUR and RLUR show competitive in-distribution accuracy, and LUR shows strong uncertainty-based OOD detection, matching top-performing probabilistic methods while being more efficient to train and easier to tune. The findings suggest that latent-space transformations offer a practical, scalable route to robust uncertainty estimation in real-time driving applications, with potential extensions to deeper transformations and alternative uncertainty measures.
Abstract
Deep neural networks (DNNs) are increasingly applied to safety-critical tasks in resource-constrained environments, such as video-based driver action and intention recognition. While last-layer probabilistic deep learning (LL-PDL) methods can detect out-of-distribution (OOD) instances, their performance varies. As an alternative to last-layer approaches, we propose extending pre-trained DNNs with transformation layers that produce multiple latent representations, from which the uncertainty is estimated. We evaluate our latent uncertainty representation (LUR) and repulsively trained LUR (RLUR) approaches against eight PDL methods across four video-based driver action and intention recognition datasets, comparing classification performance, calibration, and uncertainty-based OOD detection. We also contribute 28,000 frame-level action labels and 1,194 video-level intention labels for the NuScenes dataset. Our results show that LUR and RLUR achieve in-distribution classification performance comparable to that of other LL-PDL approaches. For uncertainty-based OOD detection, LUR matches top-performing PDL methods while being more efficient to train and easier to tune than approaches that require Markov chain Monte Carlo sampling or repulsive training procedures.
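The idea of producing several predictions from transformed latent features can be illustrated with a minimal sketch. The abstract does not specify the form of the transformation layers, how many are used, or which uncertainty measure is computed, so everything below is an assumption made for illustration: the transformations are random `tanh`-activated linear maps, the classification head is shared, the latent vector `z` stands in for the output of a frozen pre-trained backbone, and uncertainty is split into total predictive entropy, expected entropy, and their difference (the disagreement between predictions). The names `lur_predict` and `uncertainty` are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng, scale=0.5):
    """Gaussian-initialised matrix; stands in for trained weights (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def lur_predict(z, transforms, head):
    """Return one class distribution per transformation layer.

    z          : latent feature vector from a frozen pre-trained backbone
    transforms : hypothetical transformation matrices applied to z
    head       : hypothetical shared classification matrix
    """
    preds = []
    for T in transforms:
        h = [math.tanh(v) for v in matvec(T, z)]  # transformed latent representation
        preds.append(softmax(matvec(head, h)))
    return preds


def uncertainty(preds):
    """Split total predictive entropy into expected entropy and disagreement."""
    k = len(preds)
    mean = [sum(p[c] for p in preds) / k for c in range(len(preds[0]))]
    total = entropy(mean)                            # entropy of the averaged prediction
    expected = sum(entropy(p) for p in preds) / k    # average per-prediction entropy
    return {
        "mean": mean,
        "total": total,
        "expected": expected,
        "disagreement": total - expected,            # non-negative by Jensen's inequality
    }


rng = random.Random(0)
latent_dim, num_classes, num_transforms = 8, 4, 5    # illustrative sizes only
transforms = [random_matrix(latent_dim, latent_dim, rng) for _ in range(num_transforms)]
head = random_matrix(num_classes, latent_dim, rng)
z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]

preds = lur_predict(z, transforms, head)
scores = uncertainty(preds)
print(scores)
```

In a setup like this, an input would be flagged as potentially OOD when one of the uncertainty scores exceeds a threshold chosen on held-out data; which score and threshold work best is an empirical question that this sketch does not answer.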
