Exploring the Potential of World Models for Anomaly Detection in Autonomous Driving

Daniel Bogdoll, Lukas Bosch, Tim Joseph, Helen Gremmelmaier, Yitian Yang, J. Marius Zöllner

TL;DR

This work surveys how world models can be leveraged for anomaly detection in autonomous driving by framing anomalies as deviations from learned normality within a latent, action-conditioned predictive framework. It characterizes a world model as three components: a latent embedding, an action-conditioned transition model, and an observation decoder. It then reviews embedding (VAE-based) and transition (MDN-RNN, RSSM, VRKN) architectures. The authors map anomaly-detection approaches (reconstructive, generative, predictive, confidence-based, and feature-based) onto world-model components, discuss data strategies (normative training data and purposely anomalous evaluation data), and outline an end-to-end inference workflow that rolls out predicted futures. The contribution is a unified perspective that connects a corner-case taxonomy with a principled detection framework, enabling broader application of world models to autonomous driving safety and reliability.

Abstract

In recent years there have been remarkable advancements in autonomous driving. While autonomous vehicles demonstrate high performance in closed-set conditions, they encounter difficulties when confronted with unexpected situations. At the same time, world models emerged in the field of model-based reinforcement learning as a way to enable agents to predict the future depending on potential actions. This led to outstanding results in sparse reward and complex control tasks. This work provides an overview of how world models can be leveraged to perform anomaly detection in the domain of autonomous driving. We provide a characterization of world models and relate individual components to previous works in anomaly detection to facilitate further research in the field.

Paper Structure

This paper contains 13 sections, 6 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: The bottom row shows a scene reconstruction produced by a world model [huModelBasedImitationLearning2022]. Compared to the ground truth, the model cannot recover all scene components, such as the bicyclist. Given a clear definition of normality and targeted training, this failure to reconstruct can be exploited to detect anomalies.
  • Figure 2: A world model during inference, given high-dimensional observations $o_{t-i}\ldots o_t \in \Omega$ and past and planned actions $a_{t-i}\ldots a_{t+j} \in A$ at time $t$. All state transitions up to $s_t \in S$ are computed with a representation model $p(s_t \mid s_{t-1},a_{t-1},o_t)$, for which the observations are first embedded. Embedding the actions is possible but optional. Future state transitions can be computed with the prediction model $p(s_t \mid s_{t-1},a_{t-1})$ based on the Markov assumption, where each state depends only on its predecessor. With the observation model $p(o_t \mid s_t)$, reconstructions $\hat{o}_t \in \Omega$ can be decoded from state $s_t$.
  • Figure 3: Interaction of an agent with an actor in an environment, where $A$ is a set of actions, $S$ is a set of states, and $\Omega$ is a set of observations. Given an observation $o$, the action $a$ of the agent results in a state transition $s \longrightarrow s'$ of the environment.
  • Figure 4: Planned actions in the context of autonomous driving. Based on a vehicle model, the planning module determines a finite list of actions for the ego vehicle in order to reach planned future vehicle states.
  • Figure 5: On the left, the last input frame for the prediction and its reconstruction are shown. The bottom row on the right shows the predictions of a world model conditioned on actions, where each action consists of acceleration and steering angle values, as shown in the top row [huModelBasedImitationLearning2022]. Compared with the ground truth, the model is able to predict normal behavior. Under the hypothesis that it cannot predict atypical behavior unseen during training, differences between future observations and the predictions can be used for anomaly detection.
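
The inference loop of Figure 2 and the prediction-based detection idea of Figure 5 can be illustrated with a minimal sketch. It is not the authors' implementation and not the model of huModelBasedImitationLearning2022. Everything in it is an assumption made for illustration: the three models are replaced by deterministic linear maps that return only the mean of each distribution, the representation model blends prior and embedded observation with a fixed gain, and all names (ToyWorldModel, filter_states, rollout, prediction_error, demo) are hypothetical.

```python
import numpy as np


class ToyWorldModel:
    """Linear, deterministic stand-in for the three models in Figure 2 (assumption)."""

    def __init__(self, obs_dim=8, state_dim=3, action_dim=2, seed=0):
        rng = np.random.default_rng(seed)
        # Decoder with orthonormal columns, so embed(decode(s)) == s.
        q, _ = np.linalg.qr(rng.normal(size=(obs_dim, state_dim)))
        self.D = q
        self.A = 0.9 * np.eye(state_dim)                         # state dynamics
        self.B = 0.1 * rng.normal(size=(state_dim, action_dim))  # action effect
        self.gain = 0.5                                          # fixed correction gain

    def embed(self, o):
        """Embed a high-dimensional observation into the latent space."""
        return self.D.T @ o

    def predict(self, s_prev, a_prev):
        """Mean of the prediction model p(s_t | s_{t-1}, a_{t-1})."""
        return self.A @ s_prev + self.B @ a_prev

    def represent(self, s_prev, a_prev, o):
        """Mean of the representation model p(s_t | s_{t-1}, a_{t-1}, o_t)."""
        prior = self.predict(s_prev, a_prev)
        return prior + self.gain * (self.embed(o) - prior)

    def decode(self, s):
        """Mean of the observation model p(o_t | s_t)."""
        return self.D @ s


def filter_states(model, s0, past_actions, past_obs):
    """Compute the states up to s_t with the representation model."""
    s, states = s0, []
    for a, o in zip(past_actions, past_obs):
        s = model.represent(s, a, o)
        states.append(s)
    return states


def rollout(model, s_t, planned_actions):
    """Predict future observations from s_t using planned actions only."""
    s, predictions = s_t, []
    for a in planned_actions:
        s = model.predict(s, a)
        predictions.append(model.decode(s))
    return predictions


def prediction_error(predicted, observed):
    """Per-step mean squared error, used here as the anomaly score."""
    return [float(np.mean((p - o) ** 2)) for p, o in zip(predicted, observed)]


def demo():
    model = ToyWorldModel()
    rng = np.random.default_rng(1)
    past_actions = rng.normal(size=(5, 2))
    planned_actions = rng.normal(size=(4, 2))

    # "Normal" world: it follows exactly the dynamics the model represents.
    s_true, past_obs = np.ones(3), []
    for a in past_actions:
        s_true = model.predict(s_true, a)
        past_obs.append(model.decode(s_true))
    s_t = filter_states(model, np.ones(3), past_actions, past_obs)[-1]

    normal_future = []
    for a in planned_actions:
        s_true = model.predict(s_true, a)
        normal_future.append(model.decode(s_true))
    # Anomalous world: an unexpected offset appears from the third future step on.
    anomalous_future = [o + (1.0 if k >= 2 else 0.0)
                        for k, o in enumerate(normal_future)]

    predictions = rollout(model, s_t, planned_actions)
    return (prediction_error(predictions, normal_future),
            prediction_error(predictions, anomalous_future))


if __name__ == "__main__":
    normal_err, anomalous_err = demo()
    print("normal:   ", normal_err)
    print("anomalous:", anomalous_err)
```

In this toy setting the score stays near zero while the future follows the modeled dynamics and rises once the injected offset appears. In practice the score would be compared against a threshold calibrated on normal data, and the same per-frame error applied to the decoded current state would give the reconstruction-based variant suggested by Figure 1.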