Novelty Detection in Reinforcement Learning with World Models

Geigh Zollicoffer, Kenneth Eaton, Jonathan Balloch, Julia Kim, Wei Zhou, Robert Wright, Mark O. Riedl

TL;DR

This work addresses the challenge of detecting unanticipated, persistent changes (novelties) in reinforcement learning with world models. It introduces a threshold-free novelty detector based on a KL bound derived from Bayesian surprise, which compares predicted latent states to actual observations within the DreamerV2 framework. Across MiniGrid, Atari, and DeepMind Control domains, the KL-bound method demonstrates strong false-positive control, competitive or superior detection delay (ADE), and high discriminability (AUC) relative to RL-focused baselines such as RIQN, while offering substantial real-time speedups. The results suggest that latent-space bounds from world models can provide robust, scalable novelty detection suitable for safe deployment in dynamic environments, with applicability to alternative architecture families such as Transformers and diffusion-based models.

Abstract

Reinforcement learning (RL) using world models has recently seen significant success. However, when world mechanics or properties change suddenly, agent performance and reliability can decline dramatically. We refer to such sudden changes in visual properties or state transitions as novelties. Implementing novelty detection within generated world model frameworks is a crucial task for protecting a deployed agent. In this paper, we propose straightforward bounding approaches for incorporating novelty detection into world model RL agents, using the misalignment between the world model's hallucinated states and the true observed states as an anomaly score. We provide effective approaches to detecting novelties in a distribution of transitions learned by an agent in a world model. Finally, we show the advantage of our work in a novel environment compared to traditional machine learning novelty detection methods as well as currently accepted RL-focused novelty detection algorithms.

Paper Structure (35 sections, 2 theorems, 23 equations, 7 figures, 9 tables)

This paper contains 35 sections, 2 theorems, 23 equations, 7 figures, 9 tables.

Key Result

Proposition 3.1

If the cross-entropy score comparison becomes negative when introducing the vector $x_t$, then the right-hand side of (eqn:bound) becomes negative, which immediately flags $x_t$ as a novelty because the KL divergence on the left-hand side is non-negative.
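The decision rule behind this proposition can be illustrated with a minimal sketch. It assumes categorical latent distributions (as DreamerV2 uses) and takes the right-hand side of the bound as an already-computed number, since the bound's exact form is not reproduced in this summary. The names `categorical_kl`, `is_novel`, and `bound` are hypothetical and not taken from the paper's code.

```python
import numpy as np


def categorical_kl(p, q, eps=1e-12):
    """KL[p || q] summed over a set of categorical latent variables.

    p, q: arrays of shape (num_latents, num_classes); each row is a
    probability vector. Terms where p == 0 contribute zero.
    """
    p = np.asarray(p, dtype=float)
    q = np.clip(np.asarray(q, dtype=float), eps, None)  # avoid log(0)
    safe_p = np.where(p > 0, p, 1.0)  # log(1) = 0, so zero-probability terms vanish
    return float(np.sum(p * (np.log(safe_p) - np.log(q))))


def is_novel(posterior, prior, bound):
    """Flag x_t as novel when KL[posterior || prior] exceeds the bound.

    posterior: stands in for p(z_t | h_t, x_t)
    prior:     stands in for p(z_t | h_t)
    bound:     right-hand side of the bound, supplied by the caller
               (assumption: computed elsewhere from the world model)
    """
    return categorical_kl(posterior, prior) > bound


# Illustrative values only, not results from the paper.
posterior = np.array([[0.7, 0.2, 0.1]])
prior = np.array([[0.6, 0.3, 0.1]])

print(is_novel(posterior, prior, bound=0.5))   # small divergence, generous bound
print(is_novel(posterior, prior, bound=-0.1))  # negative bound
```

Because a KL divergence is never negative, any negative `bound` makes the comparison true regardless of how closely the posterior and prior agree, which is the behavior the proposition describes.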

Figures (7)

  • Figure 1: Visualization of the average levels of divergence given $x_t$ samples from the nominal MiniGrid environment as training progresses: the proposed bound (Blue); the divergence between the RSSM predicted distributions given ($h_t$) and ($h_t$, $x_t$) (Orange); the divergence between the RSSM given ($h_t$, $x_t$) and the RSSM given only the image ($h_0$, $x_t$) (Green); and the divergence between the RSSM given ($h_t$, $x_t$) and the RSSM receiving zero input ($h_0$) (Red). For a training time-step $t$, a given $x_t$ is said to be a normal observation if the value of $KL[p_{\phi}(z_t|h_t,x_t)\,\|\,p_{\phi}(z_t|h_t)]$ is below the blue line (within the green shaded area); otherwise $x_t$ is said to be a novel observation given the current context. See Appendix \ref{Appendix:Divergences} for similar visualizations corresponding to other environments.
  • Figure 2: MiniGrid full render of a simple FakeGoal environment. Here we empirically observe when the agent detects novelty in a task it has already completed, where the novelty is introduced by disabling the goal. For this ablation experiment, we tentatively define ground-truth novel transitions as transitions that interact with the fake goal. The most common transition initially flagged is from left to right. We suspect that this is due to the observation (light gray) that there appears to be nothing on the other side of an open door.
  • Figure 3: From left to right, $\hat{x}_{t,\text{prior}}$, $\hat{x}_{t,\text{posterior}}$, and $x_t$ observations of an agent trying to open the door in the custom BrokenDoor MiniGrid environment, where the door fails to open even though the agent has the correct key. The intuition behind PP-Mare is to distinguish high reconstruction loss between samples.
  • Figure 4: Reconstruction error alongside the three tested thresholds: Random Model, Trained Model, and Combination Model. Reconstruction error from the nominal MiniGrid-DoorKey-6x6 environment (Left) and from the novel LavaGap environment (Right). Each vertical line marks the cutoff value for the corresponding threshold. Samples to the right of a line are classified as novel. Reconstruction error was generated from a trained agent. It appears that no threshold (even a tuned one) would separate the two environments.
  • Figure 5: Divergences during training (note the scale of the y-axis).
  • ...and 2 more figures

Theorems & Definitions (2)

  • Proposition 3.1
  • Proposition 3.2