Novelty Detection in Reinforcement Learning with World Models
Geigh Zollicoffer, Kenneth Eaton, Jonathan Balloch, Julia Kim, Wei Zhou, Robert Wright, Mark O. Riedl
TL;DR
This work addresses the challenge of detecting unanticipated, persistent changes (novelties) in reinforcement learning when using world models. It introduces a threshold-free novelty detector based on a KL bound derived from Bayesian surprise that compares predicted latent states to actual observations within the DreamerV2 framework. Across the MiniGrid, Atari, and DeepMind Control domains, the KL-bound method demonstrates strong false-positive control, competitive or superior detection delay (ADE), and high discriminability (AUC) relative to RL-focused baselines such as RIQN, while offering substantial real-time speedups. The results suggest that latent-space bounds from world models can provide robust, scalable novelty detection suitable for safe deployment in dynamic environments, with applicability to alternative architecture families such as Transformers and diffusion-based models.
Abstract
Reinforcement learning (RL) using world models has seen significant recent success. However, when world mechanics or properties change suddenly, agent performance and reliability can decline dramatically. We refer to such sudden changes in visual properties or state transitions as novelties. Implementing novelty detection within generated world model frameworks is crucial for protecting the agent when deployed. In this paper, we propose straightforward bounding approaches for incorporating novelty detection into world model RL agents, using the misalignment between the world model's hallucinated states and the true observed states as an anomaly score. We provide effective approaches to detecting novelties in a distribution of transitions learned by an agent in a world model. Finally, we show the advantage of our work in a novel environment compared to traditional machine learning novelty detection methods as well as currently accepted RL-focused novelty detection algorithms.
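
As a rough illustration of the anomaly score described above, the sketch below measures the misalignment between a predicted (prior) latent distribution and an observation-conditioned (posterior) latent distribution using a KL divergence. It is a minimal sketch under stated assumptions, not the detector proposed in the paper: the latents are assumed to be 32 independent categorical variables with 32 classes each, as in DreamerV2; the distributions are synthetic rather than produced by a trained world model; the function names `categorical_kl` and `softmax` are hypothetical; and the paper's threshold-free KL bound is not reproduced here.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_kl(posterior, prior, eps=1e-8):
    """KL(posterior || prior), summed over independent categorical latents.

    Both arrays have shape (num_latents, num_classes) and each row sums to 1.
    Probabilities are clipped at eps to avoid log(0).
    """
    posterior = np.clip(posterior, eps, 1.0)
    prior = np.clip(prior, eps, 1.0)
    return float(np.sum(posterior * (np.log(posterior) - np.log(prior))))


rng = np.random.default_rng(0)

# Synthetic stand-in for the world model's predicted latent distribution.
prior = softmax(rng.normal(size=(32, 32)))

# Familiar observation (assumed): the posterior stays close to the prediction.
posterior_familiar = softmax(np.log(prior) + 0.05 * rng.normal(size=(32, 32)))

# Novel observation (assumed): the posterior is unrelated to the prediction.
posterior_novel = softmax(3.0 * rng.normal(size=(32, 32)))

score_familiar = categorical_kl(posterior_familiar, prior)
score_novel = categorical_kl(posterior_novel, prior)

print(f"familiar score: {score_familiar:.4f}")
print(f"novel score:    {score_novel:.4f}")
```

In this toy setting the score for the unrelated posterior is much larger than for the near-matching one, which is the kind of separation an anomaly score relies on. How the score is turned into a decision without a hand-tuned threshold is the subject of the paper's bounding approach and is left out of this sketch.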
