Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning
Adriana Hugessen, Roger Creus Castanyer, Faisal Mohamed, Glen Berseth
TL;DR
The paper addresses the limitation that fixed entropy-based intrinsic motivations (surprise-minimization or surprise-maximization) fail to generalize across environments. It introduces a Surprise-Adaptive Bandit (S-Adapt) that selects between these two objectives online by measuring the agent's ability to control environmental entropy, using an intrinsic feedback signal grounded in entropy dynamics. Experiments show that S-Adapt can replicate the favorable behaviors of each single-objective agent in the appropriate regime, produce diverse emergent behaviors in benchmarks, and achieve competitive or superior task rewards without extrinsic supervision. The approach provides a versatile framework for unsupervised reinforcement learning that adapts to the entropy landscape of the environment, with potential implications for scalable pretraining and continual learning.
Abstract
Both entropy-minimizing and entropy-maximizing (curiosity) objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments, depending on the environment's level of natural entropy. However, neither method alone results in an agent that will consistently learn intelligent behavior across environments. In an effort to find a single entropy-based method that will encourage emergent behaviors in any environment, we propose an agent that can adapt its objective online, depending on the entropy conditions, by framing the choice as a multi-armed bandit problem. We devise a novel intrinsic feedback signal for the bandit, which captures the agent's ability to control the entropy in its environment. We demonstrate that such agents can learn to control entropy and exhibit emergent behaviors in both high- and low-entropy regimes and can learn skillful behaviors in benchmark tasks. Videos of the trained agents and summarized findings can be found on our project page: https://sites.google.com/view/surprise-adaptive-agents
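
The bandit framing described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a standard UCB1 rule for choosing between the two arms, since the abstract says only "multi-armed bandit". The `control_feedback` function is a hypothetical stand-in for the paper's intrinsic feedback signal: it scores an arm by how far the entropy reached in an episode deviates from a reference entropy, for example that of a random policy. The names `TwoArmUCB`, `control_feedback`, and `run_toy`, and all entropy values, are invented for illustration.

```python
import math

# Index 0 = surprise-minimizing objective, index 1 = surprise-maximizing objective.
ARMS = ("minimize_surprise", "maximize_surprise")


class TwoArmUCB:
    """Standard UCB1 bandit over the two objectives (an assumed choice of algorithm)."""

    def __init__(self, exploration=1.0):
        self.exploration = exploration
        self.counts = [0, 0]      # number of episodes played with each objective
        self.means = [0.0, 0.0]   # running mean feedback for each objective

    def select(self):
        # Play every arm once before applying the UCB rule.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            self.means[a]
            + self.exploration * math.sqrt(2.0 * math.log(total) / self.counts[a])
            for a in range(len(ARMS))
        ]
        return max(range(len(ARMS)), key=scores.__getitem__)

    def update(self, arm, feedback):
        # Incremental update of the running mean.
        self.counts[arm] += 1
        self.means[arm] += (feedback - self.means[arm]) / self.counts[arm]


def control_feedback(episode_entropy, baseline_entropy):
    """Hypothetical feedback: relative entropy change versus a reference policy.

    Assumption for illustration only; the paper defines its own signal.
    """
    return abs(episode_entropy - baseline_entropy) / max(abs(baseline_entropy), 1e-8)


def run_toy(rounds=200):
    """Toy high-entropy setting with made-up, deterministic entropy values."""
    baseline_entropy = 1.0
    # Minimizing surprise lowers entropy a lot; maximizing barely raises it.
    achieved_entropy = {0: 0.2, 1: 1.1}
    bandit = TwoArmUCB()
    for _ in range(rounds):
        arm = bandit.select()
        feedback = control_feedback(achieved_entropy[arm], baseline_entropy)
        bandit.update(arm, feedback)
    return bandit


if __name__ == "__main__":
    bandit = run_toy()
    for name, count, mean in zip(ARMS, bandit.counts, bandit.means):
        print(f"{name}: episodes={count}, mean_feedback={mean:.2f}")
```

In this toy setting the surprise-minimizing arm changes entropy far more than the surprise-maximizing arm, so the bandit ends up allocating most episodes to it. Swapping the achieved entropy values to mimic a low-entropy environment would shift the preference toward the surprise-maximizing arm under the same assumed feedback.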
