Information-Theoretic Policy Pre-Training with Empowerment
Moritz Schneider, Robert Krug, Narunas Vaskevicius, Luigi Palmieri, Michael Volpp, Joschka Boedecker
TL;DR
The paper addresses the need for data-efficient pre-training in reinforcement learning by proposing empowerment as a general, task-agnostic pre-training signal. It extends empowerment with a discounted, multi-horizon formulation, enabling a simple pre-training objective that maximizes discounted empowerment $\mathcal{E}_{\lambda}$ to initialize policies that adapt quickly to downstream tasks. Empirical results in gridworlds show that empowerment-based pre-training improves data efficiency across multiple RL algorithms, with capacity-maximizing pre-training often outperforming capacity-achieving variants. The approach demonstrates robustness in both deterministic and stochastic environments and suggests a promising path toward scalable, unsupervised pre-training for large-scale RL agents, while highlighting computational challenges for estimating empowerment at scale. Overall, empowerment serves as a principled, environment-centric initialization that can yield faster adaptation and stronger performance in downstream RL tasks.
Abstract
Empowerment, an information-theoretic measure of an agent's potential influence on its environment, has emerged as a powerful intrinsic motivation and exploration framework for reinforcement learning (RL). Beyond its role in unsupervised RL and skill learning algorithms, the specific use of empowerment as a pre-training signal has received limited attention in the literature. We show that empowerment can be used as a pre-training signal for data-efficient downstream task adaptation. To this end, we extend the traditional notion of empowerment by introducing discounted empowerment, which balances the agent's control over the environment across short- and long-term horizons. Leveraging this formulation, we propose a novel pre-training paradigm that initializes policies to maximize discounted empowerment, enabling agents to acquire a robust understanding of environmental dynamics. We analyze empowerment-based pre-training for various existing RL algorithms and empirically demonstrate its potential as a general-purpose initialization strategy: empowerment-maximizing policies with long horizons are data-efficient and effective, leading to improved adaptability in downstream tasks. Our findings pave the way for future research to scale this framework to high-dimensional and complex tasks, further advancing the field of RL.
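To make the idea of balancing short- and long-horizon control concrete, the following is a minimal sketch, not the paper's implementation. It relies on the standard fact that in a deterministic environment the n-step empowerment of a state equals the logarithm of the number of distinct states reachable in n steps. The gridworld layout, the action set, the helper names, and in particular the normalized geometric weighting with factor `lam` are illustrative assumptions; the exact definition of $\mathcal{E}_{\lambda}$ is not given in this section.

```python
import math

# Hypothetical action set for a square gridworld: right, left, down, up, stay.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]


def step(state, action, size):
    """Deterministic transition; moves that would leave the grid keep the agent in place."""
    r, c = state
    nr, nc = r + action[0], c + action[1]
    if 0 <= nr < size and 0 <= nc < size:
        return (nr, nc)
    return state


def n_step_empowerment(state, n, size):
    """n-step empowerment in bits for deterministic dynamics.

    With deterministic transitions, the channel capacity from n-step action
    sequences to final states is log2 of the number of distinct final states.
    """
    reachable = {state}
    for _ in range(n):
        reachable = {step(s, a, size) for s in reachable for a in ACTIONS}
    return math.log2(len(reachable))


def discounted_empowerment(state, size, lam=0.5, max_horizon=4):
    """Assumed form: normalized geometric mixture of 1..max_horizon step empowerment.

    The weights lam**(n-1) and the truncation at max_horizon are illustrative
    choices, not taken from the paper.
    """
    horizons = range(1, max_horizon + 1)
    weights = [lam ** (n - 1) for n in horizons]
    values = [n_step_empowerment(state, n, size) for n in horizons]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    size = 5
    for s in [(0, 0), (2, 2)]:
        print(s, round(discounted_empowerment(s, size), 3))
```

Under these assumptions, the center of an empty 5x5 grid scores higher than a corner at every horizon, so a policy that maximizes this quantity would tend to move toward states from which many futures remain reachable. Stochastic environments would instead require a channel-capacity estimate (for example via the Blahut-Arimoto algorithm), which this sketch does not cover.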
