Information-Theoretic Policy Pre-Training with Empowerment

Moritz Schneider, Robert Krug, Narunas Vaskevicius, Luigi Palmieri, Michael Volpp, Joschka Boedecker

TL;DR

The paper addresses the need for data-efficient pre-training in reinforcement learning by proposing empowerment as a general, task-agnostic pre-training signal. It extends empowerment with a discounted, multi-horizon formulation, enabling a simple pre-training objective that maximizes discounted empowerment $\mathcal{E}_{\lambda}$ to initialize policies that adapt quickly to downstream tasks. Empirical results in gridworlds show that empowerment-based pre-training improves data efficiency across multiple RL algorithms, with capacity-maximizing pre-training often outperforming capacity-achieving variants. The approach demonstrates robustness in both deterministic and stochastic environments and suggests a promising path toward scalable, unsupervised pre-training for large-scale RL agents, while highlighting computational challenges for estimating empowerment at scale. Overall, empowerment serves as a principled, environment-centric initialization that can yield faster adaptation and stronger performance in downstream RL tasks.

Abstract

Empowerment, an information-theoretic measure of an agent's potential influence on its environment, has emerged as a powerful intrinsic motivation and exploration framework for reinforcement learning (RL). Beyond unsupervised RL and skill learning algorithms, however, the specific use of empowerment as a pre-training signal has received limited attention in the literature. We show that empowerment can be used as a pre-training signal for data-efficient downstream task adaptation. To this end, we extend the traditional notion of empowerment by introducing discounted empowerment, which balances the agent's control over the environment across short- and long-term horizons. Leveraging this formulation, we propose a novel pre-training paradigm that initializes policies to maximize discounted empowerment, enabling agents to acquire a robust understanding of environmental dynamics. We analyze empowerment-based pre-training for various existing RL algorithms and empirically demonstrate its potential as a general-purpose initialization strategy: empowerment-maximizing policies with long horizons are data-efficient and effective, leading to improved adaptability in downstream tasks. Our findings pave the way for future research to scale this framework to high-dimensional and complex tasks, further advancing the field of RL.
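
For orientation, the "traditional notion of empowerment" mentioned above is usually written in the empowerment literature as a channel capacity between action sequences and the resulting state. The block below is a sketch of that standard form, not a quotation from this paper: the subscript $n$, the action-sequence symbol $A^n$, and the reachable-set symbol $\mathcal{S}_n(s)$ are assumed notation, and the paper's discounted variant $\mathcal{E}_{\lambda}$ is not reproduced here because its exact definition is not given in this summary.

```latex
% n-step empowerment of state s: capacity of the channel from
% n-step action sequences A^n to the state reached after n steps.
\mathcal{E}_n(s) \;=\; \max_{p(a^n)} \; I\!\left(A^n ;\, S_{t+n} \,\middle|\, S_t = s\right)

% Special case: under deterministic dynamics the capacity reduces to the
% log-count of distinct states reachable in exactly n steps,
% where \mathcal{S}_n(s) denotes that set of reachable states.
\mathcal{E}_n(s) \;=\; \log \left| \mathcal{S}_n(s) \right|
```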

Paper Structure

This paper contains 20 sections, 8 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Empowerment-based Pre-Training. The untrained agent is pre-trained to optimize empowerment $\mathcal{E}$ in an environment-centric manner, which allows it to learn a policy that can be fine-tuned for specific tasks. The initialization achieved by this pre-training is expected to be closer to the optimal policy than a random initialization, as the agent has already learned to reach options that are helpful for downstream tasks. The empowerment value is based solely on the environment's characteristics, without further human input, whereas the extrinsic fine-tuning reward is based on expert human knowledge of the task.
  • Figure 2: Empowerment values of our deterministic gridworld environment. The empowerment values are calculated for horizons of $1$, $3$, $5$, and $32$ steps and for the discounted case (from left to right). The $32$-step empowerment grid shows a mostly uniform empowerment landscape because most other states can be reached from any state within a horizon of $32$ steps. The last image shows the reward map for an exemplary goal state of the environment.
  • Figure 3: REINFORCE training curves for fine-tuning on individual goal states in a deterministic gridworld environment. Left panel: Both capacity-achieving and capacity-maximizing policies outperform the baseline in terms of data efficiency, with the capacity-maximizing agent performing best. Right panel: Comparison of agents pre-trained with $n$-step empowerment at different empowerment horizons and with our proposed discounted empowerment reward. Discounted empowerment performs favorably, without the need to tune the horizon length.
  • Figure 4: The gridworld results demonstrate that empowerment-based agents consistently outperform agents trained from scratch, with the most significant improvement observed in the case of REINFORCE and Actor-Critic. In contrast, the observed differences for PPO and DQN are comparatively minor. We presume that while empowerment contributes to variance reduction during learning, its impact diminishes in algorithms such as PPO and DQN, which already incorporate effective variance reduction mechanisms.
  • Figure 5: REINFORCE training curves in a stochastic gridworld, demonstrating that empowerment-based pre-training is also effective in complex stochastic environments.
  • ...and 2 more figures
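
The saturation effect described in the Figure 2 caption can be illustrated with a minimal sketch. It relies on the fact that, for deterministic dynamics, $n$-step empowerment equals the logarithm of the number of distinct states reachable in exactly $n$ steps. Everything concrete here is an assumption for illustration: the 5x5 open grid, the five-action set including "stay", and in particular `discounted_empowerment`, which uses a normalized geometric weighting over horizons as one plausible reading of "discounted" and is not taken from the paper.

```python
import math

# Hypothetical environment: 5x5 open grid, four moves plus "stay".
# This is NOT the paper's gridworld, only an illustrative stand-in.
SIZE = 5
ACTIONS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def step(state, action):
    """Deterministic transition: move if the target cell is in the grid, else stay put."""
    r, c = state[0] + action[0], state[1] + action[1]
    return (r, c) if 0 <= r < SIZE and 0 <= c < SIZE else state


def reachable(state, n):
    """Set of states that can be occupied after exactly n actions."""
    frontier = {state}
    for _ in range(n):
        frontier = {step(s, a) for s in frontier for a in ACTIONS}
    return frontier


def n_step_empowerment(state, n):
    """Deterministic case: channel capacity is log2 of the number of distinct end states."""
    return math.log2(len(reachable(state, n)))


def discounted_empowerment(state, lam=0.9, max_horizon=8):
    """ASSUMED aggregation (not the paper's definition): geometric weights over horizons."""
    horizons = range(1, max_horizon + 1)
    weights = [lam ** (n - 1) for n in horizons]
    total = sum(w * n_step_empowerment(state, n) for n, w in zip(horizons, weights))
    return total / sum(weights)


if __name__ == "__main__":
    center, corner = (2, 2), (0, 0)
    for n in (1, 3, 8):
        print(n, n_step_empowerment(center, n), n_step_empowerment(corner, n))
    print("discounted", discounted_empowerment(center), discounted_empowerment(corner))
```

With a short horizon the center and corner cells differ, since the corner has fewer distinct successors. Once the horizon is long enough to reach every cell from every start, all states share the same value, which mirrors the mostly uniform $32$-step map in Figure 2. Mixing several horizons keeps some of the short-horizon contrast without committing to a single $n$.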

Theorems & Definitions (1)

  • Remark