An Information-Geometric Approach to Artificial Curiosity

Alexander Nedergaard, Pablo A. Morales

TL;DR

The paper addresses exploration in sparse-reward reinforcement learning by formulating intrinsic rewards within an information-geometric framework. It proves that invariant intrinsic rewards must be concave functions of the reciprocal occupancy, and shows that a single parameter $\alpha$, which sets the curvature of the occupancy manifold, yields a principled trade-off between exploration and exploitation via $\alpha$-information rewards. The special cases $\alpha=0$ and $\alpha=-1$ recover count-based and maximum-entropy exploration, respectively, unifying these approaches through occupancy-space geometry. The work further derives how natural gradients in occupancy space can guide learning, discusses practical occupancy estimation, and relates novelty and surprise as points on a continuum of $\alpha$-information, with implications for algorithm design and neuroscience-inspired intuition.

Abstract

Learning in environments with sparse rewards remains a fundamental challenge in reinforcement learning. Artificial curiosity addresses this limitation through intrinsic rewards that guide exploration; however, the precise formulation of these rewards has remained elusive. Ideally, such rewards should depend on the agent's information about the environment while remaining agnostic to the representation of that information, an invariance central to information geometry. Leveraging information geometry, we show that invariance under congruent Markov morphisms and the agent-environment interaction uniquely constrains intrinsic rewards to concave functions of the reciprocal occupancy. Additional geometrically motivated restrictions effectively limit the candidates to those determined by a real parameter that governs the occupancy space geometry. Remarkably, special values of this parameter are found to correspond to count-based and maximum entropy exploration, revealing a geometric exploration-exploitation trade-off. This framework provides important constraints on the engineering of intrinsic rewards while integrating foundational exploration methods into a single, cohesive model.
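
To make the role of the parameter concrete, the following is a minimal sketch of one reward family that is consistent with the properties stated above. It is an assumption for illustration, not the paper's definition: the function name `alpha_information_reward` and the deformed-logarithm form $\big((1/p)^{k}-1\big)/k$ with $k=(1+\alpha)/2$ are hypothetical. The form was chosen only because it is concave in the reciprocal occupancy $1/p$ for $\alpha\le 1$, is affine in $1/\sqrt{p}$ at $\alpha=0$ (the familiar count-based bonus shape), and tends to $-\log p$ as $\alpha\to-1$ (the maximum-entropy bonus).

```python
import math


def alpha_information_reward(p: float, alpha: float) -> float:
    """Hypothetical alpha-indexed intrinsic reward for a state with occupancy p.

    ASSUMPTION: this deformed-logarithm form is an illustrative guess that
    matches the special cases named in the summary; it is not taken from the paper.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("occupancy p must lie in (0, 1]")
    k = (1.0 + alpha) / 2.0   # exponent applied to the reciprocal occupancy
    log_x = -math.log(p)      # log of the reciprocal occupancy 1/p
    if abs(k) < 1e-12:
        # Limit alpha -> -1: the reward reduces to log(1/p),
        # the per-state bonus used in maximum-entropy exploration.
        return log_x
    # ((1/p)**k - 1) / k, computed with expm1 for accuracy when k is small.
    return math.expm1(k * log_x) / k


if __name__ == "__main__":
    # Rarely visited states receive larger rewards than frequently visited ones.
    for p in (0.5, 0.25, 0.01):
        count_like = alpha_information_reward(p, alpha=0.0)    # 2 * (p**-0.5 - 1)
        entropy_like = alpha_information_reward(p, alpha=-1.0)  # -log(p)
        print(f"p={p:<5} alpha=0: {count_like:.4f}  alpha=-1: {entropy_like:.4f}")
```

Under this assumed form, varying `alpha` between the two special cases changes only how steeply the bonus grows as a state's occupancy shrinks; how the paper normalizes or scales the reward (for example with the scaling $\beta$) is not reproduced here.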

Paper Structure

This paper contains 17 sections, 23 theorems, 107 equations, 1 figure.

Key Result

Theorem 2.1

The Markov chain formed by the time-inhomogeneous Markov kernel with $n\in\mathbb{Z}_+$, converges to a unique probability measure that is uniquely invariant under $M_t$.

Figures (1)

  • Figure 1: Artificial curiosity with $\alpha$-information rewards on the curved occupancy manifold. (Top) The Amari-Čencov tensor constant $\alpha\in\mathbb{R}$ encodes the occupancy manifold curvature (red: spherical, blue: flat, green: hyperbolic). Count-based exploration corresponds to the Riemannian geometry with $\alpha=0$, and maximum entropy exploration to the flat geometry with $\alpha=-1$ (Theorem \ref{theorem:equivalence_count_based}). (Bottom) The intrinsic reward scaling $\beta\in\mathbb{R}_{\geq 0}$ controls the exploration-exploitation trade-off on the curved occupancy manifold. The optima are $\alpha$-projections, along $(-\alpha)$-geodesics $\gamma^{(i)}$, from the uniform occupancy $u$ onto isoreturn hyperplanes $\mathcal{H}_{\alpha,\beta_i}$ (Theorem \ref{theorem:optima_projection}). The exploration-exploitation trade-off is $(\alpha+2)$-geodesic $\beta$-interpolation between the maximally rewarding occupancy $p_\alpha^*$ and the uniform occupancy $u$ (Theorem \ref{theorem:optima_geodesic}).

Theorems & Definitions (34)

  • Theorem 2.1
  • Lemma 2.1
  • Theorem 2.2
  • Proposition 2.2
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Lemma 4.0
  • Theorem 4.1
  • Lemma 4.1
  • ...and 24 more