Curiosity is Knowledge: Self-Consistent Learning and No-Regret Optimization with Active Inference

Yingke Li, Anjali Parashar, Enlu Zhou, Chuchu Fan

TL;DR

The paper addresses how to robustly balance exploration and exploitation in sequential decision-making with active inference, where the agent minimizes the Expected Free Energy. It shows that a sufficient curiosity level, expressed as a lower bound on the curiosity coefficient β_t, yields both posterior consistency in learning and no-regret optimization, unifying Bayesian Optimization and Bayesian Experimental Design within a single framework. Two theorems provide the guarantees: one on posterior consistency and one giving a GP-based no-regret bound. They come with practical guidelines for adaptive curiosity scheduling and energy function design. The results are validated on synthetic experiments and real-world hybrid learning-optimization tasks, with relevance to robotics, adaptive experimentation, and complex design problems.

Abstract

Active inference (AIF) unifies exploration and exploitation by minimizing the Expected Free Energy (EFE), balancing epistemic value (information gain) and pragmatic value (task performance) through a curiosity coefficient. Yet it has been unclear when this balance yields both coherent learning and efficient decision-making: insufficient curiosity can drive myopic exploitation and prevent uncertainty resolution, while excessive curiosity can induce unnecessary exploration and regret. We establish the first theoretical guarantee for EFE-minimizing agents, showing that a single requirement--sufficient curiosity--simultaneously ensures self-consistent learning (Bayesian posterior consistency) and no-regret optimization (bounded cumulative regret). Our analysis characterizes how this mechanism depends on initial uncertainty, identifiability, and objective alignment, thereby connecting AIF to classical Bayesian experimental design and Bayesian optimization within one theoretical framework. We further translate these theories into practical design guidelines for tuning the epistemic-pragmatic trade-off in hybrid learning-optimization problems, validated through real-world experiments.
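
The trade-off described in the abstract can be sketched on a toy problem. The snippet below is a minimal illustration, not the paper's EFE objective: it assumes a simplified score of the form `cost[x] - beta * I(s; y | x)` over a discrete latent parameter, and the names `mutual_information`, `select_query`, `LIKELIHOOD`, and `COST`, as well as all numbers, are hypothetical.

```python
import math


def mutual_information(prior, likelihood_x):
    """Mutual information I(s; y | x) in nats for discrete s and y.

    prior[s] = p(s); likelihood_x[s][y] = p(y | s, x).
    """
    n_outcomes = len(likelihood_x[0])
    # Predictive distribution p(y | x) = sum_s p(s) p(y | s, x).
    predictive = [
        sum(prior[s] * likelihood_x[s][y] for s in range(len(prior)))
        for y in range(n_outcomes)
    ]
    info = 0.0
    for s, p_s in enumerate(prior):
        for y in range(n_outcomes):
            p_y_given_s = likelihood_x[s][y]
            # Zero-probability terms contribute nothing to the sum.
            if p_s > 0.0 and p_y_given_s > 0.0:
                info += p_s * p_y_given_s * math.log(p_y_given_s / predictive[y])
    return info


def select_query(prior, likelihood, cost, beta):
    """Pick the query minimizing (pragmatic cost) - beta * (information gain).

    This scoring rule is an assumed stand-in for an EFE-style objective.
    """
    scores = [
        cost[x] - beta * mutual_information(prior, likelihood[x])
        for x in range(len(cost))
    ]
    return min(range(len(cost)), key=scores.__getitem__)


# Hypothetical toy problem: two hypotheses, two candidate queries.
# Query 0 is cheap but uninformative; query 1 costs more but separates
# the two hypotheses.
LIKELIHOOD = [
    [[0.5, 0.5], [0.5, 0.5]],  # query 0: same outcome law under both hypotheses
    [[0.9, 0.1], [0.1, 0.9]],  # query 1: outcome depends strongly on s
]
COST = [0.0, 0.3]
PRIOR = [0.5, 0.5]

myopic_choice = select_query(PRIOR, LIKELIHOOD, COST, beta=0.0)
curious_choice = select_query(PRIOR, LIKELIHOOD, COST, beta=2.0)
print(myopic_choice, curious_choice)
```

With `beta=0.0` the score reduces to the cost alone, so the cheap uninformative query is chosen; with `beta=2.0` the information term outweighs the extra cost and the informative query is chosen instead.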

Paper Structure

This paper contains 32 sections, 7 theorems, 58 equations, 4 figures, and 2 tables.

Key Result

Theorem 5.1

Let $s$ be a discrete latent parameter of the model with parameter space $\mathcal{S}$, and $s^{\ast} \in \mathcal{S}$ denote the true (data-generating) parameter. At each iteration $t$, the query $x_{t} \in \mathcal{X}$ is chosen according to the AIF policy: where $I(s; (x,y) \mid \mathcal{D}_{t-1})$ is the conditional mutual information between $s$ and the next observation pair $(x,y)$, and $h_
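
The setting of the theorem (a discrete latent parameter, sequential queries, Bayesian updating) can be mimicked in a small simulation. The sketch below is only an illustration under stated assumptions: the selection rule `cost[x] - beta * I(s; y | x)` is a simplified stand-in for the AIF policy, whose exact form is not reproduced in this excerpt, and `run_agent`, `LIKELIHOOD`, `COST`, and all numeric values are hypothetical.

```python
import math
import random


def mutual_information(prior, likelihood_x):
    """I(s; y | x) in nats, with likelihood_x[s][y] = p(y | s, x)."""
    n_outcomes = len(likelihood_x[0])
    predictive = [
        sum(prior[s] * likelihood_x[s][y] for s in range(len(prior)))
        for y in range(n_outcomes)
    ]
    info = 0.0
    for s, p_s in enumerate(prior):
        for y in range(n_outcomes):
            p = likelihood_x[s][y]
            if p_s > 0.0 and p > 0.0:
                info += p_s * p * math.log(p / predictive[y])
    return info


def bayes_update(prior, likelihood_x, y):
    """Posterior over s after observing outcome y at the chosen query."""
    unnormalized = [p * likelihood_x[s][y] for s, p in enumerate(prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def run_agent(beta, true_s, likelihood, cost, steps, seed=0):
    """Run the assumed curiosity-weighted rule; return (posterior, queries)."""
    rng = random.Random(seed)
    n_states = len(likelihood[0])
    posterior = [1.0 / n_states] * n_states  # uniform prior over s
    queries = []
    for _ in range(steps):
        scores = [
            cost[x] - beta * mutual_information(posterior, likelihood[x])
            for x in range(len(cost))
        ]
        x = min(range(len(cost)), key=scores.__getitem__)
        # Sample y from p(y | s*, x), the true data-generating law.
        outcome_probs = likelihood[x][true_s]
        y = rng.choices(range(len(outcome_probs)), weights=outcome_probs)[0]
        posterior = bayes_update(posterior, likelihood[x], y)
        queries.append(x)
    return posterior, queries


# Hypothetical sandbox: query 0 is cheap and uninformative,
# query 1 is costlier but identifies s.
LIKELIHOOD = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.9, 0.1], [0.1, 0.9]],
]
COST = [0.0, 0.3]

post_myopic, queries_myopic = run_agent(0.0, 0, LIKELIHOOD, COST, steps=20)
post_curious, queries_curious = run_agent(50.0, 0, LIKELIHOOD, COST, steps=20)
print(post_myopic, queries_myopic)
print(post_curious, queries_curious)
```

With `beta = 0.0` the agent never leaves the uninformative query, so its posterior stays at the prior regardless of the observations. With a larger `beta` the informative query is selected while uncertainty remains, and the posterior will typically concentrate on the true parameter, although any single noisy run can deviate. Under this rule a constant `beta` stops exploring once the remaining information gain no longer covers the cost gap.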

Figures (4)

  • Figure 1: Discrete sandbox to validate Theorem 5.1. Error bars represent $\pm 0.2$ std over 5 seeds.
  • Figure 2: 1D GP bandit to validate Theorem 6.1. Error bars represent $\pm 0.2$ std over 5 seeds.
  • Figure 3: Constrained system identification on environmental monitoring in 2D plume fields. Error bars represent $\pm 0.2$ std over 5 seeds.
  • Figure 4: Composite BO on distributed energy resource allocation in power grids. Error bars represent $\pm 0.2$ std over 5 seeds.

Theorems & Definitions (16)

  • Definition 4.1: Posterior Consistency
  • Definition 4.2: Regret Function
  • Definition 4.3: Potential Energy Function
  • Theorem 5.1: Posterior Consistency in AIF
  • proof
  • Theorem 6.1: Cumulative Regret Bound in AIF
  • proof
  • proof
  • Lemma 2.1: Lemma 5.3 in Srinivas et al. (2009)
  • Lemma 2.2: Lemma 5.5 in Srinivas et al. (2009)
  • ...and 6 more