
Active Inference in Discrete State Spaces from First Principles

Patrick Kenny

TL;DR

This paper separates active inference in discrete state spaces from the Free Energy Principle and shows that a unified, monotone objective can be obtained by minimizing a constrained KL divergence via standard mean-field variational methods. Perception and action are cast within a single framework using Hidden Markov Models and dynamic Bayesian networks, avoiding reliance on expected free energy and instead deriving predictive and posterior distributions from KL optimization. The work provides a comprehensive treatment of learning with Dirichlet priors, policy beliefs, planning, and time-domain renormalization, and it discusses the relation and trade-offs with traditional expected free energy formulations. Overall, it argues that discrete active inference can be implemented with robust, conventional variational techniques, with clear implications for scalable perception, action, and learning in AI systems.
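The phrase "monotone objective via standard mean-field methods" refers to a well-known property of coordinate-ascent variational inference: each factor update exactly minimizes the KL divergence with the other factors held fixed, so the objective can never increase. The following is a minimal sketch of that generic technique on a hypothetical two-variable discrete target, not the paper's own perception/action updates. The names `mean_field`, `kl_mean_field`, and the random target table are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable normalization of a log-potential vector
    e = np.exp(x - x.max())
    return e / e.sum()

def kl_mean_field(q1, q2, log_p):
    """KL(q1(s1) q2(s2) || p(s1, s2)) where log_p is a normalized log table."""
    q = np.outer(q1, q2)
    return float(np.sum(q * (np.log(q) - log_p)))

def mean_field(log_p, iters=20):
    """Coordinate ascent on a fully factorized q(s1) q(s2) (hypothetical example)."""
    n1, n2 = log_p.shape
    q1 = np.full(n1, 1.0 / n1)
    q2 = np.full(n2, 1.0 / n2)
    history = [kl_mean_field(q1, q2, log_p)]
    for _ in range(iters):
        q1 = softmax(log_p @ q2)   # log q1(s1) = E_{q2}[log p(s1, s2)] + const
        q2 = softmax(q1 @ log_p)   # log q2(s2) = E_{q1}[log p(s1, s2)] + const
        history.append(kl_mean_field(q1, q2, log_p))
    return q1, q2, history

# Hypothetical target: a random normalized joint over two hidden variables
rng = np.random.default_rng(0)
p = rng.random((3, 4))
p /= p.sum()
q1, q2, history = mean_field(np.log(p))
print(history[0], history[-1])
```

Because each update is an exact minimization over one factor, the recorded KL values are non-increasing, which is the sense in which such objectives are monotone.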

Abstract

We seek to clarify the concept of active inference by disentangling it from the Free Energy Principle. We show how the optimizations that need to be carried out in order to implement active inference in discrete state spaces can be formulated as constrained divergence minimization problems which can be solved by standard mean field methods that do not appeal to the idea of expected free energy. When it is used to model perception, the perception/action divergence criterion that we propose coincides with variational free energy. When it is used to model action, it differs from an expected free energy functional by an entropy regularizer.
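For perception, the claim that a divergence criterion "coincides with variational free energy" rests on a standard identity: for an observed outcome $o$ and a belief $q(s)$, $F(q) = \mathbb{E}_q[\log q(s) - \log p(o, s)] = \mathrm{KL}(q \,\|\, p(s \mid o)) - \log p(o)$. Minimizing $F$ over $q$ is therefore the same as minimizing the KL divergence to the posterior. The sketch below checks this numerically on a hypothetical one-step discrete model; the prior, the likelihood matrix `A`, and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def variational_free_energy(q, prior, lik_col):
    """F(q) = E_q[log q(s) - log p(o, s)] for one observed outcome o.

    prior:   p(s), shape (n,)
    lik_col: p(o | s) for the observed o, shape (n,)
    """
    joint = prior * lik_col                      # p(o, s)
    return float(np.sum(q * (np.log(q) - np.log(joint))))

# Hypothetical model: 3 hidden states, 2 outcomes
prior = np.array([0.5, 0.3, 0.2])
A = np.array([[0.8, 0.1, 0.3],                   # A[o, s] = p(o | s); columns sum to 1
              [0.2, 0.9, 0.7]])
o = 1
lik_col = A[o]
evidence = float(np.sum(prior * lik_col))        # p(o)
posterior = prior * lik_col / evidence           # p(s | o)

q = np.array([0.2, 0.5, 0.3])                    # an arbitrary belief
kl = float(np.sum(q * np.log(q / posterior)))
F = variational_free_energy(q, prior, lik_col)
# Identity: F(q) = KL(q || p(s | o)) - log p(o), minimized at q = p(s | o)
```

Since the KL term is non-negative, $F$ is bounded below by the surprise $-\log p(o)$, with equality exactly when $q$ is the posterior.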

Paper Structure

This paper contains 32 sections, 135 equations, and 5 figures.

Figures (5)

  • Figure 1: A toy model of the brain as a Markov Random Field. Nodes correspond to grey matter and branches to white matter. Energy functions are associated with both nodes and branches. Energy functions associated with the nodes model the microstates of neural populations. Energy functions associated with the branches model the dependencies between the microstates of the neural populations associated with the nodes. One or more of the nodes receives sensory data from the external world which changes from moment to moment. The neural populations are continually striving to dissipate free energy by exciting or inhibiting the populations with which they are in communication via the branches.
  • Figure 2: A toy model for predictive processing in the visual cortex. Hatching indicates variables whose values are not observed.
  • Figure 3: A toy model for simultaneous predictive processing in multiple sensory modalities. The leaf nodes of the tree correspond to sensors.
  • Figure 4: Constructing a Dynamic Bayesian Network from a static network. The static network supports prediction from top to bottom (as in Figures 2 and 3). The dynamic network supports prediction from past to future as well.
  • Figure 5: Directed graphical model for the Perception/Action cycle. At time $t$, $\mathbf o_{\le t}$ has been observed and $\mathbf o_{> t}$ is yet to be observed.
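The situation in Figure 5, where $\mathbf o_{\le t}$ has been observed and $\mathbf o_{> t}$ has not, can be illustrated in the simplest discrete setting by exact forward filtering in a Hidden Markov Model, followed by a one-step prediction of the next outcome. This is a minimal sketch under stated assumptions: it ignores actions and policies (a single fixed transition matrix), and the matrices `A`, `B`, `D` and the function names are hypothetical, following a common active-inference naming convention rather than the paper's notation.

```python
import numpy as np

def forward_filter(A, B, D, obs):
    """Exact filtering in a discrete HMM (no actions).

    A[o, s]  = p(o_t = o | s_t = s)
    B[s2, s] = p(s_{t+1} = s2 | s_t = s)
    D[s]     = p(s_1 = s)
    Returns the filtered beliefs p(s_t | o_{<=t}) for each t.
    """
    beliefs = []
    prior = D
    for o in obs:
        post = A[o] * prior          # p(o_t | s_t) p(s_t | o_{<t}), unnormalized
        post = post / post.sum()     # p(s_t | o_{<=t})
        beliefs.append(post)
        prior = B @ post             # p(s_{t+1} | o_{<=t})
    return beliefs

def predict_next_obs(A, B, belief):
    """p(o_{t+1} | o_{<=t}) = sum_s A[o, s] p(s_{t+1} = s | o_{<=t})."""
    return A @ (B @ belief)

# Hypothetical two-state, two-outcome model
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
B = np.array([[0.7, 0.4],
              [0.3, 0.6]])
D = np.array([0.5, 0.5])
beliefs = forward_filter(A, B, D, obs=[0, 0, 1])
p_next = predict_next_obs(A, B, beliefs[-1])
```

The filtered belief summarizes everything observed up to time $t$, and pushing it through `B` and then `A` yields a predictive distribution over the not-yet-observed outcome.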