On efficient computation in active inference

Aswin Paul, Noor Sajid, Lancelot Da Costa, Adeel Razi

TL;DR

This work addresses two obstacles to using active inference in complex settings: the computational cost of planning and the difficulty of specifying a target distribution. It introduces two complementary advances. The first is a dynamic-programming planner for finite horizons that minimizes the expected free energy by backward induction. The second is a Z-learning-inspired method that learns prior preferences online, so that an agent can plan with a horizon of 1 using learned goals. Dynamic programming reduces planning complexity to roughly $O(|S| \times |U| \times T)$, where $|S|$ is the number of states, $|U|$ the number of actions, and $T$ the planning horizon, and it matches model-based RL performance on grid-world tasks. Learned priors, in turn, allow shallow planning to perform strongly in changing environments. Together, these approaches make active inference more scalable and practical for sequential decision problems, offering a pathway to efficient, biologically plausible control in real-world domains.

Abstract

Despite being recognized as neurobiologically plausible, active inference faces difficulties when employed to simulate intelligent behaviour in complex environments due to its computational cost and the difficulty of specifying an appropriate target distribution for the agent. This paper introduces two solutions that work in concert to address these limitations. First, we present a novel planning algorithm for finite temporal horizons with drastically lower computational complexity. Second, inspired by Z-learning from control theory literature, we simplify the process of setting an appropriate target distribution for new and existing active inference planning schemes. Our first approach leverages the dynamic programming algorithm, known for its computational efficiency, to minimize the cost function used in planning through the Bellman-optimality principle. Accordingly, our algorithm recursively assesses the expected free energy of actions in the reverse temporal order. This improves computational efficiency by orders of magnitude and allows precise model learning and planning, even under uncertain conditions. Our method simplifies the planning process and shows meaningful behaviour even when specifying only the agent's final goal state. The proposed solutions make defining a target distribution from a goal state straightforward compared to the more complicated task of defining a temporally informed target distribution. The effectiveness of these methods is tested and demonstrated through simulations in standard grid-world tasks. These advances create new opportunities for various applications.
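To make the Z-learning reference concrete, the sketch below implements the classic Z-learning update for a linearly solvable MDP on a toy chain. It is not the paper's preference-learning rule, only the control-theory technique that inspired it. The function name `z_learning`, the chain layout, the state costs `q`, and the learning rate `eta` are all assumptions chosen for illustration.

```python
import numpy as np

def z_learning(P, q, goal, episodes=500, eta=0.1, seed=0):
    """Classic Z-learning on a first-exit problem (illustrative sketch).

    P[s] is the passive (uncontrolled) transition distribution from state s,
    q[s] is the state cost, and `goal` is the terminal state.
    Returns the learned desirability z, where larger means closer to the goal.
    """
    rng = np.random.default_rng(seed)
    num_states = len(q)
    z = np.ones(num_states)
    z[goal] = np.exp(-q[goal])  # terminal desirability is fixed
    for _ in range(episodes):
        s = int(rng.integers(num_states))
        while s != goal:
            s_next = int(rng.choice(num_states, p=P[s]))
            # Stochastic approximation of z(s) = exp(-q(s)) * E[z(s')]
            z[s] = (1 - eta) * z[s] + eta * np.exp(-q[s]) * z[s_next]
            s = s_next
    return z

# Hypothetical 5-state chain: unbiased random walk, absorbing goal at the right end.
num_states, goal = 5, 4
P = np.zeros((num_states, num_states))
for s in range(num_states - 1):
    P[s, max(s - 1, 0)] += 0.5
    P[s, s + 1] += 0.5
P[goal, goal] = 1.0
q = np.ones(num_states)
q[goal] = 0.0  # assumed costs: 1 everywhere except the goal

z = z_learning(P, q, goal)
```

In this toy setting the learned desirability decays with distance from the goal, which gives a rough picture of how a sparse goal specification could be spread into a graded preference that shallow planning can follow.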

Paper Structure (27 sections, 46 equations, 8 figures, 2 tables, 2 algorithms)

Figures (8)

  • Figure 1: A comparison of the sophisticated inference and DPEFE (dynamic programming in expected free energy) planning schemes. A: The sophisticated inference algorithm uses an extensive tree search, going forward in time, to accumulate the free energy of future paths. An agent's preference for observations, when matched with future predictions, therefore informs an optimal state-action trajectory, as shown in the tree search. Light-purple states represent the preferred observations at a given time step, and light-blue actions are the optimal actions inferred through the tree search. As noted in Friston et al. (2021), an agent can significantly reduce the complexity of the tree search by terminating it when the action probability falls below a certain threshold. However, this approximation does not guarantee an optimal policy, as the agent might miss preferred observations deeper in the tree. B: In the DPEFE algorithm, an agent plans backwards from a fixed planning horizon. Here, the EFE of future states informs the EFE of state-action pairs one step earlier in time. The complexity of tree search is thus avoided, while the preference for future states still propagates to influence decisions at earlier time steps. Since the agent needs to evaluate only a table (of EFE) at every planning step, this planning algorithm is linear in time, number of states, and number of actions.
  • Figure 2: Informed and uninformed prior preferences: A: A navigation problem, B: A strictly defined, sparse prior preference which has information only about the final goal state, C: Informed prior preference necessary for 'pruning of tree search' in sophisticated inference (light colour states are more preferred)
  • Figure 3: A: A standard grid world of 100 states with 50 valid states. B: A grid of 400 states with 204 valid states. C: A grid of 900 states with 497 valid states. These three grids are used for evaluating the performance of various schemes.
  • Figure 4: The summary of agents' performance in the two grids. A: Deterministic grid (100 states), B: Stochastic version of the grid in A (100 states, partially observable, stochastic transitions (POMDP)).
  • Figure 5: The summary of agents' performance in the two grids. A: Stochastic grid (400 states, partially observable, stochastic transitions (POMDP)), B: Stochastic grid in A with goal-state randomized after every ten episodes.
  • ...and 3 more figures
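
The backward, table-based planning described for Figure 1B can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: states are assumed fully observed, the expected free energy is reduced to its risk term (the KL divergence from predicted next states to the preferred distribution), and the cost-to-go uses a hard minimum over actions. The names `backward_efe_table`, `B`, and `C`, and the toy chain, are hypothetical.

```python
import numpy as np

def backward_efe_table(B, C, T, eps=1e-16):
    """Fill a table G[t, u, s] by backward induction (illustrative sketch).

    B[u, s_next, s] = P(s_next | s, u); C is the preferred state distribution.
    Each of the T - 1 planning steps touches one (action, state) table,
    so the number of table entries is linear in T, |U|, and |S|.
    """
    num_actions, num_states, _ = B.shape
    log_C = np.log(C + eps)
    G = np.zeros((T - 1, num_actions, num_states))
    cost_to_go = np.zeros(num_states)  # nothing is accumulated beyond the horizon
    for t in reversed(range(T - 1)):
        for u in range(num_actions):
            P = B[u]  # columns index the current state
            # Risk: KL[P(. | s, u) || C] for every current state s
            risk = np.sum(P * (np.log(P + eps) - log_C[:, None]), axis=0)
            # Add the expected cost-to-go of the states reached next
            G[t, u] = risk + cost_to_go @ P
        cost_to_go = G[t].min(axis=0)
    return G

# Hypothetical 5-state chain with actions 0 = left, 1 = right; only the goal is preferred.
num_states, goal, T = 5, 4, 6
B = np.zeros((2, num_states, num_states))
for s in range(num_states):
    B[0, max(s - 1, 0), s] = 1.0
    B[1, min(s + 1, num_states - 1), s] = 1.0
C = np.full(num_states, 0.01)
C[goal] = 0.96

G = backward_efe_table(B, C, T)

# Act greedily on the table, starting from the far end of the chain.
state = 0
for t in range(T - 1):
    action = int(np.argmin(G[t, :, state]))
    state = int(np.argmax(B[action, :, state]))
final_state = state
```

Even though `C` carries information only about the goal state, the backward pass propagates that preference to earlier time steps, so the greedy rollout in this toy chain moves toward the goal from the first step.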