On efficient computation in active inference
Aswin Paul, Noor Sajid, Lancelot Da Costa, Adeel Razi
TL;DR
This work addresses two obstacles to using active inference in complex settings: the computational cost of planning and the difficulty of specifying target distributions. It introduces two complementary advances: a dynamic-programming planner for finite horizons that minimizes the expected free energy via backward induction, and a Z-learning-inspired method that learns prior preferences online, enabling horizon-1 planning with learned goals. Dynamic programming reduces planning complexity to roughly $O(|S| \times |U| \times T)$, where $|S|$ is the number of states, $|U|$ the number of actions, and $T$ the planning horizon, and it matches model-based RL performance on grid-world tasks. Learned priors allow shallow planning to perform strongly in changing environments. Together, these approaches improve the scalability and practicality of active inference for sequential decision problems, offering a pathway to efficient, biologically plausible control in real-world domains.
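The backward-induction idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fully observed state, scores each action by risk only (the KL divergence between the predicted next-state distribution and the target distribution), and takes a hard minimum over the next action. The names `plan_backward`, `B`, `C`, and the 4-state corridor are hypothetical choices made for illustration.

```python
import math

def kl(q, c):
    """KL divergence between discrete distributions (0 * log 0 treated as 0)."""
    return sum(qi * (math.log(qi) - math.log(ci)) for qi, ci in zip(q, c) if qi > 0)

def plan_backward(B, C, T):
    """Backward induction over a finite horizon T.

    B[u][s] is the distribution over next states after action u in state s.
    C is the target (preferred) distribution over states.
    Returns G[t][s][u] (accumulated risk) and policy[t][s] (best action).
    Assumption: risk-only expected free energy and a hard min over actions.
    """
    n_u, n_s = len(B), len(B[0])
    G = [[[0.0] * n_u for _ in range(n_s)] for _ in range(T)]
    value = [0.0] * n_s  # nothing is accumulated beyond the horizon
    for t in reversed(range(T)):  # reverse temporal order
        for s in range(n_s):
            for u in range(n_u):
                q = B[u][s]
                risk = kl(q, C)
                future = sum(q[s2] * value[s2] for s2 in range(n_s))
                G[t][s][u] = risk + future
        value = [min(G[t][s]) for s in range(n_s)]
    policy = [[min(range(n_u), key=lambda u: G[t][s][u]) for s in range(n_s)]
              for t in range(T)]
    return G, policy

# Hypothetical 4-state corridor: action 0 moves left, action 1 moves right.
n_s = 4
left = [[1.0 if s2 == max(s - 1, 0) else 0.0 for s2 in range(n_s)] for s in range(n_s)]
right = [[1.0 if s2 == min(s + 1, n_s - 1) else 0.0 for s2 in range(n_s)] for s in range(n_s)]
B = [left, right]

# Target distribution built from a single goal state (state 3).
C = [0.01, 0.01, 0.01, 0.97]

G, policy = plan_backward(B, C, T=3)
print(policy[0])  # best first action for every starting state
```

The table `G` has $|S| \times |U| \times T$ entries, each filled exactly once, which is where the savings over enumerating all action sequences come from. Note that the target distribution here is specified only through the goal state.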
Abstract
Despite being recognized as neurobiologically plausible, active inference faces difficulties when employed to simulate intelligent behaviour in complex environments, due to its computational cost and the difficulty of specifying an appropriate target distribution for the agent. This paper introduces two solutions that work in concert to address these limitations. First, we present a novel planning algorithm for finite temporal horizons with drastically lower computational complexity. Second, inspired by Z-learning from the control theory literature, we simplify the process of setting an appropriate target distribution for new and existing active inference planning schemes. Our first approach uses dynamic programming, known for its computational efficiency, to minimize the cost function used in planning through the Bellman optimality principle. Accordingly, our algorithm recursively assesses the expected free energy of actions in reverse temporal order. This improves computational efficiency by orders of magnitude and allows precise model learning and planning, even under uncertain conditions. Our method simplifies the planning process and shows meaningful behaviour even when only the agent's final goal state is specified. The proposed solutions make defining a target distribution from a goal state straightforward, compared to the more complicated task of defining a temporally informed target distribution. The effectiveness of these methods is tested and demonstrated through simulations in standard grid-world tasks. These advances create new opportunities for various applications.
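For readers unfamiliar with Z-learning, the sketch below shows the classic online update from the control theory literature, in which a "desirability" value $z(s)$ is nudged toward $e^{-q(s)} z(s')$ after each observed transition $s \to s'$ with state cost $q(s)$. It is an illustration of the idea that inspires the second contribution, not the paper's own update rule. The corridor environment, the cost of 0.2 per non-goal state, the normalisation of $z$ into a target distribution, and the names `z_learning` and `greedy_action` are all assumptions made for this example.

```python
import math
import random

def z_learning(n_states, goal, state_cost, episodes, eta, rng):
    """Classic Z-learning on a corridor under a random-walk passive dynamics.

    Update: z(s) <- (1 - eta) * z(s) + eta * exp(-q(s)) * z(s_next).
    Assumption: the goal is terminal with zero cost, so z(goal) stays at 1.
    """
    z = [1.0] * n_states
    for _ in range(episodes):
        s = 0
        while s != goal:
            step = rng.choice((-1, 1))                      # passive dynamics
            s_next = min(max(s + step, 0), n_states - 1)    # walls at both ends
            z[s] = (1 - eta) * z[s] + eta * math.exp(-state_cost) * z[s_next]
            s = s_next
    return z

rng = random.Random(0)  # fixed seed so the run is repeatable
z = z_learning(n_states=4, goal=3, state_cost=0.2, episodes=2000, eta=0.02, rng=rng)

# Assumption: normalise desirability into a target distribution over states.
total = sum(z)
C_learned = [zi / total for zi in z]

def greedy_action(s, C):
    """Horizon-1 choice: pick the move whose next state is most preferred."""
    moves = {"left": max(s - 1, 0), "right": min(s + 1, len(C) - 1)}
    return max(moves, key=lambda a: C[moves[a]])

print([greedy_action(s, C_learned) for s in range(3)])
```

Because the learned preferences already encode how close each state is to the goal, a one-step lookahead is enough to head in the right direction in this toy corridor.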
