Deep Active Inference Agents for Delayed and Long-Horizon Environments
Yavar Taheri Yeganeh, Mohsen Jafari, Andrea Matta
TL;DR
This paper addresses the challenge of delayed, long-horizon control in active inference by introducing a Deep Active Inference (DAIF) agent that embeds a differentiable policy within a generative model. By overshooting the latent dynamics to a long horizon and backpropagating the gradient of the expected free energy (EFE) through that horizon, the agent plans without exhaustive search and supports continuous actions. The agent is trained with alternating updates that minimize the variational free energy (VFE) for the world model and the EFE for the policy, using an experience replay buffer to stabilize learning. Empirically, DAIF achieves improved energy efficiency and robust long-horizon decision-making in a high-fidelity industrial control simulation, outperforming a model-free baseline and continuing to refine its predictive model even after the policy stabilizes. The work bridges neuroscience-inspired active inference with modern world-model reinforcement learning, offering a scalable framework for real-world delayed-control tasks without hand-crafted rewards or complex planning heuristics.
Abstract
Alongside the recent success of world-model agents, which extend the core idea of model-based reinforcement learning by learning a differentiable model for sample-efficient control across diverse tasks, active inference (AIF) offers a complementary, neuroscience-grounded paradigm that unifies perception, learning, and action within a single probabilistic framework powered by a generative model. Despite this promise, practical AIF agents still rely on accurate immediate predictions and exhaustive planning, a limitation exacerbated in delayed environments that require planning over long horizons of tens to hundreds of steps. Moreover, most existing agents are evaluated on robotic or vision benchmarks which, while natural settings for biological agents, fall short of the complexity of real-world industrial systems. We address these limitations with a generative-policy architecture featuring (i) a multi-step latent transition that lets the generative model predict an entire horizon in a single look-ahead, (ii) an integrated policy network that drives this transition and receives gradients of the expected free energy (EFE), (iii) an alternating optimization scheme that updates the model and the policy from a replay buffer, and (iv) a single gradient step that plans over long horizons, eliminating exhaustive planning from the control loop. We evaluate the agent in a simulated environment that mimics a realistic industrial scenario with delayed, long-horizon dynamics. The empirical results confirm the effectiveness of the proposed approach, demonstrating that coupling a world model with the AIF formalism yields an end-to-end probabilistic controller capable of effective decision-making in delayed, long-horizon settings without hand-crafted rewards or expensive planning.
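To make the alternating scheme described above more concrete, the following is a deliberately tiny, hypothetical sketch and not the authors' implementation. It assumes a scalar observation used directly as the latent state (no encoder, no stochastic latent), a linear multi-step transition that maps the current observation and action straight to the horizon outcome in one look-ahead, and a linear deterministic policy. The VFE is reduced to its squared-error accuracy term, and the EFE is reduced to its risk term against a fixed Gaussian preference, with the epistemic term dropped. Gradients are derived by hand instead of using automatic differentiation. The toy plant constants `TRUE_A` and `TRUE_B`, as well as `ToyAgent`, `model_step`, `policy_step`, and `train`, are all illustrative names.

```python
import random
from collections import deque

# Hypothetical toy plant (not from the paper): an action's effect appears only
# at the end of a long horizon, so we model the horizon outcome directly.
TRUE_A, TRUE_B, NOISE = 0.8, 0.5, 0.01


def env_horizon_outcome(o0, a, rng):
    """Observation reached at the end of the horizon after taking action a."""
    return TRUE_A * o0 + TRUE_B * a + rng.gauss(0.0, NOISE)


class ToyAgent:
    """Scalar stand-in for a world model plus policy (all names hypothetical)."""

    def __init__(self, preferred=1.0, lr_model=0.05, lr_policy=0.05):
        self.A, self.B = 0.0, 0.1  # multi-step transition: o_H ~ A*o0 + B*a
        self.k, self.c = 0.0, 0.0  # deterministic policy: a = k*o0 + c
        self.pref = preferred      # prior preference over the horizon outcome
        self.lr_model, self.lr_policy = lr_model, lr_policy

    def act(self, o0):
        return self.k * o0 + self.c

    def predict(self, o0, a):
        # One look-ahead covers the whole horizon; no step-by-step rollout.
        return self.A * o0 + self.B * a

    def model_step(self, batch):
        # VFE surrogate: with no encoder and a fixed-variance Gaussian likelihood,
        # only the accuracy term remains, i.e. 0.5 * squared prediction error.
        gA = gB = 0.0
        for o0, a, oH in batch:
            err = self.predict(o0, a) - oH
            gA += err * o0
            gB += err * a
        self.A -= self.lr_model * gA / len(batch)
        self.B -= self.lr_model * gB / len(batch)

    def policy_step(self, batch):
        # EFE surrogate: risk term only, 0.5 * (predicted - preferred)^2, with the
        # gradient chained through the learned transition into the policy.
        gk = gc = 0.0
        for o0, _, _ in batch:
            err = self.predict(o0, self.act(o0)) - self.pref
            gk += err * self.B * o0  # dG/dk, since a = k*o0 + c
            gc += err * self.B       # dG/dc
        self.k -= self.lr_policy * gk / len(batch)
        self.c -= self.lr_policy * gc / len(batch)


def train(episodes=3000, batch_size=32, seed=0):
    rng = random.Random(seed)
    agent = ToyAgent()
    buffer = deque(maxlen=1000)  # experience replay buffer
    for _ in range(episodes):
        o0 = rng.uniform(-1.0, 1.0)
        a = agent.act(o0) + rng.gauss(0.0, 0.1)  # small exploration noise
        buffer.append((o0, a, env_horizon_outcome(o0, a, rng)))
        if len(buffer) >= batch_size:
            batch = rng.sample(list(buffer), batch_size)
            agent.model_step(batch)   # 1) update world model (VFE surrogate)
            agent.policy_step(batch)  # 2) update policy (EFE surrogate)
    return agent
```

After `train()`, `A` and `B` should lie close to the toy plant's constants, and the horizon outcome produced by the policy should lie close to the preference. Acting then requires only one forward pass of the policy. This mirrors, in miniature, the idea that long-horizon planning is amortized into the policy through EFE gradients rather than searched at decision time.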
