Deep Active Inference Agents for Delayed and Long-Horizon Environments

Yavar Taheri Yeganeh, Mohsen Jafari, Andrea Matta

TL;DR

This paper addresses the challenge of delayed, long-horizon control in active inference by introducing a Deep Active Inference (DAIF) agent that embeds a differentiable policy within a generative model. By overshooting latent dynamics to a long horizon and backpropagating the expected free energy (EFE) gradient through this horizon, the agent achieves planning without exhaustive search and supports continuous actions. The approach is trained via alternating updates on the variational free energy (VFE) for the world model and the EFE for the policy, using an experience replay buffer to stabilize learning. Empirically, DAIF demonstrates improved energy efficiency and robust long-horizon decision-making in a high-fidelity industrial control simulation, outperforming a model-free baseline and continuing to refine its predictive model even after the policy stabilizes. The work bridges neuroscience-inspired active inference with modern world-model reinforcement learning, offering a scalable framework for real-world, delayed-control tasks without hand-crafted rewards or complex planning heuristics.

Abstract

With the recent success of world-model agents, which extend the core idea of model-based reinforcement learning by learning a differentiable model for sample-efficient control across diverse tasks, active inference (AIF) offers a complementary, neuroscience-grounded paradigm that unifies perception, learning, and action within a single probabilistic framework powered by a generative model. Despite this promise, practical AIF agents still rely on accurate immediate predictions and exhaustive planning, a limitation that is exacerbated in delayed environments that require planning over long horizons of tens to hundreds of steps. Moreover, most existing agents are evaluated on robotic or vision benchmarks which, while natural for biological agents, fall short of real-world industrial complexity. We address these limitations with a generative-policy architecture featuring (i) a multi-step latent transition that lets the generative model predict an entire horizon in a single look-ahead, (ii) an integrated policy network that enables the transition and receives gradients of the expected free energy, (iii) an alternating optimization scheme that updates model and policy from a replay buffer, and (iv) a single gradient step that plans over long horizons, eliminating exhaustive planning from the control loop. We evaluate our agent in an environment that mimics a realistic industrial scenario with delayed and long-horizon settings. The empirical results confirm the effectiveness of the proposed approach, demonstrating that coupling a world model with the AIF formalism yields an end-to-end probabilistic controller capable of effective decision-making in delayed, long-horizon settings without handcrafted rewards or expensive planning.
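
The alternating scheme described in the abstract can be illustrated with a deliberately small sketch. It is not the authors' implementation: the scalar plant, the linear model and policy, the learning rate, and the names `plant` and `train` are all hypothetical, chosen only to keep the example self-contained. A squared prediction error stands in for the variational free energy, and a squared distance between the predicted horizon outcome and a preferred outcome stands in for the expected free energy (the epistemic term is omitted).

```python
import random

def plant(x, a, horizon, rng=None):
    """Hypothetical toy plant: action `a` is held for `horizon` steps, so its
    full effect on the state only appears after a delay."""
    for _ in range(horizon):
        noise = rng.gauss(0.0, 0.01) if rng else 0.0
        x = 0.9 * x + 0.1 * a + noise
    return x

def train(steps=3000, horizon=20, target=1.0, seed=0):
    rng = random.Random(seed)
    w_x, w_a = 0.0, 0.0  # world model: x_H ~ w_x * x + w_a * a (one look-ahead)
    k, b = 0.0, 0.0      # policy: a = k * x + b (continuous action)
    lr, buffer = 0.02, []
    for _ in range(steps):
        # Collect one transition with exploration noise; keep a bounded buffer.
        x = rng.uniform(-2.0, 2.0)
        a = k * x + b + rng.gauss(0.0, 0.3)
        buffer.append((x, a, plant(x, a, horizon, rng)))
        buffer = buffer[-500:]
        batch = rng.sample(buffer, min(32, len(buffer)))
        n = len(batch)

        # (1) Model update: squared prediction error as a stand-in for the VFE.
        g_wx = sum(2 * (w_x * s + w_a * u - s_h) * s for s, u, s_h in batch) / n
        g_wa = sum(2 * (w_x * s + w_a * u - s_h) * u for s, u, s_h in batch) / n
        w_x, w_a = w_x - lr * g_wx, w_a - lr * g_wa

        # (2) Policy update: one gradient step on the EFE stand-in, taken
        # through the model's H-step prediction (no search over action plans).
        errs = [(s, w_x * s + w_a * (k * s + b) - target) for s, _, _ in batch]
        g_k = sum(2 * e * w_a * s for s, e in errs) / n
        g_b = sum(2 * e * w_a for _, e in errs) / n
        k, b = k - lr * g_k, b - lr * g_b
    return {"w_x": w_x, "w_a": w_a, "k": k, "b": b}
```

Calling `train()` returns the learned model and policy parameters; applying the learned policy to the noise-free `plant` from a few start states is a quick way to check that the horizon outcome lands near the preferred value. The point of the sketch is the control flow: the policy never enumerates candidate plans, it only follows the gradient of the horizon-level objective through the learned model.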

Paper Structure

This paper contains 25 sections, 15 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: Two perspectives of the AIF framework: general steps (left) and core elements (right).
  • Figure 2: The Deep AIF agent architecture and its interaction with the environment. The actor independently selects actions, while the generative model is used to optimize the policy.
  • Figure 3: The performance of the agent with $H=300$ on the real industrial system.
  • Figure 4: Performance of the agents versus overshooting horizon $H$.
  • Figure 5: Layout of parallel, identical machines in the workstation [LOFFREDO202391].