Prior preferences in active inference agents: soft, hard, and goal shaping

Filippo Torresan, Ryota Kanai, Manuel Baltieri

TL;DR

This work investigates how the specification of prior preferences (specifically, hard versus soft goals and the presence or absence of goal shaping) affects perception, planning, and learning in active inference agents operating in a grid world. By formalizing the agent as a POMDP and using variational inference with expected free energy, the study compares four preference regimes and demonstrates that goal shaping accelerates exploitation and task success, albeit at the cost of slower learning about environmental dynamics. The results reveal how policy and action probabilities are steered by the interplay of policy-conditioned and expected free energies, and how risk-driven preferences can both guide and limit exploration. The findings offer insights into designing priors for active inference systems and highlight trade-offs between rapid goal attainment and environment-model learning in low-dimensional settings, with implications for more complex domains.
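To make the interplay between policy-conditioned and expected free energies concrete, here is a minimal sketch. It assumes the form commonly used in discrete active inference, where the policy posterior is a softmax of the negative sum of the two quantities, and it marginalises that posterior to get first-action probabilities. The function names and the toy values of `F`, `G`, and `first_actions` are hypothetical and are not taken from the paper.

```python
import numpy as np

def policy_posterior(F, G):
    """Q(pi) proportional to exp(-(F(pi) + G(pi))): lower free energies give higher probability."""
    neg = -(np.asarray(F, dtype=float) + np.asarray(G, dtype=float))
    weights = np.exp(neg - neg.max())  # subtract the max for numerical stability
    return weights / weights.sum()

def action_probabilities(q_pi, first_actions, n_actions):
    """Probability of each first action: sum Q(pi) over policies that start with it."""
    probs = np.zeros(n_actions)
    np.add.at(probs, first_actions, q_pi)
    return probs

# Hypothetical toy setting: 4 policies, 2 possible first actions.
F = np.array([1.0, 1.0, 2.0, 2.0])       # policy-conditioned free energies (assumed values)
G = np.array([0.5, 3.0, 0.5, 3.0])       # expected free energies (assumed values)
first_actions = np.array([0, 1, 0, 1])   # first action taken by each policy

q_pi = policy_posterior(F, G)
p_action = action_probabilities(q_pi, first_actions, n_actions=2)
print(q_pi, p_action)
```

In this toy case the policy with the lowest total free energy dominates the posterior, and the action shared by the low-`G` policies becomes the most probable one.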

Abstract

Active inference proposes expected free energy as an objective for planning and decision-making that balances exploitative and explorative drives in learning agents. The exploitative drive, or what an agent wants to achieve, is formalised as the Kullback-Leibler divergence between a variational probability distribution, updated at each inference step, and a preference probability distribution that indicates which states or observations are more likely for the agent, hence determining the agent's goal in a certain environment. In the literature, the questions of how the preference distribution should be specified and of how a given specification affects inference and learning in an active inference agent have received little attention. In this work, we consider four possible ways of defining the preference distribution: the agent is given either hard or soft goals, with or without goal shaping (i.e., intermediate goals). We compare the performance of four agents, each given one of the possible preference distributions, in a grid world navigation task. Our results show that goal shaping enables the best performance overall (i.e., it promotes exploitation) while sacrificing learning about the environment's transition dynamics (i.e., it hampers exploration).
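The exploitative (risk) term and the four preference regimes can be illustrated with a small sketch. Everything below is an assumption made for illustration rather than the paper's actual construction: a 5-state corridor stands in for the grid world, a hard goal puts almost all probability mass on one state (with a small `eps` so the KL divergence stays finite), a soft goal is a softmax over negative distance to the goal, and goal shaping is modelled as one preference distribution per time step centred on an intermediate waypoint.

```python
import numpy as np

def kl_divergence(q, p):
    """D_KL[q || p] for discrete distributions; entries with q == 0 contribute 0."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def hard_goal(n_states, goal, eps=1e-3):
    """Hypothetical hard goal: mass 1 - eps on the goal, eps spread over the rest."""
    p = np.full(n_states, eps / (n_states - 1))
    p[goal] = 1.0 - eps
    return p

def soft_goal(distances, temperature=1.0):
    """Hypothetical soft goal: preference decays smoothly with distance to the goal."""
    logits = -np.asarray(distances, dtype=float) / temperature
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()

def shaped(waypoints, make_pref):
    """Goal shaping: one preference distribution per time step, each with its own intermediate goal."""
    return [make_pref(w) for w in waypoints]

n_states, goal = 5, 4
states = np.arange(n_states)
waypoints = [1, 2, 3, 4]  # assumed intermediate goals, one per time step

hard = hard_goal(n_states, goal)
soft = soft_goal(np.abs(states - goal))
hard_shaped = shaped(waypoints, lambda w: hard_goal(n_states, w))
soft_shaped = shaped(waypoints, lambda w: soft_goal(np.abs(states - w)))

# Variational belief about the state at the third time step (assumed values).
belief = np.array([0.0, 0.0, 0.1, 0.8, 0.1])
risk_hard = kl_divergence(belief, hard)
risk_hard_shaped = kl_divergence(belief, hard_shaped[2])
print(risk_hard, risk_hard_shaped)
```

Under these assumptions, a belief concentrated one step short of the final goal incurs a large risk against the unshaped hard goal but a small one against the shaped preference for that time step, which is one way to see why intermediate goals can steer an agent towards the goal more quickly.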

Paper Structure

This paper contains 28 sections, 11 equations, 23 figures, and 2 tables.

Figures (23)

  • Figure 1: Percentage of agents reaching the goal state in each episode in the 5-step grid world (10 agents for each subplot).
  • Figure 2: Policy-conditioned free energies at step 5 across episodes (showing average of 10 agents).
  • Figure 3: Policy-conditioned free energies at step 1 across episodes (showing average of 10 agents).
  • Figure 4: Expected free energy for each policy across episodes (showing average of 10 agents). Notice that we only draw 16 expected free energies, representative of the possible 256.
  • Figure 5: Policy probabilities at the first step of each episode (showing average of 10 agents). Notice that we only draw 16 representative policies out of the possible 256.
  • ...and 18 more figures

Theorems & Definitions (2)

  • Definition 1: POMDP in active inference, the generative process
  • Definition 2: Generative model in active inference