Prior preferences in active inference agents: soft, hard, and goal shaping
Filippo Torresan, Ryota Kanai, Manuel Baltieri
TL;DR
This work investigates how the specification of prior preferences (hard vs. soft goals, with or without goal shaping) affects perception, planning, and learning in active inference agents operating in a grid world. The agent is formalized as a POMDP and uses variational inference with expected free energy. The study compares four preference regimes and shows that goal shaping accelerates exploitation and task success, at the cost of slower learning about the environment's dynamics. The results reveal how policy and action probabilities are steered by the interplay of policy-conditioned and expected free energies, and how risk-driven preferences can both guide and limit exploration. The findings offer insights into designing priors for active inference systems and highlight the trade-off between rapid goal attainment and environment-model learning in low-dimensional settings, with implications for more complex domains.
Abstract
Active inference proposes expected free energy as an objective for planning and decision-making to adequately balance exploitative and explorative drives in learning agents. The exploitative drive, or what an agent wants to achieve, is formalised as the Kullback-Leibler divergence between a variational probability distribution, updated at each inference step, and a preference probability distribution that indicates what states or observations are more likely for the agent, hence determining the agent's goal in a certain environment. In the literature, the questions of how the preference distribution should be specified and of how a given specification impacts inference and learning in an active inference agent have received hardly any attention. In this work, we consider four possible ways of defining the preference distribution, providing the agents with either hard or soft goals, with or without goal shaping (i.e., intermediate goals). We compare the performance of four agents, each given one of the possible preference distributions, in a grid world navigation task. Our results show that goal shaping enables the best performance overall (i.e., it promotes exploitation) while sacrificing learning about the environment's transition dynamics (i.e., it hampers exploration).
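
To make the exploitative (risk) term concrete, the following is a minimal sketch of the Kullback-Leibler divergence between a predicted state distribution and a preference distribution, evaluated under a soft and a hard goal. Everything specific here is an illustrative assumption rather than the paper's setup: the five-state corridor, the goal masses (0.6 for soft, 0.9999 for hard), the two hypothetical predicted distributions, the helper names `kl_divergence` and `goal_preference`, and the representation of goal shaping as one preference per time step peaked on successive intermediate states.

```python
import math

def kl_divergence(q, p):
    """KL[q || p] for discrete distributions given as lists of probabilities."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)

def goal_preference(n_states, goal, goal_mass):
    """Place goal_mass on the goal state and spread the rest uniformly."""
    rest = (1.0 - goal_mass) / (n_states - 1)
    return [goal_mass if s == goal else rest for s in range(n_states)]

# Hypothetical 5-state corridor with the goal in the last state.
n_states, goal = 5, 4
soft = goal_preference(n_states, goal, goal_mass=0.6)     # soft goal
hard = goal_preference(n_states, goal, goal_mass=0.9999)  # hard goal (near one-hot)

# Hypothetical predicted state distributions under two policies.
q_reaches_goal = [0.02, 0.02, 0.02, 0.04, 0.90]
q_stops_short = [0.02, 0.02, 0.06, 0.80, 0.10]

# Risk of each policy under each preference; lower risk means more preferred.
risk = {
    name: (kl_divergence(q_reaches_goal, pref), kl_divergence(q_stops_short, pref))
    for name, pref in [("soft", soft), ("hard", hard)]
}
for name, (r_goal, r_short) in risk.items():
    print(f"{name}: reaches goal {r_goal:.3f}, stops short {r_short:.3f}")

# Assumed form of goal shaping: one preference per time step, each peaked
# on an intermediate state on the way to the goal.
shaped_soft = [goal_preference(n_states, g, 0.6) for g in (1, 2, 3, 4)]
unshaped_soft = [soft] * 4  # same final goal preferred at every step
```

In this toy setting both preferences rank the goal-reaching policy as less risky, but the hard goal penalises the policy that stops short far more heavily. This is one way to picture how a sharper preference distribution strengthens the exploitative drive.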
