Emergence of Goal-Directed Behaviors via Active Inference with Self-Prior

Dongmin Kim, Hoshinori Kanazawa, Naoto Yoshida, Yasuo Kuniyoshi

TL;DR

The paper addresses how intrinsic motivation can give rise to goal-directed behavior in infancy without external rewards. It introduces the self-prior, an internally learned density over multimodal sensory experiences, as a behavioral setpoint within the active inference/free energy framework. Through discrete and continuous simulations, the study shows spontaneous reaching and even removal of a sticker driven by minimizing mismatches between observations and the self-prior, illustrating a homeostatic intrinsic motivation that forms a body-schema-like representation. The findings offer a computational mechanism for the emergence of early intentional behavior and highlight pathways to integrate self-generated priors with extrinsic motivations in more complex environments.

Abstract

Infants often exhibit goal-directed behaviors, such as reaching for a sensory stimulus, even when no external reward criterion is provided. These intrinsically motivated behaviors facilitate spontaneous exploration and learning of the body and environment during early developmental stages. Although computational modeling can offer insight into the mechanisms underlying such behaviors, many existing studies on intrinsic motivation focus primarily on how exploration contributes to acquiring external rewards. In this paper, we propose a novel density model for an agent's own multimodal sensory experiences, called the "self-prior," and investigate whether it can autonomously induce goal-directed behavior. Integrated within an active inference framework based on the free energy principle, the self-prior generates behavioral references purely from an intrinsic process that minimizes mismatches between average past sensory experiences and current observations. This mechanism is also analogous to the acquisition and utilization of a body schema through continuous interaction with the environment. We examine this approach in a simulated environment and confirm that the agent spontaneously reaches toward a tactile stimulus. Our study implements intrinsically motivated behavior shaped by the agent's own sensory experiences, demonstrating the spontaneous emergence of intentional behavior during early development.
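As a rough illustration of the idea in the abstract, the self-prior can be thought of as a density over observations that the agent estimates from its own history, with the mismatch measured as the negative log-probability (surprise) of the current observation. The sketch below is a minimal count-based stand-in, not the authors' model: the paper trains the self-prior with deep neural networks, and the class name `CountSelfPrior`, the `pseudocount` smoothing, and the three-cell tactile patterns are all assumptions made for this example.

```python
import math
from collections import Counter


class CountSelfPrior:
    """Hypothetical count-based stand-in for a learned self-prior.

    Estimates a categorical density over discrete observation patterns
    from the agent's own history, with additive (Laplace) smoothing.
    """

    def __init__(self, n_patterns, pseudocount=1.0):
        self.n_patterns = n_patterns    # number of possible patterns (assumed known)
        self.pseudocount = pseudocount  # smoothing so unseen patterns keep nonzero mass
        self.counts = Counter()
        self.total = 0

    def update(self, obs):
        """Record one observed pattern."""
        self.counts[obs] += 1
        self.total += 1

    def prob(self, obs):
        """Smoothed relative frequency of a pattern."""
        numerator = self.counts[obs] + self.pseudocount
        denominator = self.total + self.pseudocount * self.n_patterns
        return numerator / denominator

    def surprise(self, obs):
        """Negative log-probability: the mismatch with past experience."""
        return -math.log(self.prob(obs))


# Toy history: three tactile cells on the arm, none active (no sticker).
prior = CountSelfPrior(n_patterns=2 ** 3)
no_sticker = (0, 0, 0)
with_sticker = (0, 1, 0)
for _ in range(100):
    prior.update(no_sticker)

surprise_familiar = prior.surprise(no_sticker)
surprise_sticker = prior.surprise(with_sticker)
```

In this toy setting the sticker pattern is far more surprising than the familiar no-sticker pattern, so an agent that acts to reduce surprise under its self-prior has a reason to restore the familiar pattern.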

Paper Structure

This paper contains 29 sections, 19 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Emergence of reaching behavior via the self-prior and active inference. (a) When a sticker is placed on the left arm of the simulated agent, the agent detects a mismatch with its prior experience of not having a sticker, and reaches toward the sticker with its right hand to minimize the discrepancy. (b) Development of the self-prior through experience: as sensory experiences are collected, the probability distribution over sensory patterns gradually develops. (c) The active inference process in which the agent plans future actions to minimize expected free energy by aligning sensory inputs with the learned self-prior. As a result, the agent performs a reaching action toward the sticker. A full-body infant illustration is used for clarity; the actual experiment was conducted in a pseudo-3D environment.
  • Figure 2: Graphical model of active inference using deep neural networks that minimize variational free energy $\mathcal{F}$ and expected free energy $\mathcal{G}$. The self-prior $\tilde{p}(o^I_t)$ is trained to maximize the log-likelihood of observations $o^I_t$. In the expected free energy calculation (highlighted in blue), the learned self-prior serves as the behavioral setpoint alongside the fixed preferred prior. Although the preferred prior and self-prior can theoretically be applied simultaneously, we use only the self-prior in this study for clarity of exposition; thus, the preferred prior is shown faded in the figure.
  • Figure 3: Overview of the discrete environment. The right hand can move left or right either above the left arm or outside of it, and tactile input occurs either where the right hand is located or where the sticker is attached.
  • Figure 4: Change in self-prior over time. Before the sticker is attached, the probability increases for situations where no sticker is present on the arm ($t < 10{,}000$). After the sticker is attached, the agent gradually adapts to the new situation where the sticker is present ($t \ge 10{,}000$).
  • Figure 5: Comparison of the agent's behavior before and after acquiring the self-prior. The top panel illustrates environmental changes over time: the red line indicates the hand position, the yellow dashed line indicates the sticker position. White areas denote where tactile feedback occurred, black areas indicate no tactile feedback, and gray areas represent regions outside the arm where tactile feedback never occurs. The bottom panel shows expected free energy over time. Each green dot represents the free energy of a candidate policy, with lower free energy policies being more likely to be selected. The red line connects the actually selected policies. (a) Before acquiring the self-prior, the agent does not respond even when a sticker is attached to the arm. (b) After acquiring the self-prior, goal-directed behavior emerges: the agent moves its hand to the sticker’s location when it appears on the arm.
  • ...and 3 more figures
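
The Figure 5 caption states that policies with lower expected free energy are more likely to be selected. One common way to express this in active inference is a softmax over the negative expected free energy $\mathcal{G}$ of each candidate policy. The sketch below assumes that form; the exact selection rule used in the paper is not given in this excerpt, and the function name `policy_probabilities`, the `precision` parameter, and the example values are hypothetical.

```python
import math


def policy_probabilities(expected_free_energies, precision=1.0):
    """Softmax over negative expected free energy (assumed selection rule).

    Lower expected free energy yields higher selection probability.
    `precision` is an assumed inverse-temperature parameter.
    """
    # Shift by the minimum so the largest exponent is 0 (numerical stability).
    g_min = min(expected_free_energies)
    weights = [math.exp(-precision * (g - g_min)) for g in expected_free_energies]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up expected free energies for three candidate policies.
G = [2.0, 0.5, 3.0]
probs = policy_probabilities(G)

# Index of the most probable policy (the one with the lowest G).
best_policy = max(range(len(G)), key=lambda i: probs[i])
```

Raising `precision` concentrates probability on the lowest-$\mathcal{G}$ policy, while a value near zero makes the choice close to uniform.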