Intrinsic Goals for Autonomous Agents: Model-Based Exploration in Virtual Zebrafish Predicts Ethological Behavior and Whole-Brain Dynamics

Reece Keller, Alyn Kirsch, Felix Pei, Xaq Pitkow, Leo Kozachkov, Aran Nayebi

TL;DR

The paper introduces 3M-Progress, a model-based intrinsic drive that uses a fixed ethological memory prior and a learnable online memory to drive autonomous exploration in a virtual zebrafish setup. By partitioning behavior into niche-seeking and niche-avoidance through model-memory-mismatch, the approach yields stable, animal-like state transitions and tightly predicts whole-brain neural-glial activity, including astrocyte-mediated dynamics. The authors demonstrate that 3M-Progress replicates observed zebrafish behaviors and explains most of the explainable variance in neural-glial recordings, presenting the first goal-driven, self-supervised embodied agent that forecasts brain data. This work provides a computational framework linking intrinsic motivation to neural-glial computation and offers a foundation for designing autonomous artificial agents with animal-like autonomy. It also highlights two core principles for autonomy—avoiding uncontrollable stimuli and converging to robust policies—along with avenues for extending the framework to richer ecological scenarios and more detailed neurobiological mechanisms.

Abstract

Autonomy is a hallmark of animal intelligence, enabling adaptive and intelligent behavior in complex environments without relying on external reward or task structure. Existing reinforcement learning approaches to exploration in reward-free environments, including a class of methods known as model-based intrinsic motivation, exhibit inconsistent exploration patterns and do not converge to an exploratory policy, thus failing to capture robust autonomous behaviors observed in animals. Moreover, systems neuroscience has largely overlooked the neural basis of autonomy, focusing instead on experimental paradigms where animals are motivated by external reward rather than engaging in ethological, naturalistic and task-independent behavior. To bridge these gaps, we introduce a novel model-based intrinsic drive explicitly designed after the principles of autonomous exploration in animals. Our method (3M-Progress) achieves animal-like exploration by tracking divergence between an online world model and a fixed prior learned from an ecological niche. To the best of our knowledge, we introduce the first autonomous embodied agent that predicts brain data entirely from self-supervised optimization of an intrinsic goal -- without any behavioral or neural training data -- demonstrating that 3M-Progress agents capture the explainable variance in behavioral patterns and whole-brain neural-glial dynamics recorded from autonomously behaving larval zebrafish, thereby providing the first goal-driven, population-level model of neural-glial computation. Our findings establish a computational framework connecting model-based intrinsic motivation to naturalistic behavior, providing a foundation for building artificial agents with animal-like autonomy.
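The core mechanism named in the abstract, tracking divergence between an online world model and a fixed prior, can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the divergence is taken to be a squared distance between the two models' next-state predictions, the leaky integrator (mentioned in the Figure 2 caption) is a simple exponential moving average, and the reward is read directly off the integrator state. The names `model_memory_mismatch`, `LeakyIntegrator`, `intrinsic_rewards`, and the `leak` parameter are hypothetical.

```python
import numpy as np


def model_memory_mismatch(pred_prior, pred_online):
    """Assumed divergence: squared distance between the fixed prior's and the
    online world model's predictions for the same transition."""
    diff = np.asarray(pred_prior, dtype=float) - np.asarray(pred_online, dtype=float)
    return float(np.sum(diff ** 2))


class LeakyIntegrator:
    """Exponential moving average: state <- (1 - leak) * state + leak * x."""

    def __init__(self, leak=0.1):
        if not 0.0 < leak <= 1.0:
            raise ValueError("leak must lie in (0, 1]")
        self.leak = leak
        self.state = 0.0

    def update(self, x):
        self.state = (1.0 - self.leak) * self.state + self.leak * x
        return self.state


def intrinsic_rewards(preds_prior, preds_online, leak=0.1):
    """Map per-step predictions from both models to a reward trace.

    Assumption: the reward is the integrator state itself; the source text
    does not specify how the integrated mismatch is turned into a reward.
    """
    integrator = LeakyIntegrator(leak)
    rewards = []
    for p, q in zip(preds_prior, preds_online):
        rewards.append(integrator.update(model_memory_mismatch(p, q)))
    return np.array(rewards)
```

In this sketch the reward stays at zero while the online model agrees with the prior and rises smoothly once the two diverge, with `leak` controlling how quickly the integrator responds.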

Paper Structure

This paper contains 56 sections, 25 equations, 7 figures.

Figures (7)

  • Figure 1: Simulation of the zebrafish agent in a physics-based virtual environment. A) The 6-link embodiment geometry [tassa2018deepmind] in an environment with dynamic fluid forces. B) The agent controls the torque exerted by motors at each joint (5 DoF) to swim and navigate its environment. C) A custom cosmetic skin to mimic the appearance of larval zebrafish. D) A virtual environment matching the experimental parameters of the open-loop protocol [mu2019glia]. The root joint located at the head is fixed during training.
  • Figure 2: Agent architecture and 3M-Progress. A) Egocentric visual input ($I_t$) is encoded via a small residual network $\phi_I$. Proprioceptive state observations ($J_t$) are encoded via a small multi-layer perceptron $\phi_J$ with shortcut paths to both the core and policy modules. Sensory features are passed into recurrent LSTM core ($h_t^c$) and policy ($h_t^\pi$) modules that learn a state value function and a stochastic policy, respectively. The intrinsic drive module consists of a small multi-layer perceptron that parameterizes a forward dynamics model on sensory features observed from an environment with dynamics $T_{\mathrm{world}}$. B) 3M-Progress uses two memories created from environments with differing transition dynamics. Divergence between the ethological prior $\omega_\theta$ and the current world model $\omega_{\theta'}$ defines 3M, which is then used as input to a leaky integrator $\hat{\epsilon}$ to generate the intrinsic reward $r_t^i$.
  • Figure 3: Model-behavioral alignment. A) Swim power traces of artificial agents with different intrinsic drives throughout training. B) Pearson's $r$ correlation between agent swim power (joint torques) and zebrafish swim power (motor neuron activity) for active and passive behavioral transitions. C) (Top) Timecourse of active-passive transitions in zebrafish compared against stationary behavior from progress-driven agents for a single rollout. (Bottom) Average number of behavioral transitions per rollout across training for different intrinsic drives.
  • Figure 4: Model-brain alignment averaged across active and passive transitions. A) Noise-corrected Pearson's $r$ correlation between whole-brain neural and glial units and artificial units from trained agents. B) Model scores on behavioral and whole-brain alignment.
  • Figure 5: Latent dynamics of 3M-Progress agent's internal activations compared with normalized whole-brain neural-glial response in zebrafish. A) Principal components during passive and active transitions in the agent. B) Normalized average whole-brain neural-glial response during passive and active transitions in a zebrafish subject.
  • ...and 2 more figures
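
The Figure 4 caption reports noise-corrected Pearson's $r$ between recorded units and model units. The captions do not give the exact procedure, so the following is only a sketch of one common approach: divide the raw correlation by the square root of the recording's split-half reliability (Spearman-Brown corrected), treating the model trace as noise-free. The function names and the assumption that each recorded unit has repeated trials of shape `(n_trials, n_timepoints)` are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation of two 1-D traces; NaN if either trace is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
    if denom == 0:
        return float("nan")
    return float(np.sum(xc * yc) / denom)


def split_half_reliability(trials):
    """Correlate the mean of even-indexed trials with the mean of odd-indexed
    trials, then apply the Spearman-Brown correction 2r / (1 + r)."""
    trials = np.asarray(trials, dtype=float)
    r_half = pearson_r(trials[0::2].mean(axis=0), trials[1::2].mean(axis=0))
    return 2.0 * r_half / (1.0 + r_half)


def noise_corrected_r(model_trace, trials):
    """Raw model-to-data correlation divided by sqrt(reliability).

    Returns NaN when the reliability is not positive, since the correction
    is undefined in that case.
    """
    trials = np.asarray(trials, dtype=float)
    raw = pearson_r(model_trace, trials.mean(axis=0))
    reliability = split_half_reliability(trials)
    if not reliability > 0:
        return float("nan")
    return raw / np.sqrt(reliability)
```

With perfectly repeatable trials the reliability is 1 and the corrected score equals the raw correlation; noisier recordings lower the reliability and scale the score up accordingly.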