Flexible inference for animal learning rules using neural networks
Yuhan Helena Liu, Victor Geadah, Jonathan Pillow
TL;DR
This work infers the learning rule that animals follow directly from behavioral data collected during de novo task learning. It introduces a dynamic Bernoulli GLM decision policy whose trial-to-trial weight updates are modeled by a DNN (DNNGLM) or an RNN (RNNGLM). In simulations, the models recover ground-truth learning rules in the Markovian case and capture history-dependent updates in the non-Markovian case. Applied to International Brain Laboratory mouse data, they predict learning trajectories more accurately than traditional reinforcement learning (RL) rules. The work yields insights into reward-history-dependent learning, supports more interpretable, data-driven behavioral models, and lays groundwork for behavioral digital twins and animal-aligned AI.
Abstract
Understanding how animals learn is a central challenge in neuroscience, with growing relevance to the development of animal- or human-aligned artificial intelligence. However, existing approaches tend to assume fixed parametric forms for the learning rule (e.g., Q-learning, policy gradient), which may not accurately describe the complex forms of learning employed by animals in realistic settings. Here we address this gap by developing a framework to infer learning rules directly from behavioral data collected during de novo task learning. We assume that animals follow a decision policy parameterized by a generalized linear model (GLM), and we model their learning rule -- the mapping from task covariates to per-trial weight updates -- using a deep neural network (DNN). This formulation allows flexible, data-driven inference of learning rules while maintaining an interpretable form of the decision policy itself. To capture more complex learning dynamics, we introduce a recurrent neural network (RNN) variant that relaxes the Markovian assumption that learning depends solely on covariates of the current trial, allowing for learning rules that integrate information over multiple trials. Simulations demonstrate that the framework can recover ground-truth learning rules. We applied our DNN- and RNN-based methods to a large behavioral dataset from mice learning to perform a sensory decision-making task and found that they outperformed traditional reinforcement learning (RL) rules at predicting the learning trajectories of held-out mice. The inferred learning rules exhibited reward-history-dependent learning dynamics, with larger updates following sequences of rewarded trials. Overall, these methods provide a flexible framework for inferring learning rules from behavioral data in de novo learning tasks, setting the stage for improved animal training protocols and the development of behavioral digital twins.
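
The model structure described in the abstract can be illustrated with a minimal sketch: a Bernoulli GLM policy whose weights are updated after every trial by a small feed-forward network. This is not the authors' implementation. The two-weight policy (stimulus weight and bias), the choice of network inputs (current weights, covariates, choice, reward), the layer sizes, the toy task, and every function name below are assumptions made for illustration. The network is left untrained; in practice its parameters would be fit by maximizing the log-likelihood of the observed choices, which the sketch only computes.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def choice_prob(w, x):
    """Bernoulli GLM policy: P(choice = 1 | x) = sigmoid(w . x)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def make_mlp(n_in, n_hidden, n_out, rng, scale=0.1):
    """Randomly initialised one-hidden-layer network (untrained placeholder)."""
    w1 = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0.0, scale) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, w2


def mlp_update(params, features):
    """Learning rule: map per-trial features to a weight update (delta w)."""
    w1, w2 = params
    hidden = [math.tanh(sum(a * f for a, f in zip(row, features))) for row in w1]
    return [sum(a * h for a, h in zip(row, hidden)) for row in w2]


def simulate(n_trials=200, seed=0):
    """Simulate a toy two-choice task; return weight trajectory and log-likelihood."""
    rng = random.Random(seed)
    w = [0.0, 0.0]  # assumed GLM weights: [stimulus weight, bias]
    params = make_mlp(n_in=6, n_hidden=8, n_out=2, rng=rng)
    history, log_lik = [], 0.0
    for _ in range(n_trials):
        stim = rng.choice([-1.0, 1.0])      # hypothetical signed stimulus
        x = [stim, 1.0]                     # covariates: stimulus and constant
        p = choice_prob(w, x)
        y = 1 if rng.random() < p else 0    # sampled choice
        reward = 1.0 if (y == 1) == (stim > 0) else 0.0
        p_clipped = min(max(p, 1e-12), 1.0 - 1e-12)
        log_lik += math.log(p_clipped if y == 1 else 1.0 - p_clipped)
        # Markovian rule: the update depends only on the current trial.
        features = w + x + [float(y), reward]
        dw = mlp_update(params, features)
        w = [wi + di for wi, di in zip(w, dw)]
        history.append(list(w))
    return history, log_lik


if __name__ == "__main__":
    trajectory, ll = simulate()
    print("final weights:", trajectory[-1])
    print("log-likelihood of simulated choices:", ll)
```

A non-Markovian variant along the lines of the RNN model would carry a hidden state across trials and feed it into the update function alongside the current-trial features, so that updates can depend on the history of rewards and choices.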
