Flexible inference for animal learning rules using neural networks

Yuhan Helena Liu, Victor Geadah, Jonathan Pillow

TL;DR

This work tackles how animals learn by inferring the learning rule from de novo behavioral data. It introduces a dynamic Bernoulli GLM decision policy whose trial-to-trial weight updates are modeled by a DNN (DNNGLM) or an RNN (RNNGLM). In simulations, the models recover ground-truth rules for Markovian schemes and history-dependent updates for non-Markovian rules; applied to International Brain Laboratory mouse data, they predict held-out behavior more accurately than traditional RL learning rules. The work yields insights into reward-history-dependent learning, supports more interpretable, data-driven behavioral models, and lays the groundwork for behavioral digital twins and animal-aligned AI.

Abstract

Understanding how animals learn is a central challenge in neuroscience, with growing relevance to the development of animal- or human-aligned artificial intelligence. However, existing approaches tend to assume fixed parametric forms for the learning rule (e.g., Q-learning, policy gradient), which may not accurately describe the complex forms of learning employed by animals in realistic settings. Here we address this gap by developing a framework to infer learning rules directly from behavioral data collected during de novo task learning. We assume that animals follow a decision policy parameterized by a generalized linear model (GLM), and we model their learning rule -- the mapping from task covariates to per-trial weight updates -- using a deep neural network (DNN). This formulation allows flexible, data-driven inference of learning rules while maintaining an interpretable form of the decision policy itself. To capture more complex learning dynamics, we introduce a recurrent neural network (RNN) variant that relaxes the Markovian assumption that learning depends solely on covariates of the current trial, allowing for learning rules that integrate information over multiple trials. Simulations demonstrate that the framework can recover ground-truth learning rules. We applied our DNN and RNN-based methods to a large behavioral dataset from mice learning to perform a sensory decision-making task and found that they outperformed traditional RL learning rules at predicting the learning trajectories of held-out mice. The inferred learning rules exhibited reward-history-dependent learning dynamics, with larger updates following sequences of rewarded trials. Overall, these methods provide a flexible framework for inferring learning rules from behavioral data in de novo learning tasks, setting the stage for improved animal training protocols and the development of behavioral digital twins.
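The objective described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `rollout_log_likelihood` and `mlp_rule`, the single hidden layer, and the choice to feed the current weights, covariates, choice, and reward into the network are all hypothetical. The sketch rolls the GLM weights forward with a candidate learning rule and sums the Bernoulli log-probability of the observed choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rollout_log_likelihood(x, y, r, w0, delta_w):
    """Log-probability of choices y under a dynamic Bernoulli GLM.

    x: (T, D) covariates, y: (T,) binary choices, r: (T,) rewards,
    w0: (D,) initial weights, delta_w: candidate learning rule
    (w, x_t, y_t, r_t) -> weight change of shape (D,).
    """
    w = np.asarray(w0, dtype=float).copy()
    ll = 0.0
    weights = []
    for x_t, y_t, r_t in zip(x, y, r):
        weights.append(w.copy())
        # Clip to avoid log(0) when the policy saturates.
        p = np.clip(sigmoid(w @ x_t), 1e-12, 1 - 1e-12)
        ll += y_t * np.log(p) + (1 - y_t) * np.log(1 - p)
        # Weights evolve across trials according to the learning rule.
        w = w + delta_w(w, x_t, y_t, r_t)
    return ll, np.array(weights)

def mlp_rule(theta):
    """Hypothetical one-hidden-layer network standing in for the DNN."""
    W1, b1, W2, b2 = theta
    def delta_w(w, x_t, y_t, r_t):
        inputs = np.concatenate([w, x_t, [y_t, r_t]])
        return W2 @ np.tanh(W1 @ inputs + b1) + b2
    return delta_w
```

The sketch only evaluates the objective. Fitting the network parameters would additionally require gradients of this log-likelihood, for example from an automatic-differentiation library.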

Paper Structure

This paper contains 12 sections, 1 theorem, 29 equations, 18 figures, 7 tables, 1 algorithm.

Key Result

Proposition 1

For simplicity, let $\mathbf{w}_t \in \mathbb{R}$ for $t \in \{0, 1\}$, and assume $y_t \sim \mathrm{Bernoulli}(P_{y_t})$, where We further make the following simplifying assumptions: (1) $\Delta \mathbf{w}_t = \mathbf{w}_{t+1} - \mathbf{w}_t$ depends only on $(\mathbf{w}_t, x_t, y_t)$. (2) For each $t$, inputs $x_t^{(i)} \in \{-1, 1\}$ are drawn i.i.d. across samples from the uniform distributio

Figures (18)

  • Figure 1: Task schematic and learning rule inference methods. (A) We examine learning of a sensory decision-making task, in which mice must learn to report which side of the screen contains a visual stimulus by turning a wheel \cite{international2020standardized}. (B) In our framework, decision-making is governed by the weights of a Bernoulli generalized linear model (GLM), which evolve across trials according to an unknown learning rule. (C-D) To infer the learning rule, we approximate the weight update function $\Delta \mathbf{w}_t$ using either (C) a deep neural network (DNN) that maps the current trial covariates to a weight change; or (D) a recurrent neural network (RNN) that integrates information from previous trials before feeding into the DNN. We optimized the neural network model parameters $\theta$ by maximizing the log-probability of the animal choice data under the dynamic Bernoulli GLM (see Methods and Algorithm 1). We refer to our approaches as $\textbf{DNNGLM}$ or $\textbf{RNNGLM}$ depending on the model used to parametrize the learning rule.
  • Figure 2: Recovering ground-truth learning rules in simulated data. We simulated animal learning using a policy gradient method known as "REINFORCE" \cite{williams1992simple}. (See Supp. Fig. \ref{fig:supp_predMax} for an alternative ground-truth learning rule.) This rule is Markovian in that the weight update $\Delta \mathbf{w}_t$ depends only on current-trial variables. (A) Simulated trajectory for stimulus weight $w_{stim}$ for an example simulated mouse (green) and the inferred weights using the DNNGLM (blue). (B) Left: Heatmap showing the ground-truth stimulus weight change $\Delta w_{stim}$ following a correct choice, as a function of $w_{stim}$ and stimulus $s$. Horizontal dashed lines indicate $w_{stim}$ or $w$ slices shown in middle and right panels. Middle: slices showing stimulus weight change $\Delta w_{stim}$ as a function of the stimulus $s$, for different values of the true weight $w_{stim}$, after both correct (solid) and incorrect (dashed) decisions. (The REINFORCE algorithm exhibits no learning after incorrect trials for this setting of rule parameters.) Right: corresponding slices through the weight update function inferred using the DNNGLM. Note that the model successfully captures key characteristics of the learning rule, such as the slowing of learning at higher weights, increased learning with stimulus amplitude, and the asymmetry in learning after correct and incorrect choices. (See Supp. Fig. \ref{fig:supp_default} for analogous plots for bias parameter updates and Supp. Fig. \ref{fig:supp_predMax} for a comparison to other learning rules.) (C) Error in recovered learning rule as a function of dataset size, indicating that the DNNGLM converges to the true REINFORCE learning rule with increasing amounts of data.
  • Figure 3: Simulated example with a non-Markovian learning rule. The RNNGLM recovered the reward history dependence present in the ground truth: larger weight updates occur when the past $3$ trials were rewarded (black) versus unrewarded (red). Such differences were absent in DNNGLM, as it can only account for Markovian learning rules, where weight updates depend on current trial information. The ability of RNNGLM to capture these effects reflects its greater flexibility. Minor deviations from the ground truth are likely due to finite data effects (Fig. 2C). To keep the figure simple, we plotted the learning rule only for stimulus weight $w_{stim}=0$ and after correct choices, but we observed similar phenomena for other parameter settings.
  • Figure 4: Application to IBL mouse learning data. (A) Predicted stimulus (top) and bias (bottom) weight trajectories for a held-out animal, using DNNGLM trained on other animals and applied to this animal’s stimulus and choice sequence, plotted alongside the weight trajectories inferred by PsyTrack \cite{roy2018efficient}, which is a learning-agnostic and reliable method for tracking psychometric weights from behavioral time series. (Supp. Fig. \ref{fig:supp_psyDNN} shows additional weight trajectories.) Similar traces were observed for RNNGLM as well. (B) Our methods (DNNGLM and RNNGLM) achieved significantly higher test log-likelihood (LL) on held-out data. Here we plot LL relative to the REINFORCE model. Notably, the RNNGLM model also outperformed the DNNGLM extended to include the previous trial input ("DNNGLM-history"). Error bars reflect standard deviation across cross-validation seeds.
  • Figure 5: Properties of inferred learning rules. (A) For the dataset in Fig. 4, DNNGLM reveals negative weight updates following errors (termed ‘negative baseline’ in \cite{geadah2025inferring} and also observed in our flexible framework). Bias weight plot is provided in Supp. Fig. \ref{fig:supp_IBL_db}; note $\mathbf{w}$ is fixed at positive weights because the training data mainly involved positive $\mathbf{w}$. We follow the same plotting convention as in Fig. 2; since RNNGLM depends on additional historical variables, we plot the mean $\Delta w_{stim}$ averaged across these historical dimensions. (B) RNNGLM suggests non-Markovianity (history dependence) in learning: larger weight updates are observed when all past trials were rewarded (black) versus unrewarded (red). This gap widens as we condition on history beyond just the previous trial, suggesting that the history dependency extends beyond the most recent past trial. We note that plotting up to four past trials was intended only as an illustrative example, not a fundamental limit of RNNGLM; as shown in Supp. Fig. \ref{fig:supp_toffset}, the model suggests longer history dependencies. To keep the figure simple, we only plotted with stimulus $w$ fixed at 0, but similar trends are observed elsewhere.
  • ...and 13 more figures
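
The simulated REINFORCE learner referred to in the Figure 2 caption can be illustrated with a short sketch. It assumes the standard REINFORCE update for a Bernoulli GLM policy with no reward baseline, $\Delta \mathbf{w}_t = \alpha\, r_t\, (y_t - p_t)\, \mathbf{x}_t$, with reward 1 for correct choices and 0 otherwise. The paper's exact parameterization is not given here, so the learning rate, the stimulus set, and the function names `reinforce_update` and `simulate_learner` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reinforce_update(w, x_t, y_t, r_t, lr=0.05):
    """REINFORCE step for a Bernoulli GLM policy (assumed form, no baseline)."""
    p = sigmoid(w @ x_t)
    return lr * r_t * (y_t - p) * x_t

def simulate_learner(n_trials, lr=0.05, seed=0):
    """Simulate choices and weight trajectories of a REINFORCE learner."""
    rng = np.random.default_rng(seed)
    contrasts = np.array([-1.0, -0.5, 0.5, 1.0])  # hypothetical stimulus set
    stimuli = rng.choice(contrasts, size=n_trials)
    w = np.zeros(2)  # [w_stim, w_bias]
    weights = np.zeros((n_trials, 2))
    choices = np.zeros(n_trials, dtype=int)
    rewards = np.zeros(n_trials)
    for t in range(n_trials):
        x_t = np.array([stimuli[t], 1.0])
        weights[t] = w
        # Sample a choice from the current policy (1 = report "right").
        y_t = int(rng.random() < sigmoid(w @ x_t))
        # Reward 1 when the choice matches the stimulus side, else 0.
        r_t = float(y_t == int(stimuli[t] > 0))
        choices[t], rewards[t] = y_t, r_t
        w = w + reinforce_update(w, x_t, y_t, r_t, lr)
    return stimuli, choices, rewards, weights
```

Under these assumptions the update is zero after unrewarded trials, shrinks as $w_{stim}$ grows (because $p_t$ approaches $y_t$ on correct trials), and scales with stimulus amplitude, which matches the qualitative properties listed in the caption.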

Theorems & Definitions (2)

  • Proposition 1
  • proof