Meta-Learning Strategies through Value Maximization in Neural Networks

Rodrigo Carrasco-Davis; Javier Masís; Andrew M. Saxe

TL;DR

The paper presents Learning Effort, a normative framework for meta-learning that maximizes discounted cumulative learning performance using a time-varying control signal g(t). By applying tractable deep linear network models and gradient-flow dynamics, it derives optimal intervention strategies across curricula, engagement, and neuromodulation-like gain signals, linking them to cognitive-control theories such as the Expected Value of Control. Key findings reveal that allocating more learning effort to easier aspects early, then sustaining focus on harder aspects, improves long-term performance; the framework also recasts MAML and bilevel optimization within a value-maximization perspective and demonstrates benefits for continual learning and task switching. Across single neurons, two-layer linear nets, and multiple meta-learning tasks, the approach provides a principled, testable account of normative learning interventions with potential neural implementations. The work offers a tractable foundation for studying how interventions can enhance learning trajectories in both artificial and biological agents, grounded in a formal connection to cognitive control and neuromodulatory mechanisms.

Abstract

Biological and artificial learning agents face numerous choices about how to learn, ranging from hyperparameter selection to aspects of task distributions like curricula. Understanding how to make these meta-learning choices could offer normative accounts of cognitive control functions in biological learners and improve engineered systems. Yet optimal strategies remain challenging to compute in modern deep networks due to the complexity of optimizing through the entire learning process. Here we theoretically investigate optimal strategies in a tractable setting. We present a learning effort framework capable of efficiently optimizing control signals on a fully normative objective: discounted cumulative performance throughout learning. We obtain computational tractability by using average dynamical equations for gradient descent, available for simple neural network architectures. Our framework accommodates a range of meta-learning and automatic curriculum learning methods in a unified normative setting. We apply this framework to investigate the effect of approximations in common meta-learning algorithms; infer aspects of optimal curricula; and compute optimal neuronal resource allocation in a continual learning setting. Across settings, we find that control effort is most beneficial when applied to easier aspects of a task early in learning, followed by sustained effort on harder aspects. Overall, the learning effort framework provides a tractable theoretical test bed to study normative benefits of interventions in a variety of learning systems, as well as a formal account of optimal cognitive control strategies over learning trajectories posited by established theories in cognitive neuroscience.
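
As a hedged illustration of the objective named in the abstract, discounted cumulative performance net of control cost can be written generically as below. The symbols $\gamma$ (discount factor), $\mathcal{P}(t)$ (performance), $C(t)$ (cost of control) and $T$ (end of the learning period) are taken from the figure captions; the integral form itself is an assumption, and the exact expression in the paper may differ (for example in scaling constants or in how the cost depends on $g(t)$).

$$V = \int_{0}^{T} \gamma^{t}\,\big[\mathcal{P}(t) - C(t)\big]\,dt$$

Under this reading, the control signal $g(t)$ enters both through its effect on the learning trajectory, and hence on $\mathcal{P}(t)$, and through the cost $C(t)$ of exerting it.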

Paper Structure (39 sections, 68 equations, 19 figures, 5 tables, 1 algorithm)

Introduction
Learning Effort Framework
Single Neuron Example
Baseline Deep Linear Networks and Datasets
Relation to Meta-Learning Algorithms in Machine Learning
Engagement modulation
Results
Gain Modulation
Results
Discussion
Acknowledgments
Further related work
Extended Discussion
Purpose of the control cost
Control gradient
...and 24 more sections

Figures (19)

Figure 1: Learning effort framework. A neural network is under the influence of a control signal $g(t)$. This control signal is optimized iteratively by initializing $g(t)$, then: (1) solving the learning dynamics in Eq. \ref{eq:input_output_learning}; (2) computing the performance $\mathcal{P}(t)$; (3) integrating performance and control cost to compute the exact cumulative return $V$ in Eq. \ref{eq:value_integral}; (4) taking the gradient of $V$ with respect to the control signal $g(t)$ and updating $g(t)$ as in Eq. \ref{eq:gradient_steps}, then returning to (1). (b): Multi-step MAML. (c): Learning rate optimization as in Bilevel Programming. (d): Task engagement, where the control signal determines the optimal amount of engagement through time to multiple regression tasks. (f): Category assimilation, where a model is trained to learn a classification task and can control the engagement on each class $c$ throughout training. (e): Effort allocation, where the control signal (gain modulation of weights) is computed to maximize value throughout the learning of a single task. (g): Task switching, where the gain modulation model is trained to switch tasks repeatedly and the control signal is computed throughout the switches.
Figure 2: Results in the single neuron model throughout the learning period $0 \leq t \leq T$. (a) Instant net reward $v(t)$. (b) Loss $\left<\mathcal{L}(t)\right>$ for theoretical predictions (solid) and simulations using SGD (shaded). (c) Optimal control signal decreases through learning (Baseline $g(t)=0$). (d) Weight $w(t)$ through learning for control and baseline case, $\tilde{w}(t)=w(t)\cdot(1+g(t))$. Dependence of optimal control signal on task parameters. (e) and (g): optimal $g(t)$ when varying discount factor $\gamma$ and noise level $\sigma_{x}$ respectively. (f) and (h): Difference between instant net rewards $v(t)$ between control and baseline when varying $\gamma$ and $\sigma_{x}$ respectively. Longer time horizons and less noisy tasks recruit more control.
Figure 3: (a): Single step MAML loss $V=\mathcal{P}(\delta t)$ when considering more steps in the learning dynamics. (b): Resulting learning dynamics from initial parameters found with Multi-Step MAML. (c) and (d): Optimal learning rate when varying discount factor $\gamma$ and cost coefficient $\beta$.
Figure 4: Results for task engagement experiment. (a), (c) and (e): $\mathcal{L}(t)$ for baseline and control case for Attentive, Active and Vector engagement. (b), (d) and (f): Engagement coefficients $\psi_{\tau}(t)$ for each of the binary classification tasks Attentive, Active and Vector engagement. Mean and standard deviations from 5 independent trainings. (h) and (j): Results for category engagement task, improvement in the loss function when using control for MNIST and Semantic dataset respectively. (i) and (k): Optimal category engagement coefficients for MNIST and Semantic datasets. (l): Class proportion experiment. Uniform: Loss when using uniform distribution for the abundance of classes in each batch. Balanced: Loss on a balanced batch, but using the inferred curriculum of classes in the batch to train. Curriculum: Loss on curriculum batch when using the curriculum. (m): Loss per class using control (solid lines) and baseline (dashed lines).
Figure 5: Results of the gain modulation model trained on an MNIST classification task. (a): Instant net reward $v(t)$, baseline vs controlled. (b): L1 and L2 norms of the weights. (c): Loss $\mathcal{L}(t)$ throughout learning. (d): normalized $d_{t}\mathcal{L}(t)$, and normalized L2 norm of the control signal $G_{1}(t)$ and $G_{2}(t)$. (e): Results on the task switch meta-task. Comparison of $\mathcal{L}(t)$ for the baseline and control case. (f): Values of $\mathcal{L}(t)$ at switch times, along with the normalized cost of control $C(t)$ at switch times (green line). (g): Zoom of $\mathcal{L}(t)$ in the top panel, along with the normalized cost of control.
...and 14 more figures
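
The four-step loop in the Figure 1 caption can be sketched on a toy version of the single neuron setting of Figure 2, using the effective weight $\tilde{w}(t)=w(t)\cdot(1+g(t))$. This is a minimal illustration, not the authors' implementation: the noise-free quadratic loss, the quadratic control cost `beta * g**2`, all constants, and every function name are assumptions. The gradient of $V$ is approximated here with central finite differences, swapped in to keep the example dependency-free.

```python
# Toy sketch of the four-step loop in the Figure 1 caption, for one scalar weight.
# The quadratic loss, the control cost, and all constants are illustrative assumptions.

def simulate(g, w0=0.1, w_star=1.0, lr=0.5, dt=0.1):
    """Step (1): Euler-integrate gradient flow on L = 0.5 * (w_star - w * (1 + g))**2."""
    w, losses = w0, []
    for g_t in g:
        err = w_star - w * (1.0 + g_t)
        losses.append(0.5 * err ** 2)        # step (2): performance is taken as -loss
        w += dt * lr * err * (1.0 + g_t)     # dw/dt = -lr * dL/dw
    return losses

def value(g, gamma=0.99, beta=0.3, dt=0.1):
    """Step (3): discounted sum of instant net reward v(t) = -L(t) - beta * g(t)**2."""
    losses = simulate(g, dt=dt)
    return sum(
        (gamma ** (k * dt)) * (-loss - beta * g_t ** 2) * dt
        for k, (loss, g_t) in enumerate(zip(losses, g))
    )

def grad_value(g, eps=1e-5):
    """Step (4), part one: central finite-difference gradient of V w.r.t. each g(t_k)."""
    grad = []
    for k in range(len(g)):
        up = g[:k] + [g[k] + eps] + g[k + 1:]
        down = g[:k] + [g[k] - eps] + g[k + 1:]
        grad.append((value(up) - value(down)) / (2 * eps))
    return grad

def optimize_control(n_steps=50, iters=200, step_size=0.5):
    """Step (4), part two: gradient ascent on V, starting from the baseline g(t) = 0."""
    g = [0.0] * n_steps
    for _ in range(iters):
        grad = grad_value(g)
        g = [g_t + step_size * d for g_t, d in zip(g, grad)]
    return g

baseline = [0.0] * 50            # no control, as in the Figure 2 baseline
g_opt = optimize_control()       # control signal after gradient ascent on V
v_base, v_ctrl = value(baseline), value(g_opt)
```

Comparing `v_ctrl` with `v_base` shows whether the optimized signal improves the discounted return over the uncontrolled baseline in this toy setting. In practice an automatic differentiation library would replace the finite-difference gradient, which scales poorly with the number of time steps.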
