How connectivity structure shapes rich and lazy learning in neural circuits

Yuhan Helena Liu; Aristide Baratin; Jonathan Cornford; Stefan Mihalas; Eric Shea-Brown; Guillaume Lajoie

TL;DR

The paper investigates how the effective rank of initial neural connectivity influences learning regimes in neural networks, bridging theoretical insights with neuroscience-inspired connectivity. Through a two-layer linear analysis and RNN simulations, it demonstrates that higher-rank initializations tend to produce effectively lazier learning (smaller NTK changes) on average across tasks, while low-rank initializations promote richer learning unless aligned with task statistics. Empirical validation using neuroscience-style tasks and biologically motivated connectivity patterns confirms the central prediction and reveals aligned-low-rank cases where laziness can still emerge. The findings suggest that initial connectivity structure, shaped by development or evolution, can modulate plasticity costs and forgetting risks, with implications for brain-inspired AI and neurobiological interpretations of learning dynamics.

Abstract

In theoretical neuroscience, recent work leverages deep learning tools to explore how some network attributes critically influence its learning dynamics. Notably, initial weight distributions with small (resp. large) variance may yield a rich (resp. lazy) regime, where significant (resp. minor) changes to network states and representation are observed over the course of learning. However, in biology, neural circuit connectivity could exhibit a low-rank structure and therefore differs markedly from the random initializations generally used for these studies. As such, here we investigate how the structure of the initial weights -- in particular their effective rank -- influences the network learning regime. Through both empirical and theoretical analyses, we discover that high-rank initializations typically yield smaller network changes indicative of lazier learning, a finding we also confirm with experimentally-driven initial connectivity in recurrent neural networks. Conversely, low-rank initialization biases learning towards richer learning. Importantly, however, as an exception to this rule, we find lazier learning can still occur with a low-rank initialization that aligns with task and data statistics. Our research highlights the pivotal role of initial weight structures in shaping learning regimes, with implications for metabolic costs of plasticity and risks of catastrophic forgetting.
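The abstract's central quantity is the effective rank of the initial weights. The sketch below shows one way to compute it, assuming the entropy-based definition (the exponential of the Shannon entropy of the normalized singular values). The definition used in the paper is not reproduced in this summary and may differ. The function name `effective_rank` and the matrices `W_full` and `W_low` are illustrative only.

```python
import numpy as np

def effective_rank(W, eps=1e-12):
    """Entropy-based effective rank (assumed definition, see lead-in).

    Returns exp(H(p)), where p are the singular values of W normalized
    to sum to one and H is the Shannon entropy.
    """
    s = np.linalg.svd(W, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero entries before taking logs
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
N = 100

# Standard random Gaussian initialization (full rank with probability one).
W_full = rng.standard_normal((N, N)) / np.sqrt(N)

# Rank-5 matrix built as a product of two thin Gaussian factors.
U = rng.standard_normal((N, 5))
V = rng.standard_normal((N, 5))
W_low = U @ V.T / N

print("effective rank, Gaussian:", effective_rank(W_full))
print("effective rank, rank-5  :", effective_rank(W_low))
```

Under this definition the effective rank of a rank-$k$ matrix is at most $k$, and it equals $k$ only when all nonzero singular values are equal.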

Paper Structure (20 sections, 7 theorems, 29 equations, 19 figures)

Introduction
Contributions
Related works
Setup and Theoretical Findings
RNN setup
Effective laziness measures
Theoretical findings
Simulation results
Discussion
Acknowledgement
Extended discussions on related works
Proofs
Proofs for main text Theorem and Proposition
Notation
Prior results
...and 5 more sections

Key Result

Theorem 1

(Informal) Consider the network above with its corresponding NTK in Eq. \ref{eqn:NTK}, trained under MSE loss with small initialization and whitened data. The expected kernel alignment across tasks is maximized with high-rank initialization, i.e. the singular values of $W^{(0)}_1$ are distributed across a
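To make the quantity in the theorem concrete, the following sketch computes an NTK Gram matrix and a kernel alignment score. The network the theorem refers to is not shown in this summary, so the sketch assumes a scalar-output two-layer linear network $f(x) = w_2^\top W_1 x$, and it assumes the common normalized Frobenius inner-product form of kernel alignment. The paper's exact measure may differ. The "trained" weights here are a random perturbation standing in for actual training, and all names are hypothetical.

```python
import numpy as np

def ntk_two_layer_linear(W1, w2, X):
    """NTK Gram matrix of f(x) = w2 @ W1 @ x, evaluated on the rows of X.

    Gradients: df/dw2 = W1 x and df/dW1 = outer(w2, x), hence
    K(x, x') = (W1 x).(W1 x') + ||w2||^2 * (x.x').
    """
    H = X @ W1.T  # hidden activations, shape (n, h)
    return H @ H.T + (w2 @ w2) * (X @ X.T)

def kernel_alignment(K_a, K_b):
    """Assumed form: Tr(K_a K_b) / (||K_a||_F ||K_b||_F) for symmetric kernels."""
    return float(np.sum(K_a * K_b) / (np.linalg.norm(K_a) * np.linalg.norm(K_b)))

rng = np.random.default_rng(0)
n, d, h = 30, 10, 20
X = rng.standard_normal((n, d))

# Small initialization, as in the theorem's assumptions.
W1_init = 0.1 * rng.standard_normal((h, d))
w2_init = 0.1 * rng.standard_normal(h)

# Stand-in for trained weights: a random perturbation, not real training.
W1_final = W1_init + 0.5 * rng.standard_normal((h, d))
w2_final = w2_init + 0.5 * rng.standard_normal(h)

K0 = ntk_two_layer_linear(W1_init, w2_init, X)
K1 = ntk_two_layer_linear(W1_final, w2_final, X)
print("alignment(K0, K1):", kernel_alignment(K0, K1))
```

An alignment close to 1 indicates that the kernel barely rotated during training (lazier learning), while lower values indicate larger kernel changes (richer learning).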

Figures (19)

Figure 1: Low-rank initial recurrent weights, generated using SVD, lead to greater changes (or effectively richer learning) in the recurrent neural network. A) Schematic of RNN training setup. B) Measurements of effective richness vs laziness of learning (metrics as defined in Section \ref{scn:laziness_measures}), for RNNs trained on several cognitive tasks in Neurogym \cite{molano2022neurogym} as well as the sequential MNIST task (sMNIST). For details on SVD weight creation, see Appendix \ref{scn:sim_details}. Fewer rank points were used for sMNIST due to computational time. Each dot represents a single training run, with each run using a different random initialization (10 runs total for each setting).
Figure 2: Low-rank initial weight structures, inspired by biological examples, lead to effectively richer learning. We present the eigenspectrum and the relative effective rank of connectivity in A) structures with cell-type-specific statistics, B) structures derived from EM data, C) structures obeying Dale's law, and D) structures with an over-representation of chain motifs; we also present the effective learning laziness for networks initialized with these connectivity structures. These structures exhibit a lower effective rank compared to standard random Gaussian initialization (null). We plotted the magnitude of the eigenvalues (Eigval mag), scaled by the dominant eigenvalue's magnitude, against their indices normalized by the network size $N$ (Eigval index). We apply the effective laziness measures described in Section \ref{scn:laziness_measures} to compare the effective laziness of experimentally-driven initial connectivity versus standard random Gaussian initialization (null). See Appendix \ref{scn:sim_details} for details on network initialization. The boxplots are generated from 10 independent runs with different initialization seeds. Due to space constraints, we include only the 2AF task here, but Appendix Figures \ref{fig:bio_laziness_DMS} and \ref{fig:bio_laziness_CXT} show that similar trends hold for the DMS and CXT tasks.
Figure 3: Low-rank initializations can still achieve high alignment for specific tasks (see Proposition \ref{prop:aligned_ini}). A) The student-teacher two-layer linear network setup as described in Section \ref{scn:theory}, but with feature anisotropy controlled by a feature modulation matrix $F$, i.e. $z=Fx$. The condition number of $F$ dictates the relative feature strength. The top half of the singular values of $F$ are set to $\kappa$, while the bottom half are set to $1$, so that $\kappa$ is the condition number of $F$. B) The aligned initialization (green) is achieved by setting $W_1$ as described in Proposition \ref{prop:aligned_ini} (with $\beta = w^T F$, where $w$ is as illustrated), so that the initialization aligns with the task statistics. The partial alignment (blue) mirrors the aligned case, but $F$ is substituted with its rank-$(d/2)$ truncation, causing the network to align only with the dominant features. We observe that a considerably higher alignment can be achieved when the initialization aligns solely with the dominant features, especially when the relative strength of these dominant features is high. C) The analysis from B) is replicated for RNNs learning the sMNIST task. As the ground truth network function is elusive, we use a teacher network with pre-trained weights. Once again, we replace $F$ with its rank-$(d/2)$ truncation for partial alignment. Details on the input/output definitions and initializations, as well as other simulation specifics, are available in Appendix \ref{scn:sim_details}. We note that in all scenarios presented here, the initial errors are high since the readout weights are initialized randomly, rendering it a valid learning problem.
Figure 4: As predicted by the theoretical results, higher-rank random initialization leads to effectively lazier learning in a two-layer linear network. A) We use the student-teacher two-layer linear network setup described in Section \ref{scn:theory}. B) A non-idealized setting: a two-layer feedforward network with ReLU activation and 300 hidden units trained on the MNIST dataset. Plotting convention follows that of Figure \ref{fig:svd}.
Figure 5: We repeated Figure \ref{fig:bio_spectrum_laziness} for the DMS task and observed similar trends: low-rank initialization, achieved by experimentally-driven initial connectivity in Figure \ref{fig:bio_spectrum_laziness}, leads to effectively richer learning. The plotting conventions used here follow those in Figure \ref{fig:bio_spectrum_laziness}, with panels A-D corresponding to the ones in that figure.
...and 14 more figures
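Figure 1 refers to low-rank initial recurrent weights generated using SVD. The exact procedure is in the paper's appendix and is not reproduced here. The sketch below is one plausible construction under stated assumptions: draw a Gaussian matrix, keep its top `rank` singular values, and rescale so the Frobenius norm matches the full-rank draw. The rescaling step, the gain `g`, and the function name `low_rank_init` are assumptions made for illustration.

```python
import numpy as np

def low_rank_init(N, rank, g=1.0, seed=0):
    """Rank-truncated Gaussian recurrent weights (illustrative sketch).

    Assumption: the truncated matrix is rescaled to the Frobenius norm of
    the original Gaussian draw, so that only the rank varies across settings.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((N, N)) * g / np.sqrt(N)

    # Thin SVD, then zero out all but the top `rank` singular values.
    U, s, Vt = np.linalg.svd(W)
    s_trunc = np.zeros_like(s)
    s_trunc[:rank] = s[:rank]
    W_low = (U * s_trunc) @ Vt  # same as U @ diag(s_trunc) @ Vt

    # Match the overall weight scale of the full-rank draw (assumption).
    W_low *= np.linalg.norm(W) / np.linalg.norm(W_low)
    return W_low

W_r5 = low_rank_init(N=50, rank=5)
W_r20 = low_rank_init(N=50, rank=20)
print("Frobenius norms:", np.linalg.norm(W_r5), np.linalg.norm(W_r20))
```

Holding the norm fixed is a design choice that separates the effect of rank from the effect of weight scale, since the abstract notes that the initial weight variance on its own can move a network between the rich and lazy regimes.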

Theorems & Definitions (12)

Theorem 1
Proposition 1
Theorem 1
proof
Lemma 1
proof
Lemma 2
proof
Proposition 1
proof
...and 2 more
