Synaptic Weight Distributions Depend on the Geometry of Plasticity

Roman Pogodin; Jonathan Cornford; Arna Ghosh; Gauthier Gidel; Guillaume Lajoie; Blake Richards

TL;DR

The paper investigates how the geometry of synaptic plasticity affects learning by framing synaptic updates as mirror descent, where distances in weight space are defined by a potential $\phi$ rather than by the Euclidean $L^2$ norm. It proves that, for small updates and under mild assumptions, the total weight change is approximately Gaussian in the dual coordinates $\nabla\phi(\mathbf{w})$, so the final weight distribution depends on the geometry. Conversely, the geometry can be identified by finding a dual representation in which the weight changes are Gaussian. The authors connect this theory to empirical observations of log-normal synaptic weights and show that weight changes look Gaussian when measured with the potential actually used for learning, but non-Gaussian when measured with an incorrect one. This yields a practical test based on weight distributions before and after learning. Overall, the work offers a principled method to infer the brain’s synaptic geometry from data and suggests that non-Euclidean geometries may better explain synaptic statistics across cortical areas, with implications for modeling learning in neuroscience.
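A minimal sketch of the mirror descent update, $\nabla\phi(\mathbf{w}^{t+1}) = \nabla\phi(\mathbf{w}^{t}) - \eta\,\nabla l(\mathbf{w}^{t})$, run on the two-parameter toy loss from Figure 2. The potentials are assumptions chosen for illustration: negative entropy $\phi(\mathbf{w})=\sum_i (w_i\log w_i - w_i)$ and the $p$-norm family $\phi(\mathbf{w})=\frac{1}{p}\sum_i |w_i|^p$. The function names, learning rate, and starting point are hypothetical, not taken from the paper.

```python
import math

# Mirror maps grad(phi) and their inverses (illustrative assumptions):
#   negative entropy phi(w) = sum(w*log(w) - w) -> grad phi(w) = log(w), needs w > 0
#   p-norm           phi(w) = sum(|w|**p) / p    -> grad phi(w) = sign(w) * |w|**(p-1)
def ne_map(w):
    return [math.log(x) for x in w]

def ne_inv(u):
    return [math.exp(x) for x in u]

def pnorm_map(p):
    return lambda w: [math.copysign(abs(x) ** (p - 1), x) for x in w]

def pnorm_inv(p):
    return lambda u: [math.copysign(abs(x) ** (1.0 / (p - 1)), x) for x in u]

def toy_grad(w):
    # Gradient of l(w1, w2) = (s - 1) * exp(s) with s = w1/2 + w2 (Figure 2).
    # dl/ds = s * exp(s), and ds/dw1 = 1/2, ds/dw2 = 1.
    s = 0.5 * w[0] + w[1]
    g = s * math.exp(s)
    return [0.5 * g, g]

def mirror_descent(w0, grad_map, inv_map, grad=toy_grad, lr=0.1, steps=50):
    """Iterate grad_phi(w_{t+1}) = grad_phi(w_t) - lr * grad_l(w_t); return the path."""
    w = list(w0)
    path = [w]
    for _ in range(steps):
        # Take the gradient step in the dual space, then map back to weights.
        u = [ui - lr * gi for ui, gi in zip(grad_map(w), grad(w))]
        w = inv_map(u)
        path.append(w)
    return path

start = [0.5, 0.5]
gd_path = mirror_descent(start, pnorm_map(2), pnorm_inv(2))  # plain gradient descent
ne_path = mirror_descent(start, ne_map, ne_inv)              # exponentiated gradient
p3_path = mirror_descent(start, pnorm_map(3), pnorm_inv(3))  # 3-norm geometry
```

With $p=2$ the mirror map is the identity and the loop reduces to gradient descent. With negative entropy the inverse map is $\exp$, so each step multiplies the weights by $e^{-\eta\,\partial_i l}$ and they stay positive.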

Abstract

A growing literature in computational neuroscience leverages gradient descent and learning algorithms that approximate it to study synaptic plasticity in the brain. However, the vast majority of this work ignores a critical underlying assumption: the choice of distance for synaptic changes (i.e., the geometry of synaptic plasticity). Gradient descent assumes that the distance is Euclidean, but many other distances are possible, and there is no reason that biology necessarily uses Euclidean geometry. Here, using the theoretical tools provided by mirror descent, we show that the distribution of synaptic weights will depend on the geometry of synaptic plasticity. We use these results to show that experimentally observed log-normal weight distributions found in several brain areas are not consistent with standard gradient descent (i.e., a Euclidean geometry), but rather with non-Euclidean distances. Finally, we show that it should be possible to experimentally test for different synaptic geometries by comparing synaptic weight distributions before and after learning. Overall, our work shows that the current paradigm in theoretical work on synaptic plasticity that assumes Euclidean synaptic geometry may be misguided and that it should be possible to experimentally determine the true geometry of synaptic plasticity in the brain.
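The before/after test can be sketched with the $\Delta\mathrm{CDF}$ statistic from the Figure 3 caption: take the weight change in the dual coordinates of a candidate potential, rescale it, and integrate the absolute difference between its empirical CDF and that of $\mathcal{N}(0,1)$. Everything here is an assumption for illustration: the helper names, the RMS normalization, the integration grid, and the synthetic data (log-normal initial weights plus a Gaussian change in the negative-entropy dual space $\log w$).

```python
import math
import random
from statistics import NormalDist

def delta_cdf(changes, lo=-6.0, hi=6.0, n_grid=2401):
    """Approximate the integral of |F_emp(x) - Phi(x)| over [lo, hi].

    Assumption: "normalized uncentered" is read as dividing by the root
    mean square without subtracting the mean; the paper may differ.
    """
    n = len(changes)
    rms = math.sqrt(sum(c * c for c in changes) / n)
    z = sorted(c / rms for c in changes)
    phi = NormalDist().cdf
    dx = (hi - lo) / (n_grid - 1)
    total, k = 0.0, 0
    for i in range(n_grid):
        x = lo + i * dx
        while k < n and z[k] <= x:
            k += 1  # k samples are <= x, so the empirical CDF at x is k / n
        total += abs(k / n - phi(x)) * dx
    return total

def compare_potentials(n=5000, sigma=0.5, seed=0):
    """Gaussian change in the NE dual space (log w), evaluated under two potentials."""
    rng = random.Random(seed)
    w0 = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(n)]   # log-normal initial weights
    w1 = [w * math.exp(rng.gauss(0.0, sigma)) for w in w0]   # log w1 = log w0 + xi
    ne_change = [math.log(b) - math.log(a) for a, b in zip(w0, w1)]  # matching potential
    l2_change = [b - a for a, b in zip(w0, w1)]                      # Euclidean potential
    return delta_cdf(ne_change), delta_cdf(l2_change)

ne_err, l2_err = compare_potentials()
```

In this synthetic setting the change measured with the matching potential ($\log w_1 - \log w_0$) should give a small $\Delta\mathrm{CDF}$, while the Euclidean change $w_1 - w_0$ is skewed and heavy-tailed and gives a larger one.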

Paper Structure (20 sections, 6 theorems, 57 equations, 7 figures)

Introduction
Related work
Mirror descent framework
Implicit bias in mirror descent
Weight distributions in mirror descent
Experiments
Linear regression
Robustness to potential change
Finetuning of deep networks
Estimating synaptic geometry experimentally
Discussion
Weight distribution convergence
Setup
Rich and lazy regimes
Experimental details
...and 5 more sections

Key Result

Theorem 1

Consider $N$ i.i.d. samples $y^n,\mathbf{x}^n$, such that: $\mathbf{x}^n\!\in\!\mathbb{R}^D$ are zero-mean and bounded; pairwise correlations $c_{ij}\!=\!\mathbb{E}\,x^n_ix^n_j$ and $c_{ij}'\!=\!\mathrm{Cov}((x^n_i)^2, (x^n_j)^2)$ between entries of a single $\mathbf{x}^n$ decay quickly enough so … where $h=\mathbb{E}\left[\nabla^2\phi^{-1}(w^0_i)\right]$ (a scaling factor that is identical for a …

Figures (7)

Figure 1: Mirror descent dynamics.
Figure 2: Mirror descent dynamics for $l(w_1, w_2)\!=\!(\frac{1}{2}w_1\!+\!w_2\!-\!1)e^{\frac{1}{2}w_1 + w_2}$. Blue: negative entropy (NE, exponentiated gradient update); green: gradient descent/2-norm; orange: 3-norm. Left: loss surface (with level sets shown below it) and dynamics in the regular $(w_1, w_2)$ coordinates. Right: same, but in the $(\nabla\phi_{\mathrm{NE}}(w_1), \nabla\phi_{\mathrm{NE}}(w_2))$ coordinates.
Figure 3: A. Linear regression solutions for negative entropy (NE) and fast correlation decay ($c\!=\!2$). Left: integral of absolute CDF difference ($\Delta\mathrm{CDF}$) between normalized uncentered weights and $\mathcal{N}(0, 1)$. Right: magnitude of weight changes relative to the initial weights in the dual space ($\Delta\phi$). Solid line: median over 30 seeds; shaded area: 5th/95th percentiles; pink: $N\!=\!D^{0.5}$; blue: $N\!=\!D^{0.75}$. B. $\Delta\mathrm{CDF}$ for Gaussian (left) and log-normal (right) weight initializations and a Gaussian addition w.r.t. $\phi$, evaluated on another potential $\phi'$ (e.g., blue lines are sampled for NE but evaluated on every potential). Solid line: mean over 30 seeds; shaded areas: mean $\pm$ standard deviation.
Figure 4: Finetuning on 10 randomly sampled ImageNet validation subsets ($N=D^{0.5}$ data points). A. Integral of absolute CDF difference ($\Delta\mathrm{CDF}$) between normalized uncentered weights and $\mathcal{N}(0, 1)$ for networks trained with different potentials. Circle: individual value; bar: mean over seeds. Pink box (bottom right): examples of weight change histograms (pink) plotted against $\mathcal{N}(0, 1)$ (black). B. Same as A., but $\Delta\mathrm{CDF}$ is calculated w.r.t. other potentials.
Figure 5: Observed vs. modeled synaptic distributions. Pink: spine volume ($\mu\mathrm{m}^3$) from Dorkenwald et al. (2022); vertical bars: point-mass initializations. A. Left: initial weights for negative entropy (NE); right: final weights for a Gaussian change in the dual space. B. Same as A, but for the 3-norm.
...and 2 more figures
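Figure 5 pairs a point-mass initialization with a Gaussian change in the dual space. A minimal sketch of that construction, assuming negative entropy ($\nabla\phi(w)=\log w$) and the 3-norm potential ($\nabla\phi(w)=\mathrm{sign}(w)\,w^2$). The helper names, initial weight, and noise scale are hypothetical values, not fits to the spine-volume data.

```python
import math
import random
import statistics

def final_weights_ne(w0, sigma, n, seed=0):
    """Point mass at w0 plus a Gaussian change in the NE dual space (log w)."""
    rng = random.Random(seed)
    # exp(log w0 + xi) = w0 * exp(xi): log-normal with median w0, always positive
    return [w0 * math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]

def final_weights_pnorm(w0, sigma, n, p=3, seed=0):
    """Point mass at w0 plus a Gaussian change in the p-norm dual space."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        # Dual coordinate sign(w)|w|^(p-1), shifted by Gaussian noise.
        u = math.copysign(abs(w0) ** (p - 1), w0) + rng.gauss(0.0, sigma)
        # Map back with the inverse sign(u)|u|^(1/(p-1)).
        out.append(math.copysign(abs(u) ** (1.0 / (p - 1)), u))
    return out

ne_w = final_weights_ne(0.5, 0.5, 5000)
p3_w = final_weights_pnorm(0.5, 0.5, 5000)
```

Under negative entropy the final weights are $w^0 e^{\xi}$, i.e., log-normal and strictly positive. Under the 3-norm potential with this noise level, a fraction of samples crosses zero, since the dual coordinate $\mathrm{sign}(w)\,w^2$ can change sign.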

Theorems & Definitions (12)

Theorem 1 (informal)
Proof sketch
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Theorem 1
proof
...and 2 more
