Homeostatic Ubiquity of Hebbian Dynamics in Regularized Learning Rules

David Koplow, Tomaso Poggio, Liu Ziyin

TL;DR

The paper reveals that regularized learning dynamics, particularly SGD with weight decay, can produce Hebbian-like learning signals near stationarity, and that increasing noise can induce anti-Hebbian behavior; these effects generalize across a wide range of optimizers and architectures. By formulating a formal alignment measure between the learning signal and Hebbian updates and proving positive alignment at stationarity, the work unifies two seemingly distinct learning paradigms as emergent regimes of optimization. It further predicts a phase boundary where noise overrides regularization to yield anti-Hebbian dynamics and documents transient Hebbian/anti-Hebbian phases during training. The results offer a framework for interpreting neurophysiological plasticity data as potential epiphenomena of optimization, while outlining experimental tests to distinguish these mechanisms in biological circuits.

Abstract

Hebbian and anti-Hebbian plasticity are widely observed in the biological brain, yet their theoretical understanding remains limited. In this work, we find that when a learning method is regularized with L2 weight decay, its learning signal will gradually align with the direction of the Hebbian learning signal as it approaches stationarity. This Hebbian-like behavior is not unique to SGD: almost any learning rule, including random ones, can exhibit the same signature long before learning has ceased. We also provide a theoretical explanation for anti-Hebbian plasticity in regression tasks, demonstrating how it can arise naturally from gradient or input noise, and offering a potential reason for the observed anti-Hebbian effects in the brain. Certainly, our proposed mechanisms do not rule out any conventionally established forms of Hebbian plasticity and could coexist with them extensively in the brain. A key insight for neurophysiology is the need to develop ways to experimentally distinguish these two types of Hebbian observations.
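
The alignment claim in the abstract can be checked by hand in the simplest setting. The following is a minimal sketch, not the paper's experimental setup: it assumes a linear student $\hat{y} = Wx$ trained on a linear teacher with squared loss and L2 weight decay $\gamma$, takes the expected Hebbian update to be $\mathbb{E}[h x^\top] = W\Sigma$ with $h = Wx$ and $\Sigma = \mathbb{E}[xx^\top]$, and uses cosine similarity as a stand-in for the alignment measure. The function name, dimensions, and teacher matrix are all hypothetical.

```python
import numpy as np

def hebbian_alignment_at_stationarity(gamma, d_in=8, d_out=4, seed=0):
    """Cosine similarity between the learning signal and the expected
    Hebbian update at the stationary point of a toy regularized problem.

    Assumed toy model (not taken from the paper):
      loss      l(W) = 0.5 * E||W x - W_teacher x||^2
      objective L(W) = l(W) + 0.5 * gamma * ||W||_F^2
    """
    rng = np.random.default_rng(seed)

    # Positive-definite input covariance Sigma = E[x x^T].
    a = rng.standard_normal((d_in, d_in))
    sigma = a @ a.T / d_in + 0.1 * np.eye(d_in)

    w_teacher = rng.standard_normal((d_out, d_in))

    # Stationarity: (W - W_teacher) Sigma + gamma W = 0, i.e.
    # W (Sigma + gamma I) = W_teacher Sigma. The matrix is symmetric,
    # so we can solve the transposed system directly.
    lhs = sigma + gamma * np.eye(d_in)
    w_star = np.linalg.solve(lhs, (w_teacher @ sigma).T).T

    # Learning signal: negative gradient of the unregularized loss.
    learning_signal = -(w_star - w_teacher) @ sigma

    # Expected Hebbian update: E[h x^T] with h = W x.
    hebbian = w_star @ sigma

    cosine = np.sum(learning_signal * hebbian) / (
        np.linalg.norm(learning_signal) * np.linalg.norm(hebbian)
    )
    return cosine, learning_signal, w_star
```

In this toy model the learning signal at stationarity equals $\gamma W^\star$, so its inner product with the Hebbian update is $\gamma\,\mathrm{tr}(W^\star \Sigma W^{\star\top})$, which is positive whenever $\Sigma$ is positive definite and $W^\star \ne 0$. With $\gamma = 0$ the learning signal vanishes at stationarity and the cosine is undefined, so the sketch should only be called with $\gamma > 0$.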

Paper Structure

This paper contains 40 sections, 5 theorems, 76 equations, 20 figures, 1 table.

Key Result

Lemma 1

Let $W^\star$ be a stationary point of $\mathcal{L}$ with $W^\star \ne 0$. Under Assumptions ass:chain and ass:norm,

Figures (20)

  • Figure 1: Balance of contractive and expansive forces. For deep learning, the noise and weight decay are, respectively, expansion and contraction forces. When they do not balance, the gradient must fill in the gap -- if noise outweighs weight decay, the gradient must appear contractive; otherwise, it appears expansive. Similarly, for biology, the Hebbian dynamics is always expansive, and the anti-Hebbian dynamics is always contractive. Thus, to reach a balance, a learning signal will look like, and become aligned with, the Hebbian or anti-Hebbian rule depending on whether it is expansive or contractive.
  • Figure 2: The left panel shows an example weight update with high alignment between the learning signal ($-\nabla_{W}\ell$) and the Hebbian update at the end of training with high weight decay, while the right panel shows an example update at the end of training with no weight decay, which has very low alignment. For clarity, only a 20x20 subset of the direction of the Hebbian and learning signal updates is visualized, for the second layer of an SCE after training with $\eta=0.1$ and either $\gamma=0.05$ (left) or $\gamma=0.0$ (right). Dimension 1 can be viewed as the output (post-synaptic) neuron (in the case of SGD, the neuron whose incoming weights we are differentiating), and Dimension 2 indexes the input (pre-synaptic) features/neurons that the weights project from. Examples of low cosine similarity of the learning signal for $\gamma=0.05$ at the start and end of training can be seen in Figure \ref{fig:low-sim-update-end-of-training}. In general, we find that stronger weight decay, larger learning rate, and larger batch size lead to better alignment (Figures \ref{appendix:batch_size_alignment} and \ref{appendix:lr_alignment}).
  • Figure 3: The diagram on the left shows that the trend of weight decay increasing Hebbian alignment of the learning signal is robust across different activation functions. The diagram on the right shows that the trend can generalize to deeper networks. The SCE MLPs were modified by varying the activation functions across Linear, ReLU, Sigmoid, and Tanh (left) and increasing the depth to 6 and layer width to dimension 512 (shown by the 6x512 tanh MLP plot on the right). Layer 1, Layer 2, ... Layer 6 in this diagram indicate the Hebbian alignment with the learning signal for the corresponding layer. For a small (or zero) weight decay, the learning process sometimes exhibits a weak anti-Hebbian alignment, indicated by a negative alignment with Hebbian learning. All markers represent the average alignment over the final 100 steps, averaged across 10 runs with different seeds. The error bars represent the std of the final average alignment across the seeds. The trend is very robust, and so many of the bars are obscured by the markers, particularly in the left diagram, for which the largest std was 0.012.
  • Figure 4: As the noise increases, the Hebbian alignment decreases, while stronger weight decay leads to higher Hebbian alignment (right). The left panel displays a heatmap of the Hebbian alignment of the learning signal at convergence for a range of additive noise levels and weight decays; the zero-alignment contour follows a clear quadratic curve, as predicted by the theory. The SRE was augmented by adding zero-mean noise, with the standard deviation indicated on the diagram, to each parameter at the start of each iteration. The trend is even clearer when varying the noise at a fixed weight decay (Varying Noise) or the weight decay at a fixed noise standard deviation (Varying Weight Decay). Each cell on the left and each marker on the right represents a single run.
  • Figure 5: The best performance of the model is achieved when it is neither Hebbian nor anti-Hebbian on average. The left panel displays the student validation loss for the experiment in Figure \ref{fig:wd-noise-heatmap}, while the right panel shows a scatter plot of the validation loss vs. the Hebbian alignment of the gradient. There appears to be a weak saddle-like feature in the loss at the phase transition boundary of Hebbian alignment with respect to noise and scale. The validation loss decreases as both weight decay and noise get smaller. Each cell on the left, and each circle on the right, represents a single seed.
  • ...and 15 more figures
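
The balance of forces described in Figure 1 and the zero-alignment boundary in Figure 4 can be illustrated with a scalar toy model that admits a closed form. This is a sketch under assumptions of our own, not the paper's SRE experiment: a single weight $w$ with $\mathbb{E}[x^2]=1$, a teacher weight `w_teacher`, Gaussian noise of standard deviation `sigma` added to the weight at the start of each step, followed by one gradient step on the expected squared loss with weight decay `gamma` and learning rate `eta`. The quantity reported is the stationary expected inner product between the learning signal and the Hebbian update, whose sign distinguishes the Hebbian from the anti-Hebbian regime; both function names are hypothetical.

```python
import math

def expected_hebbian_overlap(gamma, sigma, eta=0.1, w_teacher=1.0):
    """Stationary value of E[(-dl/dw) * w] in the assumed scalar toy model.

    Per step: w' = w + noise, then w <- w' - eta * ((w' - w_teacher) + gamma * w').
    The Hebbian update for a scalar linear unit with E[x^2] = 1 is w' itself.
    """
    a = 1.0 - eta * (1.0 + gamma)  # contraction factor of the linear recursion
    if not abs(a) < 1.0:
        raise ValueError("recursion is not stable for these eta and gamma")

    mean = w_teacher / (1.0 + gamma)             # stationary mean of the weight
    noisy_variance = sigma**2 / (1.0 - a**2)     # stationary variance of w'

    # E[(w_teacher - w') * w'] = gamma * mean^2 - Var(w')
    return gamma * mean**2 - noisy_variance

def critical_noise(gamma, eta=0.1, w_teacher=1.0):
    """Noise level at which the overlap changes sign in the toy model."""
    a = 1.0 - eta * (1.0 + gamma)
    mean = w_teacher / (1.0 + gamma)
    return math.sqrt(gamma * mean**2 * (1.0 - a**2))
```

In this toy model, weight decay contributes the positive term $\gamma m^2$ and noise contributes a negative term proportional to $\sigma^2$, so the overlap is positive (Hebbian-like) below `critical_noise(gamma)` and negative (anti-Hebbian-like) above it. For small `eta` and `gamma` the boundary satisfies $\sigma^2 \approx 2\eta\gamma\, w_{\text{teacher}}^2$, so the critical weight decay grows quadratically with the noise standard deviation, which is qualitatively the shape of the zero-alignment curve described for Figure 4.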

Theorems & Definitions (10)

  • Lemma 1: Value of $C(W)$ at stationarity
  • proof
  • Lemma 2: Lipschitz continuity of $C(W)$ near $W^\star$
  • proof
  • Theorem 1: Hebbian alignment in a neighborhood of stationarity
  • proof
  • Lemma 3: Control of distance by vector field
  • proof
  • Theorem 2: Hebbian bound in terms of stationarity gap
  • proof