Two Tales of Single-Phase Contrastive Hebbian Learning

Rasmus Kjær Høier, Christopher Zach

TL;DR

The paper tackles the challenge of biologically plausible, fully local gradient learning without phase-based updates, addressing the limitations of infinitesimal nudging and strict symmetry. It introduces Dual Propagation (DP) as a single-inference, local-learning framework that infers two oppositely nudged states per neuron, and strengthens its practicality by deriving a robust DP$^{\top}$ variant that handles asymmetric nudging. Two complementary theoretical perspectives are developed: a Relaxation Perspective that derives DP from repeated optimal-value reformulations (ROVR, SPROVR, AROVR) and a Lagrangian Perspective that yields DP$^{\top}$ and connects to LeCun-like formulations. Numerical experiments demonstrate the impact of the nudging parameter $\alpha$ on stability and Lipschitz properties, and show scalable performance on MNIST, CIFAR, and VGG16-scale models, highlighting potential for on-device neuromorphic training. The work thus advances both the theoretical understanding and practical viability of fully local, robust gradient learning for energy-efficient hardware.

Abstract

The search for "biologically plausible" learning algorithms has converged on the idea of representing gradients as activity differences. However, most approaches require a high degree of synchronization (distinct phases during learning) and introduce substantial computational overhead, which raises doubts regarding their biological plausibility as well as their potential utility for neuromorphic computing. Furthermore, they commonly rely on applying infinitesimal perturbations (nudges) to output units, which is impractical in noisy environments. Recently it has been shown that by modelling artificial neurons as dyads with two oppositely nudged compartments, it is possible for a fully local learning algorithm named "dual propagation" to bridge the performance gap to backpropagation, without requiring separate learning phases or infinitesimal nudging. However, the algorithm has the drawback that its numerical stability relies on symmetric nudging, which may be restrictive in biological and analog implementations. In this work we first provide a solid foundation for the objective underlying the dual propagation method, which also reveals a surprising connection with adversarial robustness. Second, we demonstrate how dual propagation is related to a particular adjoint state method, which is stable regardless of asymmetric nudging.

Paper Structure

This paper contains 35 sections, 7 theorems, 77 equations, 3 figures, 4 tables.

Key Result

Proposition 4.1

Let $0 < \beta \le \beta'$ and $\alpha, \alpha'\in[0,1]$ with $\alpha\le\alpha'$. Then the following holds: (a) $J_{\alpha',\beta}^{\text{AROVR}}(\theta) \le J_{\alpha,\beta}^{\text{AROVR}}(\theta)$ and (b) $\beta J_{\alpha,\beta}^{\text{AROVR}}(\theta) \le \beta' J_{\alpha,\beta'}^{\text{AROVR}}(\t

Figures (3)

  • Figure 1: (a) Illustration of a dyadic neuron (note that all quantities are scalar). The two internal states, $s^+$ and $s^-$, receive the same bottom-up input $a$, but the top-down input $\Delta$ nudges them in opposite directions. The difference and weighted mean of these internal states are then propagated downstream and upstream respectively. (b) In a pyramidal neuron, bottom-up signals arrive at the basal dendrites and top-down signals arrive at the apical dendrites. Concerns regarding DP and biological plausibility are discussed in the paper's section on biological plausibility.
  • Figure 2: CIFAR100: Angle of gradient estimates relative to BP gradients, for DP$^\top$ with $\alpha=0$ and $\beta\in\{0.01, 1.0\}$. Angles are plotted across layers and epochs (left column). The right column zooms in on the first 200 minibatches (i.e. a fifth of an epoch).
  • Figure 3: MNIST experiments employing asymmetric $\alpha$. Results are averaged over five random seeds. Top: Alignment between the parameter updates obtained with back-propagation and with the improved DP variant (using 30 inference iterations and asymmetric nudging with $\alpha\in\{0,1\}$). Middle: L2 norm of the difference between BP and DP gradients. Bottom: L2 norms of BP and DP gradients when using 30 inference iterations and asymmetric nudging ($\alpha=0.0$ and $\alpha=1.0$).
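
The quantities reported in Figures 2 and 3 (angle between a gradient estimate and the BP gradient, and the L2 norm of their difference) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `gradient_angle_degrees` and `l2_difference` are hypothetical, gradients are assumed to be flattened into plain Python lists, and the example values are made up.

```python
import math


def gradient_angle_degrees(g_est, g_ref):
    """Angle in degrees between two flattened gradient vectors."""
    if len(g_est) != len(g_ref):
        raise ValueError("gradients must have the same number of entries")
    dot = sum(a * b for a, b in zip(g_est, g_ref))
    sq_norms = sum(a * a for a in g_est) * sum(b * b for b in g_ref)
    if sq_norms == 0.0:
        raise ValueError("angle is undefined for a zero gradient")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / math.sqrt(sq_norms)))
    return math.degrees(math.acos(cosine))


def l2_difference(g_est, g_ref):
    """L2 norm of the elementwise difference between two gradient vectors."""
    if len(g_est) != len(g_ref):
        raise ValueError("gradients must have the same number of entries")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g_est, g_ref)))


# Made-up illustrative values, not results from the paper.
g_bp = [1.0, 0.0]  # stands in for a back-propagation gradient
g_dp = [1.0, 1.0]  # stands in for a DP gradient estimate
angle = gradient_angle_degrees(g_dp, g_bp)  # close to 45 degrees
gap = l2_difference(g_dp, g_bp)             # close to 1.0
```

A small angle means the estimate points in nearly the same direction as the BP gradient, while the L2 difference is also sensitive to a mismatch in magnitude, which is presumably why the figures report both.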

Theorems & Definitions (11)

  • Proposition 4.1
  • Proposition 4.2
  • Corollary 4.3
  • Proposition 5.1
  • proof
  • Proposition D.1
  • Lemma D.2
  • proof
  • proof : Proof of the proposition
  • Proposition E.1
  • ...and 1 more