Two Tales of Single-Phase Contrastive Hebbian Learning
Rasmus Kjær Høier, Christopher Zach
TL;DR
The paper tackles biologically plausible, fully local gradient-based learning without separate learning phases, addressing the reliance of earlier methods on infinitesimal nudging and on strictly symmetric nudging. It builds on Dual Propagation (DP), a single-phase local-learning framework that infers two oppositely nudged states per neuron, and strengthens its practicality by deriving a robust DP$^{\top}$ variant that remains stable under asymmetric nudging. Two complementary theoretical perspectives are developed: a Relaxation Perspective that derives DP from repeated optimal-value reformulations (ROVR, SPROVR, AROVR) and a Lagrangian Perspective that yields DP$^{\top}$ and connects to LeCun-like formulations. Numerical experiments demonstrate the impact of the nudging parameter $\alpha$ on stability and Lipschitz properties, and show that the approach scales across MNIST and CIFAR experiments and VGG16-scale models, highlighting potential for on-device neuromorphic training. The work thus advances both the theoretical understanding and practical viability of fully local, robust gradient learning for energy-efficient hardware.
Abstract
The search for ``biologically plausible'' learning algorithms has converged on the idea of representing gradients as activity differences. However, most approaches require a high degree of synchronization (distinct phases during learning) and introduce substantial computational overhead, which raises doubts regarding their biological plausibility as well as their potential utility for neuromorphic computing. Furthermore, they commonly rely on applying infinitesimal perturbations (nudges) to output units, which is impractical in noisy environments. Recently it has been shown that by modelling artificial neurons as dyads with two oppositely nudged compartments, it is possible for a fully local learning algorithm named ``dual propagation'' to bridge the performance gap to backpropagation, without requiring separate learning phases or infinitesimal nudging. However, the algorithm has the drawback that its numerical stability relies on symmetric nudging, which may be restrictive in biological and analog implementations. In this work we first provide a solid foundation for the objective underlying the dual propagation method, which also reveals a surprising connection with adversarial robustness. Second, we demonstrate how dual propagation is related to a particular adjoint state method, which is stable regardless of asymmetric nudging.
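The idea of representing gradients as activity differences can be illustrated with a toy scalar example. The sketch below is not the dual propagation algorithm. It is a minimal contrastive-Hebbian-style calculation under assumed definitions: a single unit with energy $E(s) = \tfrac{1}{2}(s - wx)^2$, a loss $\ell(s) = \tfrac{1}{2}(s - y)^2$, and a finite nudging strength `beta`. The function names and the labels "plus" and "minus" are hypothetical. The sketch compares a symmetric (two-sided) nudge with a one-sided nudge. It only shows the bias of the gradient estimate at finite nudging and says nothing about the stability issue the abstract refers to.

```python
def nudged_state(w, x, y, beta):
    """Minimizer over s of E(s) + beta * loss(s) for the assumed toy model.

    E(s) = 0.5 * (s - w*x)**2 and loss(s) = 0.5 * (s - y)**2, so the
    minimizer has the closed form below. beta = 0 gives the free state w*x.
    """
    return (w * x + beta * y) / (1.0 + beta)


def true_gradient(w, x, y):
    """Exact derivative of 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x


def symmetric_estimate(w, x, y, beta):
    """Gradient estimate from two oppositely nudged states.

    The estimate is a difference of activities times the input x.
    """
    s_plus = nudged_state(w, x, y, beta)    # nudged toward the target
    s_minus = nudged_state(w, x, y, -beta)  # nudged away from the target
    return (s_minus - s_plus) * x / (2.0 * beta)


def one_sided_estimate(w, x, y, beta):
    """Gradient estimate from the free state and a single nudged state."""
    s_free = nudged_state(w, x, y, 0.0)
    s_plus = nudged_state(w, x, y, beta)
    return (s_free - s_plus) * x / beta


if __name__ == "__main__":
    w, x, y, beta = 2.0, 1.5, 1.0, 0.1
    g = true_gradient(w, x, y)
    print("true gradient      :", g)
    print("symmetric estimate :", symmetric_estimate(w, x, y, beta))
    print("one-sided estimate :", one_sided_estimate(w, x, y, beta))
```

In this toy model the symmetric estimate equals the true gradient divided by $1 - \beta^2$, while the one-sided estimate equals it divided by $1 + \beta$, so the symmetric version has a smaller bias for the same finite nudge.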
