Curl Descent: Non-Gradient Learning Dynamics with Sign-Diverse Plasticity
Hugo Ninou, Jonathan Kadmon, N. Alex Cayco-Gajic
TL;DR
The paper investigates learning dynamics in neural networks whose updates contain non-gradient "curl" terms, which arise from sign-diverse, biologically plausible plasticity rules. Using a two-layer student-teacher framework, it introduces curl descent updates and analyzes their fixed points, showing that the solution manifold can remain stable under moderate curl, while the origin often becomes a center, depending on the architecture. In large networks, random-matrix theory reveals a phase transition: the solution manifold loses stability as the fraction of flipped synapses and the compression ratio vary, with distinct thresholds for hidden-layer and readout-layer flips. Simulations in linear and nonlinear networks show chaotic dynamics when curl in the hidden layer destabilizes the manifold; in other cases, curl terms still yield low error or even faster convergence, highlighting architectures that can robustly exploit non-gradient learning rules. Overall, the work broadens the view of optimization in neural systems and suggests that sign-diverse plasticity can support effective learning beyond traditional gradient descent, with potential implications for both neuroscience and machine learning.
Abstract
Gradient-based algorithms are a cornerstone of artificial neural network training, yet it remains unclear whether biological neural networks use similar gradient-based strategies during learning. Experiments often discover a diversity of synaptic plasticity rules, but whether these amount to an approximation of gradient descent is unclear. Here we investigate a previously overlooked possibility: that learning dynamics may include fundamentally non-gradient "curl"-like components while still being able to effectively optimize a loss function. Curl terms naturally emerge in networks with inhibitory-excitatory connectivity or Hebbian/anti-Hebbian plasticity, resulting in learning dynamics that cannot be framed as gradient descent on any objective. To investigate the impact of these curl terms, we analyze feedforward networks within an analytically tractable student-teacher framework, systematically introducing non-gradient dynamics through neurons exhibiting rule-flipped plasticity. Small curl terms preserve the stability of the original solution manifold, resulting in learning dynamics similar to gradient descent. Beyond a critical value, strong curl terms destabilize the solution manifold. Depending on the network architecture, this loss of stability can lead to chaotic learning dynamics that destroy performance. In other cases, the curl terms can counterintuitively speed up learning compared to gradient descent, because the weight dynamics can escape saddles by temporarily ascending the loss. Our results identify specific architectures capable of supporting robust learning via diverse learning rules, providing an important counterpoint to normative theories of gradient-based learning in neural networks.
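The rule-flipped setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the text above does not state: a two-layer linear student with a scalar output, a linear teacher vector `t`, whitened inputs (so the expected loss reduces to a squared distance between effective weights), and a curl descent update formed by multiplying each neuron's gradient by +1 or -1. The function name `curl_descent` and all of its parameters are hypothetical.

```python
import numpy as np

def curl_descent(n_in=5, n_hidden=3, flipped_hidden=(), flipped_readout=(),
                 lr=0.01, steps=2000, seed=0):
    """Toy two-layer linear student trained with sign-flipped gradient updates.

    Assumptions (not taken from the paper): scalar output, whitened inputs,
    and a linear teacher, so the loss is 0.5 * ||W^T a - t||^2.
    """
    rng = np.random.default_rng(seed)
    t = rng.standard_normal(n_in) / np.sqrt(n_in)    # teacher weights
    W = 0.1 * rng.standard_normal((n_hidden, n_in))  # student hidden weights
    a = 0.1 * rng.standard_normal(n_hidden)          # student readout weights

    # Sign vectors: +1 keeps the gradient rule, -1 flips it for that neuron.
    d_W = np.ones(n_hidden)
    d_W[np.asarray(flipped_hidden, dtype=int)] = -1.0
    d_a = np.ones(n_hidden)
    d_a[np.asarray(flipped_readout, dtype=int)] = -1.0

    losses = np.empty(steps)
    for k in range(steps):
        err = W.T @ a - t              # mismatch between student and teacher
        losses[k] = 0.5 * err @ err
        grad_W = np.outer(a, err)      # dL/dW
        grad_a = W @ err               # dL/da
        # With all signs +1 this is plain gradient descent; any -1 entry
        # makes the update field non-gradient (it acquires a curl component).
        W -= lr * d_W[:, None] * grad_W
        a -= lr * d_a * grad_a
    return losses

gd_losses = curl_descent()                        # pure gradient descent
cd_losses = curl_descent(flipped_hidden=(0,))     # one rule-flipped hidden neuron
```

Comparing `gd_losses` with `cd_losses` for different choices of `flipped_hidden` and `flipped_readout` gives a rough feel for how flipping rules in each layer changes the loss trajectory; the thresholds and regimes reported in the paper depend on its exact model and are not reproduced by this toy.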
