Dynamical Learning in Deep Asymmetric Recurrent Neural Networks

Davide Badalotti, Carlo Baldassi, Marc Mézard, Mattia Scardecchia, Riccardo Zecchina

TL;DR

The paper addresses learning in recurrent networks with asymmetric couplings by introducing a gradient-free scheme that stabilizes a densely connected Representation Manifold (RM) of fixed points. Learning proceeds via transient supervision that biases dynamics into the RM, followed by local Hebbian-like plasticity that locks in the final configuration, enabling input–output mappings without backpropagation. Analyses based on local entropy and replica methods reveal conditions under which the RM emerges and remains robust; experiments on MNIST-family benchmarks show performance comparable to backpropagation with the same parameter count. The work links dynamical accessibility of fixed points to learning, offering a biologically plausible mechanism with potential implications for neuromorphic computation and understanding brain-like information processing.

Abstract

We investigate recurrent neural networks with asymmetric interactions and demonstrate that the inclusion of self-couplings or sparse excitatory inter-module connections leads to the emergence of a densely connected manifold of dynamically accessible stable configurations. This representation manifold is exponentially large in system size and is reachable through simple local dynamics, despite constituting a subdominant subset of the global configuration space. We further show that learning can be implemented directly on this structure via a fully local, gradient-free mechanism that selectively stabilizes a single task-relevant network configuration. Unlike error-driven or contrastive learning schemes, this approach does not require explicit comparisons between network states obtained with and without output supervision. Instead, transient supervisory signals bias the dynamics toward the representation manifold, after which local plasticity consolidates the attained configuration, effectively shaping the latent representation space. Numerical evaluations on standard image classification benchmarks indicate performance comparable to that of multilayer perceptrons trained using backpropagation. More generally, these results suggest that the dynamical accessibility of fixed points and the stabilization of internal network dynamics constitute viable alternative principles for learning in recurrent systems, with conceptual links to statistical physics and potential implications for biologically motivated and neuromorphic computing architectures.
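The mechanism described in the abstract (relax to a stable configuration, then consolidate it with a local rule) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: binary ±1 neurons, Gaussian asymmetric couplings, synchronous sign updates, and a perceptron-style margin rule standing in for the "local plasticity". The names `make_couplings`, `relax`, `stabilize`, `eta`, and `kappa` are hypothetical; `j_d` plays the role of the self-coupling strength $J_D$.

```python
import numpy as np

def make_couplings(n, j_d, rng):
    """Asymmetric Gaussian couplings with self-coupling j_d on the diagonal.
    Assumption: the 1/sqrt(n) scaling and Gaussian statistics are illustrative."""
    J = rng.standard_normal((n, n)) / np.sqrt(n)  # J[i, j] != J[j, i] in general
    np.fill_diagonal(J, j_d)
    return J

def is_fixed_point(J, s):
    """True if s is unchanged by one synchronous sign update."""
    return np.array_equal(np.where(J @ s >= 0, 1, -1), s)

def relax(J, s, max_sweeps=100):
    """Synchronous sign dynamics; returns (state, converged)."""
    s = s.copy()
    for _ in range(max_sweeps):
        new_s = np.where(J @ s >= 0, 1, -1)
        if np.array_equal(new_s, s):
            return s, True
        s = new_s
    return s, False

def stabilize(J, s, eta=0.01, kappa=0.5):
    """Hypothetical local rule: for each neuron whose margin s_i * h_i is
    below kappa, add eta * s_i * s_j to its incoming off-diagonal couplings.
    Each update uses only pre- and post-synaptic states (no gradients)."""
    margin = s * (J @ s)
    unstable = margin < kappa
    J_new = J.copy()
    J_new[unstable, :] += eta * np.outer(s[unstable], s)
    np.fill_diagonal(J_new, np.diag(J))  # keep the self-couplings fixed
    return J_new

# Small demo: relax from a random state, then consolidate the reached state.
rng = np.random.default_rng(0)
J = make_couplings(n=40, j_d=0.5, rng=rng)
s0 = rng.choice([-1, 1], size=40)
s_final, converged = relax(J, s0)
J = stabilize(J, s_final)
print("converged:", converged, "| fixed point:", is_fixed_point(J, s_final))
```

In this toy setting, a self-coupling larger than the sum of the absolute off-diagonal couplings in every row makes every configuration a fixed point, which gives a crude intuition for why increasing $J_D$ makes stable states easier to reach. The paper's actual analysis (local entropy, replica methods) is far more refined than this sketch.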

Paper Structure

This paper contains 24 sections, 102 equations, 15 figures, 7 tables, 1 algorithm.

Figures (15)

  • Figure 1: A multilayer chain model with sparse inter-module excitatory couplings, and input/output layers.
  • Figure 2: Test accuracy on benchmark datasets as a function of the number of trainable parameters, controlled by the layer width. Our model (red) is compared to a binarized 3-layer perceptron trained with backpropagation (blue). For each number of trainable parameters, the average test accuracy over 5 distinct runs is plotted, together with its standard deviation.
  • Figure 3: Test accuracy of a 2-layer chain model on Entangled MNIST as a function of $\lambda$ and $J_D$. For each curve, one parameter is varied while the other is kept equal to 0.
  • Figure 4: Internal entropy (divided by $L$) for the Core Model (solid green) and two-layer model (dashed/dot-dashed blue/orange). Core Model: $J_D=0.05,0.07,0.1$ (bottom to top). Two-layer: blue ($J_D=0.05$) with $\lambda=0.2,0.3,0.4$ (bottom to top); orange ($J_D=0$) with $\lambda=0.4,0.45,0.5$ (bottom to top). All curves vanish at distance $0$ (not shown due to numerical limits). In all cases, increasing $J_D$ or $\lambda$ drives the OGP–RM transition.
  • Figure 5: Annealed entropy $S_a$ of the number of fixed points, in the case of two core modules, $L=2$, and $\lambda_L=\lambda_R=\lambda$. Left: $S_a$ is plotted versus $J_D$, for $\lambda=0,0.5,1,1.5$. Right: $S_a$ is plotted versus $\lambda$, for $J_D=0,0.125,0.25$.
  • ...and 10 more figures