Improving equilibrium propagation without weight symmetry through Jacobian homeostasis

Axel Laborieux, Friedemann Zenke

Abstract

Equilibrium propagation (EP) is a compelling alternative to the backpropagation of error algorithm (BP) for computing gradients of neural networks on biological or analog neuromorphic substrates. Still, the algorithm requires weight symmetry and infinitesimal equilibrium perturbations, i.e., nudges, to estimate unbiased gradients efficiently. Both requirements are challenging to implement in physical systems. Yet, whether and how weight asymmetry affects its applicability is unknown because, in practice, it may be masked by biases introduced through the finite nudge. To address this question, we study generalized EP, which can be formulated without weight symmetry, and analytically isolate the two sources of bias. For complex-differentiable non-symmetric networks, we show that the finite nudge does not pose a problem, as exact derivatives can still be estimated via a Cauchy integral. In contrast, weight asymmetry introduces bias resulting in low task performance due to poor alignment of EP's neuronal error vectors compared to BP. To mitigate this issue, we present a new homeostatic objective that directly penalizes functional asymmetries of the Jacobian at the network's fixed point. This homeostatic objective dramatically improves the network's ability to solve complex tasks such as ImageNet 32x32. Our results lay the theoretical groundwork for studying and mitigating the adverse effects of imperfections of physical networks on learning algorithms that rely on the substrate's relaxation dynamics.
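The abstract's claim that a finite nudge need not bias the estimate rests on a standard fact: the derivative of a holomorphic function at a point can be recovered from its values on a circle of finite radius around that point, via Cauchy's integral formula. The sketch below illustrates only this numerical idea on a toy scalar function. The names `cauchy_derivative`, `finite_difference`, and `toy_response` are hypothetical, and `toy_response` merely stands in for a network's equilibrium response as a function of the complex nudge $\beta$; it is not the paper's model.

```python
import cmath
import math


def cauchy_derivative(f, radius, num_points):
    """Estimate f'(0) for holomorphic f from samples on the circle |beta| = radius.

    Discretizes f'(0) = (1 / (2*pi)) * integral over theta of f(beta) / beta,
    with beta = radius * exp(i * theta), using equally spaced angles.
    """
    total = 0j
    for k in range(num_points):
        theta = 2.0 * math.pi * k / num_points
        beta = radius * cmath.exp(1j * theta)
        total += f(beta) / beta
    return total / num_points


def finite_difference(f, beta):
    """One-sided finite-difference estimate of f'(0) with a real nudge beta."""
    return (f(beta) - f(0.0)) / beta


def toy_response(beta):
    # Assumption: an arbitrary holomorphic toy function, not the paper's network.
    return cmath.tanh(0.3 + 0.8 * beta)


# Exact derivative of the toy function at beta = 0.
exact = 0.8 * (1.0 - math.tanh(0.3) ** 2)

radius = 0.5  # deliberately large nudge amplitude
fd_error = abs(finite_difference(toy_response, radius) - exact)
cauchy_error = abs(cauchy_derivative(toy_response, radius, 16) - exact)
print(f"finite-difference error: {fd_error:.2e}")
print(f"Cauchy estimate error:   {cauchy_error:.2e}")
```

In this toy setting the one-sided estimate carries an error that grows with the nudge amplitude, whereas the circle average stays accurate at the same amplitude as long as the function is holomorphic inside the circle.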

Paper Structure (30 sections, 37 equations, 5 figures, 4 tables)

This paper contains 30 sections, 37 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Different neural network architectures and their Jacobian after the free phase. a) A continuous layered Hopfield network. The existence of an energy function enforces Jacobian symmetry with reciprocal and tied connectivity $w_{ij}=w_{ji}$. b) A layered network with reciprocal connectivity but independent forward and backward connections, which make the Jacobian non-symmetric. c) A discrete feedforward network without feedback connections. The Jacobian is lower triangular, allowing the explicit backpropagation of error layer by layer.
  • Figure 2: Separating the sources of bias due to finite nudge and Jacobian asymmetry. a,b) Cosine similarity between the neuronal error vector $\boldsymbol{\delta}$ computed by RBP and by classic EP (grey) \cite{scellier2017equilibrium,scellier2018generalization} or holomorphic EP (purple and pink) for different numbers of points used to estimate Eq. \ref{eq:dudbeta_cauchy}, as a function of the teaching amplitude $|\beta|$, for a symmetric (a) and a non-symmetric (b) equilibrium Jacobian. c,d) Kernel density estimate of the cosine distribution between both neuronal error vectors in a two-hidden-layer MLP. Alignment worsens with increasing asymmetry and depth. e) Continuous-time estimate of the free fixed point through oscillations induced by the teaching signal $\beta(t)$. f) Continuous-time estimate of the neuronal error vector of generalized hEP.
  • Figure 3: Comparison of training dynamics of a two-hidden-layer MLP on Fashion MNIST using RBP (top row) and generalized hEP (bottom row). a,b) Evolution of the angle between forward and backward connections during training for varying initial angles. Training tends to reduce weight symmetry. c,d) Learning curves. The training loss depends on the initial angle only for generalized hEP. e,f) Evolution of the cosine similarity between the neuronal error vectors of RBP and hEP over training. Curves are averaged over three seeds and shaded areas denote $\pm$ 1 standard deviation.
  • Figure 4: The homeostatic loss improves hEP training of arbitrary dynamical systems by acting directly on the Jacobian. Each row corresponds to a specific architecture-dataset pair. a) Multi-layer architecture where layers are reciprocally connected. Evolution of the validation error b), Jacobian symmetry measure c), and layer-wise cosine between neuronal error vectors of hEP and RBP d) during training on Fashion MNIST. e-h) Same plots as a-d for an architecture where the output layer directly feeds back to the first layer. i) Recurrent convolutional architecture used on CIFAR-10. Evolution of validation error j), angle between weights k), and layer-wise cosine between the neuronal error vectors of hEP and RBP l). Curves are averaged over five random seeds for Fashion MNIST and three for CIFAR-10, and shaded areas denote $\pm$ 1 standard error.
  • Figure 5: The same experiment as Fig. \ref{fig:homeo}, but run on a Predictive Coding Network \cite{whittington2019theories}. a) Architecture where each layer consists of value neurons (circles) and error neurons (squares). Evolution during training of the validation error b), Jacobian symmetry measure c), and homeostatic loss d).
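
The captions above refer to two diagnostics: the cosine similarity between neuronal error vectors and a measure of Jacobian symmetry. The listing does not state how the paper defines the latter, so the sketch below uses an assumed, illustrative choice: the fraction of a matrix's squared Frobenius norm carried by its antisymmetric part, which is 0 for a symmetric Jacobian and 1 for a purely antisymmetric one. The function names `cosine_similarity` and `asymmetry_ratio` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def asymmetry_ratio(jac):
    """Assumed illustrative measure: ||(J - J^T) / 2||_F^2 / ||J||_F^2.

    Because ||J||_F^2 splits into the squared norms of the symmetric and
    antisymmetric parts, the ratio always lies in [0, 1].
    """
    n = len(jac)
    antisym_sq = 0.0
    total_sq = 0.0
    for i in range(n):
        for j in range(n):
            antisym_sq += (0.5 * (jac[i][j] - jac[j][i])) ** 2
            total_sq += jac[i][j] ** 2
    return antisym_sq / total_sq


# Toy matrices standing in for a network Jacobian at its fixed point.
symmetric_jac = [[1.0, 2.0], [2.0, 3.0]]
antisymmetric_jac = [[0.0, 1.0], [-1.0, 0.0]]
mixed_jac = [[1.0, 2.0], [0.0, 3.0]]

print(asymmetry_ratio(symmetric_jac))
print(asymmetry_ratio(antisymmetric_jac))
print(asymmetry_ratio(mixed_jac))
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

A penalty of this kind acts on the Jacobian as a whole rather than on individual weight pairs, which is the sense in which the abstract describes the homeostatic objective as targeting functional asymmetries.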

Theorems & Definitions (1)

  • Definition 1: Nudge parameter $\beta$