Learning efficient backprojections across cortical hierarchies in real time

Kevin Max; Laura Kriener; Garibaldi Pineda García; Thomas Nowotny; Ismael Jaras; Walter Senn; Mihai A. Petrovici

TL;DR

Phaseless Alignment Learning (PAL) addresses the cortical credit assignment problem by introducing a phaseless, online learning rule that uses intrinsic noise and prospective coding to learn feedback weights in layered hierarchies. The backward weights $\bm{B}_{\ell,\ell+1}$ converge toward alignment with the transpose of forward weights $[\bm{W}_{\ell+1,\ell}]^T$, enabling BP-like error propagation without weight transport or wake-sleep phases. Across cortical microcircuits and deep-net benchmarks, PAL outperforms fixed random feedback and DFA on several tasks, and it supports scalable credit assignment in deep networks with competitive latent representations. The approach highlights a general principle of leveraging noise as a learning resource in physical substrates, with implications for biologically inspired neuroengineering and neuromorphic hardware.
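To make "alignment with the transpose" concrete, the following minimal sketch computes the angle between a backward matrix and the transpose of a forward matrix, treating both as flattened vectors. This is one common way to quantify such alignment and is not necessarily the exact metric used in the paper; the function name `alignment_angle_deg`, the layer sizes, and the example matrices are all assumptions chosen for illustration.

```python
import math
import random


def alignment_angle_deg(B, W):
    """Angle in degrees between B and W^T, both viewed as flattened vectors.

    W has shape (m, n) and maps layer l (size n) to layer l+1 (size m).
    B has shape (n, m) and maps layer l+1 back to layer l.
    0 degrees means B is a positive multiple of W^T.
    """
    m, n = len(W), len(W[0])
    # Frobenius inner product <B, W^T> = sum_ij B[j][i] * W[i][j]
    dot = sum(B[j][i] * W[i][j] for i in range(m) for j in range(n))
    norm_b = math.sqrt(sum(x * x for row in B for x in row))
    norm_w = math.sqrt(sum(x * x for row in W for x in row))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos = max(-1.0, min(1.0, dot / (norm_b * norm_w)))
    return math.degrees(math.acos(cos))


rng = random.Random(0)  # fixed seed so the example is repeatable
m, n = 3, 5  # hypothetical layer sizes
W = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]

# A perfectly aligned backward matrix: a scaled copy of W^T.
B_aligned = [[2.0 * W[i][j] for i in range(m)] for j in range(n)]
# A fixed random backward matrix, as used in feedback alignment.
B_random = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(n)]

angle_aligned = alignment_angle_deg(B_aligned, W)
angle_random = alignment_angle_deg(B_random, W)
```

The measure ignores overall scale, so a backward matrix that is any positive multiple of the transposed forward matrix counts as fully aligned, whereas an unrelated random matrix generally lies at a much larger angle.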

Abstract

Models of sensory processing and learning in the cortex need to efficiently assign credit to synapses in all areas. In deep learning, a known solution is error backpropagation, which, however, requires biologically implausible weight transport from feed-forward to feedback paths. We introduce Phaseless Alignment Learning (PAL), a bio-plausible method to learn efficient feedback weights in layered cortical hierarchies. This is achieved by exploiting the noise naturally found in biophysical systems as an additional carrier of information. In our dynamical system, all weights are learned simultaneously with always-on plasticity and using only information locally available to the synapses. Our method is completely phase-free (no forward and backward passes or phased learning) and allows for efficient error propagation across multi-layer cortical hierarchies, while maintaining biologically plausible signal transport and learning. Our method is applicable to a wide class of models and improves on previously known biologically plausible ways of credit assignment: compared to random synaptic feedback, it can solve complex tasks with fewer neurons and learn more useful latent representations. We demonstrate this on various classification tasks using a cortical microcircuit model with prospective coding.

Paper Structure (27 sections, 36 equations, 9 figures, 2 tables, 1 algorithm)

Introduction
Results
Learning of efficient backprojections
Cortical microcircuit implementation
Experiments
Phaseless backwards weight alignment
Teacher-student setup
Classification experiments
Efficient credit assignment in deep networks
Discussion
Methods
Prospective Coding
Alignment of feedback weights
Dendritic cortical microcircuits
Simulation details
...and 12 more sections

Figures (9)

Figure 1: Sensory processing over cortical hierarchies. a: Brain areas in the visual pathway beyond the primary visual cortex (V1). Information is propagated to higher areas (red arrows) such as V2, V4, the medial temporal (MT) area, and beyond. In order to assign credit, feedback information from higher-level areas needs to be propagated top-down (blue arrows). Adapted from archer2020temporalvisual_stream. b: Pyramidal cells as functional units of sensory processing and credit assignment. Top-down and bottom-up projections preferentially target different dendrites. Due to stochastic dynamics of individual neurons, noise is added to the signal.
Figure 1: Alternative regularizer with derivative shows further improvement in alignment. We reproduce the experiment in Fig. \ref{fig:bw_learning} (e) using the same parameters: microcircuits learning to adapt backwards weights with PAL using (a) the standard weight decay regularizer and (b) the derivative-dependent regularizer of Eq. \ref{eq:varphi_regularizer}.
Figure 2: Cortical microcircuit setup with one hidden layer. Left: Full network with pyramidal cells and interneurons. Triangles represent somata of pyramidal neurons, with attached basal and apical compartments. Interneuron somata (circles) receive input from a single dendritic compartment and a nudging signal from a matching pyramidal cell in the layer above. Right: Single microcircuit. Somatic voltages contain bottom-up data signal, top-down error, and noise. The top-down synapses adapted with PAL are marked with a star.
Figure 2: PAL outperforms FA not simply due to inclusion of noise. A key difference between the experiments performed with PAL and FA is the inherent modeling of noise. Therefore, it could be argued that FA with noise may perform on par with PAL. To test this, we reproduce the Yin-Yang experiment of Fig. \ref{fig:teacher_student} (c,d). PAL without learning of top-down weights ($\eta^\text{bw}=0$) is equivalent to FA with noise, which performs similarly to vanilla FA and is still outperformed by PAL.
Figure 3: PAL aligns weight updates with backpropagation in deep networks. a, b, c: We train the backward projections in a deep microcircuit network with layer sizes [5-20-10-20-5] and sigmoid activation with no target present. All backward weights $\bm B^\text{PP}_{\ell,\ell+1}$ are learned simultaneously, while forward weights are fixed. Lines and shading show mean and standard deviation over 10 5 seeds. Weights are initialized as $\bm W^\text{PP} \sim \mathcal{U}[-1,1]$, such that neurons are activated in their linear regime. The right column compares the potential forward weight updates generated from backpropagation using $\bm B^\text{PP}_{\ell, \ell+1}$ in the microcircuit model to those in an ANN with BP (see main text and Methods), where the instructive signal is provided by a teacher network with arbitrary forward weight configuration. d, e, f: Same as above, but with weights initialized in the non-linear regime, $\bm W^\text{PP} \sim \mathcal{U}[-5,5]$. Weight updates (f) are biased towards misalignment due to the dendritic microcircuit model, see Methods.
...and 4 more figures
