Towards Biologically Plausible Deep Learning

Yoshua Bengio, Dong-Hyun Lee, Jorg Bornschein, Thomas Mesnard, Zhouhan Lin

TL;DR

The paper tackles the credit assignment problem in deep networks from a biological perspective, arguing that spike-timing-dependent plasticity (STDP) can implement gradient-like updates without backpropagation. It develops a variational EM framework with learned approximate inference, interpreted through the lens of denoising auto-encoders, and introduces target propagation as a biologically plausible gradient estimator. It structures deep generative models with layer-wise updates and explores both joint and latent-variable denoisers, which the authors use to justify improved sampling schemes; the ideas are validated on MNIST. The work lays groundwork for biologically plausible learning across supervised, unsupervised, and reinforcement contexts, and outlines the main open hurdles for a neural implementation.

Abstract

Neuroscientists have long criticised deep learning algorithms as incompatible with current knowledge of neurobiology. We explore more biologically plausible versions of deep representation learning, focusing here mostly on unsupervised learning but developing a learning mechanism that could account for supervised, unsupervised and reinforcement learning. The starting point is that the basic learning rule believed to govern synaptic weight updates (Spike-Timing-Dependent Plasticity) arises out of a simple update rule that makes a lot of sense from a machine learning point of view and can be interpreted as gradient descent on some objective function so long as the neuronal dynamics push firing rates towards better values of the objective function (be it supervised, unsupervised, or reward-driven). The second main idea is that this corresponds to a form of the variational EM algorithm, i.e., with approximate rather than exact posteriors, implemented by neural dynamics. Another contribution of this paper is that the gradients required for updating the hidden states in the above variational interpretation can be estimated using an approximation that only requires propagating activations forward and backward, with pairs of layers learning to form a denoising auto-encoder. Finally, we extend the theory about the probabilistic interpretation of auto-encoders to justify improved sampling schemes based on the generative interpretation of denoising auto-encoders, and we validate all these ideas on generative learning tasks.
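
The first idea in the abstract can be illustrated with a small rate-based sketch. It assumes the update rule has the form $\Delta W_{ij} \propto S_i \, \Delta V_j$ (pre-synaptic activity times the change in post-synaptic potential); the quadratic objective, the target vector, and the names `objective` and `stdp_like_step` are hypothetical choices made for illustration, not taken from the paper. If the dynamics move $V$ along $\partial J / \partial V$, the resulting weight change is proportional to $\partial J / \partial W$.

```python
import numpy as np

def objective(v, target):
    # Hypothetical objective J(V): higher is better, maximal when v == target.
    return -0.5 * np.sum((v - target) ** 2)

def stdp_like_step(W, s, target, eps=0.1, lr=0.5):
    """One rate-based step: dynamics nudge V, then dW_ij = lr * s_i * dV_j."""
    v = s @ W                   # post-synaptic potentials V_j = sum_i s_i W_ij
    dv = eps * (target - v)     # assumed dynamics: V moves along dJ/dV
    dW = lr * np.outer(s, dv)   # pre-synaptic activity times change in V
    return W + dW, dW

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(5, 3))   # 5 pre-synaptic, 3 post-synaptic units
s = rng.random(5)                        # pre-synaptic firing rates (assumed)
target = np.array([0.5, -0.2, 0.1])      # hypothetical preferred potentials

before = objective(s @ W, target)
W_new, dW = stdp_like_step(W, s, target)
after = objective(s @ W_new, target)

# Analytic gradient of J with respect to W for the same input.
grad_W = np.outer(s, target - s @ W)
print(before, after, np.allclose(dW, 0.5 * 0.1 * grad_W))
```

In this toy setting the update coincides with a scaled gradient step, so the objective for the same input increases after the weight change.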

Paper Structure

This paper contains 11 sections, 16 equations, 5 figures, 2 algorithms.

Figures (5)

  • Figure 1: Result of a simulation around a pre-synaptic spike (time 0), showing indirectly the effect of a change in the rate of change of the post-synaptic voltage, $\dot{V}_j$, on both the average time difference between pre- and post-synaptic spikes (horizontal axis, $\Delta T$) and the average weight change (vertical axis, $\Delta W_{ij}$), when the latter follows Eq. \ref{eq:delta-w-stdp}. This corresponds very well to the relationship between $\Delta T$ and $\Delta W_{ij}$ observed in the biological literature.
  • Figure 2: The optimal $h$ for maximizing $p(x|h)$ is $\tilde{h}$ s.t. $g(\tilde{h})=x$. Since the encoder $f$ and decoder $g$ are approximate inverses of each other, their composition makes a small move $\Delta x$. Eq. \ref{eq:bf-estimator} is obtained by assuming that, by considering an $\tilde{x}$ at $x-\Delta x$ and applying $f \circ g$, one would approximately recover $x$, which should be true if the changes are small and the functions smooth (see Lee+Bengio-NIPSDL2014-small for a detailed derivation).
  • Figure 3: MNIST samples generated by GENERATE from Algorithm \ref{alg:experiment-1} after training with TRAIN.
  • Figure 4: Increase of $\log p(x, h)$ over 20 iterations of the INFERENCE procedure of Algorithm \ref{alg:experiment-1}, showing that the targetprop updates increase the joint likelihood. The solid red line shows the average and the standard error over the full test set of 10,000 digits. Dashed lines show $\log p(x,h)$ for individual datapoints.
  • Figure 5: Examples of filling-in (in-painting) missing (initially corrupted) parts of an image. Left: original MNIST test examples. Middle: initial state of the inference, with half of the pixels randomly sampled (with a different corruption pattern in each row of the figure). Right: reconstructions using a variant of the INFERENCE procedure of Algorithm \ref{alg:experiment-1} for the case when some inputs are clamped.
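
The idea behind the captions of Figures 2 and 4 can be sketched numerically. The example below is a toy illustration under stated assumptions, not the paper's algorithm: the encoder `f` and decoder `g` are hypothetical smooth maps built to be approximate inverses, and the hidden-state update $h \leftarrow h + f(x) - f(g(h))$ is an assumed form of the estimator that the Figure 2 caption refers to (the equation itself is not reproduced in this summary). The reconstruction error $\|g(h) - x\|$ is used here as a stand-in for how well $h$ explains $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.eye(4) + 0.1 * rng.normal(size=(4, 4))   # well-conditioned mixing matrix
A_inv = np.linalg.inv(A)

def g(h):
    # Hypothetical decoder: smooth and invertible.
    return np.tanh(A @ h)

def f(x):
    # Hypothetical encoder: the inverse of g scaled by 0.95,
    # so f and g are only approximate inverses of each other.
    return 0.95 * (A_inv @ np.arctanh(x))

h_true = 0.5 * rng.normal(size=4)        # the h-tilde satisfying g(h_tilde) = x
x = g(h_true)                            # observed input
h = h_true + 0.2 * rng.normal(size=4)    # imperfect initial guess for h

errors = [np.linalg.norm(g(h) - x)]
for _ in range(20):                      # 20 iterations, echoing Figure 4
    h = h + (f(x) - f(g(h)))             # assumed form of the update
    errors.append(np.linalg.norm(g(h) - x))

print(errors[0], errors[1], errors[-1])
```

Only forward and backward propagation of activations through `g` and `f` is needed; no derivative of either map is computed. In this toy setting each step moves `h` toward the value whose decoding reproduces `x`.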