On the relationship between predictive coding and backpropagation

Robert Rosenbaum

TL;DR

This manuscript reviews and extends recent work on the mathematical relationship between predictive coding and backpropagation for training feedforward artificial neural networks on supervised learning tasks and discusses a repository of functions, Torch2PC, for performing predictive coding with PyTorch neural network models.

Abstract

Artificial neural networks are often interpreted as abstract models of biological neuronal networks, but they are typically trained using the biologically unrealistic backpropagation algorithm and its variants. Predictive coding has been proposed as a potentially more biologically realistic alternative to backpropagation for training neural networks. This manuscript reviews and extends recent work on the mathematical relationship between predictive coding and backpropagation for training feedforward artificial neural networks on supervised learning tasks. Implications of these results for the interpretation of predictive coding and deep neural networks as models of biological learning are discussed along with a repository of functions, Torch2PC, for performing predictive coding with PyTorch neural network models.

Paper Structure

This paper contains 16 sections, 1 theorem, 53 equations, 9 figures, 3 algorithms.

Key Result

Theorem 1

If Algorithm A:ModPC is run with step size $\eta=1$ and at least $n=L$ iterations, then the algorithm computes $\epsilon_\ell=\partial \mathcal{L}(\hat{y},y)/\partial \hat{v}_\ell$ and $d\theta_\ell=-\partial \mathcal{L}(\hat{y},y)/\partial \theta_\ell$ for all $\ell=1,\ldots,L$, where $\hat{v}_\ell=f_\ell(\hat{v}_{\ell-1};\theta_\ell)$ are the results from a forward pass with $\hat{v}_0=x$, and $\hat{y}=\hat{v}_L=f(x;\theta)$ is the output.
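
The claim can be checked numerically on a toy problem. The sketch below is not the paper's code and does not use Torch2PC. It assumes a chain of scalar layers $\hat{v}_\ell=\tanh(w_\ell \hat{v}_{\ell-1})$, a squared-error loss, and one reading of the fixed prediction assumption in which errors are $\epsilon_\ell=\hat{v}_\ell-v_\ell$ and beliefs are updated by $v_\ell \leftarrow v_\ell+\eta(\epsilon_\ell-\epsilon_{\ell+1}\,\partial\hat{v}_{\ell+1}/\partial\hat{v}_\ell)$. The function names `forward`, `backprop_grads`, and `fixed_prediction_pc` are hypothetical.

```python
import math


def forward(x, weights):
    """Forward pass: v_hat[0] = x and v_hat[l] = tanh(w_l * v_hat[l-1])."""
    v_hat = [x]
    for w in weights:
        v_hat.append(math.tanh(w * v_hat[-1]))
    return v_hat


def backprop_grads(x, y, weights):
    """Reference gradients dLoss/dw_l for Loss = 0.5 * (y_hat - y)**2."""
    v_hat = forward(x, weights)
    L = len(weights)
    delta = v_hat[L] - y  # dLoss/dv_hat_L
    grads = [0.0] * L
    for l in range(L, 0, -1):
        dact = 1.0 - v_hat[l] ** 2              # tanh' at layer l
        grads[l - 1] = delta * dact * v_hat[l - 1]
        delta = delta * dact * weights[l - 1]   # dLoss/dv_hat_{l-1}
    return grads


def fixed_prediction_pc(x, y, weights, eta=1.0, n=None):
    """Toy predictive coding with predictions frozen at the forward pass.

    Assumed update rules (a sketch, not the paper's listing):
      eps_l = v_hat_l - v_l
      v_l  += eta * (eps_l - eps_{l+1} * d v_hat_{l+1} / d v_hat_l)
      d_theta_l = -eps_l * d v_hat_l / d w_l
    """
    v_hat = forward(x, weights)
    L = len(weights)
    n = L if n is None else n
    v = list(v_hat)           # beliefs start at the predictions
    eps = [0.0] * (L + 1)     # eps[l] is the error at layer l; eps[0] unused
    eps[L] = v_hat[L] - y     # output error equals dLoss/dv_hat_L
    for _ in range(n):
        for l in range(L - 1, 0, -1):
            eps[l] = v_hat[l] - v[l]
            # Jacobian of layer l+1 with respect to its input v_hat_l.
            jac = (1.0 - v_hat[l + 1] ** 2) * weights[l]
            v[l] += eta * (eps[l] - eps[l + 1] * jac)
    return [
        -eps[l] * (1.0 - v_hat[l] ** 2) * v_hat[l - 1]
        for l in range(1, L + 1)
    ]


if __name__ == "__main__":
    weights = [0.7, -1.3, 0.5, 0.9]   # L = 4 layers
    d_theta = fixed_prediction_pc(0.8, 0.2, weights)  # eta = 1, n = L
    grads = backprop_grads(0.8, 0.2, weights)
    for l, (d, g) in enumerate(zip(d_theta, grads), start=1):
        print(f"layer {l}: d_theta = {d:+.6e}, -gradient = {-g:+.6e}")
```

In this toy setting, each iteration propagates the correct error one layer further from the output, which is why $n=L$ iterations suffice. With fewer iterations, the error at the first layer is still zero and the first-layer update vanishes.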

Figures (9)

  • Figure 1: Comparing backpropagation and predictive coding in a convolutional neural network trained on MNIST. A,B) The loss (A) and accuracy (B) on the training set (pastel) and test set (dark) when a 5-layer network was trained using a strict implementation of predictive coding (Algorithm A:PC with $\eta=0.1$ and $n=20$; red) and backpropagation (blue). C,D) The relative error (C) and angle (D) between the parameter update, $d\theta$, computed by Algorithm A:PC and the negative gradient of the loss at each layer. Predictive coding and backpropagation give similar accuracies, but the parameter updates are less similar.
  • Figure 2: Comparing parameter updates from predictive coding to true gradients in a network trained on MNIST. Relative error and angle between $d\theta_\ell$ produced by predictive coding (Algorithm A:PC) as compared to the exact gradients, $\partial \mathcal{L}/\partial \theta_\ell$, computed by backpropagation (relative error defined by $\|d\theta_{pc}-d\theta_{bp}\|/\|d\theta_{bp}\|$). Updates were computed as a function of the number of iterations, $n$, used in Algorithm A:PC for various values of the step size, $\eta$, using the model from Fig. 1 applied to one mini-batch of data. Both models were initialized identically to the pre-trained parameter values from the trained model in Fig. 1. Parameter updates converge near the gradients after many iterations for smaller values of $\eta$, but diverge for larger values.
  • Figure 3: Predictive coding modified by the fixed prediction assumption compared to backpropagation in a convolutional neural network trained on MNIST. Same as Fig. 1 except Algorithm A:ModPC was used (with $\eta=0.1$ and $n=20$) in place of Algorithm A:PC. The accuracy of predictive coding with the fixed prediction assumption is similar to backpropagation, but the parameter updates are less similar for these hyperparameters.
  • Figure 4: Comparing parameter updates from predictive coding modified by the fixed prediction assumption to true gradients in a network trained on MNIST. Relative error and angle between $d\theta$ produced by predictive coding modified by the fixed prediction assumption (Algorithm A:ModPC) as compared to the exact gradients computed by backpropagation (relative error defined by $\|d\theta_{pc}-d\theta_{bp}\|/\|d\theta_{bp}\|$). Updates were computed as a function of the number of iterations, $n$, used in Algorithm A:ModPC for various values of the step size, $\eta$, using the model from Fig. 3 applied to one mini-batch of data. Both models were initialized identically to the pre-trained parameter values from the backpropagation-trained model in Fig. 3. In the rightmost panels, some lines are not visible where they overlap at zero. Parameter updates quickly converge to the true gradients when $\eta$ is larger.
  • Figure 5: Predictive coding modified by the fixed prediction assumption compared to backpropagation in convolutional neural networks trained on CIFAR-10. Same as Fig. 3 except a larger network was trained on the CIFAR-10 data set. The accuracy of predictive coding with the fixed prediction assumption is similar to backpropagation and parameter updates are similar to the true gradients.
  • ...and 4 more figures
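
The two comparison metrics named in the captions above, the relative error $\|d\theta_{pc}-d\theta_{bp}\|/\|d\theta_{bp}\|$ and the angle between updates, can be sketched in a few lines. This is an illustrative sketch only: it assumes each update has been flattened into a plain list of floats, that the norm is Euclidean, and that the angle is reported in radians (the captions do not state the unit). The names `relative_error` and `angle_between` are hypothetical.

```python
import math


def _norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(a * a for a in vec))


def relative_error(d_pc, d_bp):
    """||d_pc - d_bp|| / ||d_bp||, with d_bp as the reference update."""
    diff = [a - b for a, b in zip(d_pc, d_bp)]
    return _norm(diff) / _norm(d_bp)


def angle_between(d_pc, d_bp):
    """Angle in radians between two updates (assumed unit: radians)."""
    dot = sum(a * b for a, b in zip(d_pc, d_bp))
    cos = dot / (_norm(d_pc) * _norm(d_bp))
    # Clamp to [-1, 1] so rounding error cannot break acos.
    return math.acos(max(-1.0, min(1.0, cos)))


if __name__ == "__main__":
    d_bp = [1.0, 0.0]   # reference update, e.g. the negative gradient
    d_pc = [1.0, 1.0]   # update to compare against it
    print(relative_error(d_pc, d_bp), angle_between(d_pc, d_bp))
```

A relative error of zero together with an angle of zero means the two updates coincide. A small angle with a large relative error means the updates point in a similar direction but differ in magnitude.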

Theorems & Definitions (2)

  • Theorem 1
  • proof