Brain-inspired Computational Intelligence via Predictive Coding

Tommaso Salvatori, Ankur Mali, Christopher L. Buckley, Thomas Lukasiewicz, Rajesh P. N. Rao, Karl Friston, Alexander Ororbia

TL;DR

This survey analyzes predictive coding (PC) as a brain-inspired alternative to backpropagation, focusing on its locality, its asynchronous updates, and its ability to operate on arbitrary topologies. It frames PC as variational free energy minimization, linking posterior inference to learning in hierarchical Gaussian generative models and presenting the ELBO $\mathcal{L}(\theta,\mathbf{o})$ as a central objective. It catalogs PC implementations (including Rao and Ballard's PC, Neural Generative Coding, and convolutional variants) and surveys applications from supervised learning to active inference, while candidly discussing scalability and hardware challenges. It then outlines future directions for efficiency, optimization, stochastic generation, and control architectures, and argues for the potential of PC in energy-efficient, neuromorphic AI and cognitive-system design.
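
For orientation, the ELBO referred to above can be written in its standard variational form. The notation here is an assumption, not taken from the paper: $q(\mathbf{x})$ denotes an approximate posterior over latent states $\mathbf{x}$, and $\mathcal{F}$ denotes the variational free energy.

$$
\log p_\theta(\mathbf{o}) \;\ge\; \mathcal{L}(\theta,\mathbf{o}) = \mathbb{E}_{q(\mathbf{x})}\big[\log p_\theta(\mathbf{o},\mathbf{x}) - \log q(\mathbf{x})\big], \qquad \mathcal{F} = -\mathcal{L}(\theta,\mathbf{o})
$$

Under this reading, maximizing the ELBO and minimizing the free energy are the same optimization: inference adjusts $q$, and learning adjusts $\theta$.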

Abstract

Artificial intelligence (AI) is rapidly becoming one of the key technologies of this century. The majority of results in AI thus far have been achieved using deep neural networks trained with a learning algorithm called error backpropagation, which has long been considered biologically implausible. In response, recent works have studied learning algorithms for deep neural networks inspired by the neurosciences. One such theory, called predictive coding (PC), has shown promising properties that make it potentially valuable for the machine learning community: it can model information processing in different areas of the brain, can be used in control and robotics, has a solid mathematical foundation in variational inference, and performs its computations asynchronously. Inspired by such properties, works that propose novel PC-like algorithms are starting to appear in multiple sub-fields of machine learning and AI at large. Here, we survey such efforts by first giving a broad overview of the history of PC to provide common ground for understanding recent developments, then describing current efforts and results, and concluding with an extended discussion of possible implications and ways forward.

Paper Structure

This paper contains 37 sections, 12 equations, 8 figures, 1 table, 1 algorithm.

Figures (8)

  • Figure 1: Generative models effectively compress information about a specific data point $\mathbf{o}$, with missing information, into a low-dimensional code vector (or latent embedding), and use it to generate a (e.g., semantically) similar, complete data point. Left: the standard encoder-decoder model used in machine learning [kingma2013auto]; Right: an equivalent PC model, which iteratively computes and refines the code through a free-energy minimization process. At convergence, the code is then used to generate a data point via the same model used to perform the compression. Typically, PC associates the encoder with ascending (prediction error) messages and the decoder with descending (prediction) messages. Note that the implicit conflation of the encoding and decoding means that there is only one set of parameters in PC. These are the parameters of the generative model that, crucially, can be optimized locally, given the requisite predictions and prediction errors.
  • Figure 2: A timeline of how the perception of what PC is has changed through the years. Initially, it was developed as a signal compression mechanism [elias1955predictive, elias1955predictiveII]; then, it was used to model inhibition in the retina [srinivasan1982]. It then became a more general model of both learning and perception in the visual cortex [rao1999predictive]. Nowadays, it can be abstractly defined as an evidence-maximization scheme for hierarchical Gaussian generative models [friston2005theory, friston2009predictive]. For a detailed discussion on different PC algorithms, we also refer to another survey [spratling2017review].
  • Figure 3: (a) The difference between PC and standard models in terms of locality: backprop updates each synaptic weight $\mathbf{W}^{\ell}_{i,j}$ to minimize the output error, even if that weight is not directly connected to the output. PC models, on the other hand, update each weight to correct the error of its postsynaptic neuron. (b) PC can be used to train models with cycles. An example is the fully connected model on the left, where every pair of neurons is connected via two different synapses, one in each direction. Being able to train models like this facilitates the inversion of models with arbitrarily expressive architectures (such as realistic brain structures) by simply masking specific connections of a fully connected model via an adjacency matrix. For an example of interesting models that operate on realistic brain structures, see the right panel, taken from [avena18].
  • Figure 4: A graphical depiction of PC in an unsupervised generative form (left) and in a supervised discriminative form (right). Note that the generative form of PC entails iteratively inferring a latent code (or embedding) $\mathbf{x}^L$ for sensory input $\mathbf{x}^0 = \mathbf{o}$, while the discriminative form of PC requires iteratively learning a predictive mapping from sensory input $\mathbf{x}^L = \mathbf{o}$ to target $\mathbf{x}^0 = \mathbf{y}$.
  • Figure 5: On the left is an arbitrary neural circuit (non-linearities are omitted for simplicity) that could be constructed under one of the three PC frameworks surveyed. Rao & Ballard PC (PC) employs only generative synapses $\mathbf{W}^{\ell}$, while NGC additionally employs separate feedback synapses (which may or may not correspond symmetrically to the generative pathway). Note that NGC and BC-DIM often employ skip connections ($\mathbf{W}^7$).
  • ...and 3 more figures
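
The captions of Figures 1 and 3 describe two nested processes: the latent code is refined iteratively by minimizing a free energy, and each weight is then updated using only the prediction error of its postsynaptic neuron. The following is a minimal NumPy sketch of that loop, not the paper's algorithm. It assumes a single linear generative layer, identity covariances, and a standard normal prior on the code; the function names `energy`, `infer`, and `learn_step`, the step sizes, and the data point are all hypothetical.

```python
import numpy as np

def energy(o, x, W):
    """Free energy of a linear Gaussian model (assumed identity covariances)."""
    eps = o - W @ x                      # prediction error at the observed layer
    return 0.5 * float(eps @ eps) + 0.5 * float(x @ x)

def infer(o, W, n_steps=50, lr_x=0.1):
    """Refine the latent code x by gradient descent on the energy."""
    x = np.zeros(W.shape[1])
    for _ in range(n_steps):
        eps = o - W @ x                  # ascending message: prediction error
        x = x + lr_x * (W.T @ eps - x)   # step along -dF/dx = W^T eps - x
    return x

def learn_step(o, W, lr_w=0.05):
    """One local weight update using the code found by inference."""
    x = infer(o, W)
    eps = o - W @ x
    # W[i, j] changes using only eps[i] (postsynaptic error) and x[j] (presynaptic activity).
    return W + lr_w * np.outer(eps, x)

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 2))    # hypothetical sizes: 2-d code, 4-d observation
o = np.array([1.0, -0.5, 0.25, 2.0])     # made-up data point

e_before_inference = energy(o, np.zeros(2), W)
e_after_inference = energy(o, infer(o, W), W)

for _ in range(20):
    W = learn_step(o, W)
e_after_learning = energy(o, infer(o, W), W)

print(e_before_inference, e_after_inference, e_after_learning)
```

The same set of weights `W` serves both to infer the code and to generate the prediction, which mirrors the single-parameter-set point made in the Figure 1 caption. Masking connections with an adjacency matrix, as in Figure 3(b), is not covered by this sketch.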

Theorems & Definitions (1)

  • Definition : Informal