Predictive Coding: a Theoretical and Experimental Review

Beren Millidge, Anil Seth, Christopher L Buckley

TL;DR

This review synthesizes predictive coding as a mathematically principled, variational-inference framework that explains cortical function through hierarchical, precision-weighted prediction errors. It connects core theory to neurobiological microcircuits, explores temporal dynamics via generalized coordinates, and analyzes relationships to backpropagation, Kalman filtering, and active inference. The authors survey supervised, unsupervised, relaxed, and deep variants, discuss practical neural-implementation challenges, and outline future directions for both neuroscience and machine learning. Overall, predictive coding emerges as a comprehensive, biologically plausible account with strong implications for understanding perception, action, and learning, while highlighting key open questions in precision, time, memory, and large-scale scalability.
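The TL;DR's central quantity, the precision-weighted prediction error, can be illustrated with a minimal sketch. It assumes a single Gaussian latent cause $\mu$ with prior mean `mu_prior` and precision `pi_prior`, observed through a linear mapping `w` with sensory precision `pi_x`. The helper names are hypothetical and this is not the authors' code. Gradient descent on the free energy $F(\mu) = \tfrac{1}{2}\pi_x (x - w\mu)^2 + \tfrac{1}{2}\pi_{p}(\mu - \mu_{p})^2$ is driven only by the two prediction errors, each scaled by its precision, and its fixed point is the exact Gaussian posterior mean.

```python
def infer_mu(x, mu_prior, w=1.0, pi_x=1.0, pi_prior=1.0, lr=0.1, steps=500):
    """Hypothetical helper: gradient descent on
    F(mu) = 0.5*pi_x*(x - w*mu)**2 + 0.5*pi_prior*(mu - mu_prior)**2."""
    mu = mu_prior                      # start inference at the prior mean
    for _ in range(steps):
        eps_x = x - w * mu             # sensory prediction error
        eps_p = mu - mu_prior          # prior prediction error
        # -dF/dmu = pi_x*w*eps_x - pi_prior*eps_p: each error is weighted by its precision
        mu += lr * (pi_x * w * eps_x - pi_prior * eps_p)
    return mu


def exact_posterior_mean(x, mu_prior, w=1.0, pi_x=1.0, pi_prior=1.0):
    """Closed-form Gaussian posterior mean, for comparison."""
    return (pi_x * w * x + pi_prior * mu_prior) / (pi_x * w * w + pi_prior)


# A precise observation (pi_x=4) versus an imprecise one (pi_x=0.25)
for pi_x in (4.0, 0.25):
    print(pi_x, infer_mu(2.0, 0.0, pi_x=pi_x), exact_posterior_mean(2.0, 0.0, pi_x=pi_x))
```

With a precise observation the estimate is pulled toward the data; with an imprecise one it stays near the prior. The iteration converges only if `lr * (pi_x * w**2 + pi_prior) < 2`.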

Abstract

Predictive coding offers a potentially unifying account of cortical function, postulating that the core function of the brain is to minimize prediction errors with respect to a generative model of the world. The theory is closely related to the Bayesian brain framework and, over the last two decades, has gained substantial influence in the fields of theoretical and cognitive neuroscience. A large body of research has arisen that both empirically tests improved and extended theoretical and mathematical models of predictive coding and evaluates their potential biological plausibility for implementation in the brain, as well as the concrete neurophysiological and psychological predictions made by the theory. Despite this enduring popularity, however, no comprehensive review of predictive coding theory, and especially of recent developments in this field, exists. Here, we provide a comprehensive review of the core mathematical structure and logic of predictive coding, thus complementing recent tutorials in the literature. We also review a wide range of classic and recent work within the framework, ranging from the neurobiologically realistic microcircuits that could implement predictive coding, to the close relationship between predictive coding and the widely used backpropagation of error algorithm, as well as surveying the close relationships between predictive coding and modern machine learning techniques.
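The abstract's core postulate, that the brain minimizes prediction errors with respect to a generative model, can be made concrete with a minimal sketch in the spirit of Figure 1. It assumes a hypothetical two-layer linear model: one latent value neuron `x1` (prior mean 0, unit precision) predicts a 2-D input `x0` through weights `W`. The functions `relax`, `energy`, and `train` and the toy data are illustrative assumptions, not the authors' implementation. Inference relaxes `x1` using error-neuron activity, and learning applies a Hebbian product of a postsynaptic error and a presynaptic value.

```python
def relax(W, x0, prior_mean=0.0, lr=0.05, steps=500):
    """Hypothetical inference: gradient descent on
    F = 0.5*||x0 - W*x1||^2 + 0.5*(x1 - prior_mean)^2 for a scalar latent x1.
    W is an n x 1 weight matrix stored as a flat list of n entries."""
    x1 = prior_mean
    for _ in range(steps):
        e0 = [xi - wi * x1 for xi, wi in zip(x0, W)]  # error neurons at the input layer
        e1 = x1 - prior_mean                          # error neuron at the latent layer
        # Value neuron: excitation from errors below (through W^T), inhibition from its own error
        x1 += lr * (sum(wi * ei for wi, ei in zip(W, e0)) - e1)
    e0 = [xi - wi * x1 for xi, wi in zip(x0, W)]
    return x1, e0


def energy(W, data, prior_mean=0.0):
    """Summed free energy at the end of inference over a dataset."""
    total = 0.0
    for x0 in data:
        x1, e0 = relax(W, x0, prior_mean)
        total += 0.5 * sum(e * e for e in e0) + 0.5 * (x1 - prior_mean) ** 2
    return total


def train(W, data, eta=0.05, epochs=30):
    """Batch learning with the Hebbian rule dW_i = eta * e0_i * x1
    (postsynaptic error activity times presynaptic value activity)."""
    W = list(W)
    for _ in range(epochs):
        dW = [0.0] * len(W)
        for x0 in data:
            x1, e0 = relax(W, x0)
            dW = [d + e * x1 for d, e in zip(dW, e0)]
        W = [w + eta * d for w, d in zip(W, dW)]
    return W


data = [[z, 2.0 * z] for z in (-1.0, -0.5, 0.5, 1.0)]  # toy inputs on the line x0 = z * [1, 2]
W0 = [0.1, 0.1]
W1 = train(W0, data)
print(energy(W0, data), energy(W1, data), W1)
```

Because the weight update is taken after inference has (approximately) converged, it follows the negative gradient of the summed energy, so training on inputs lying along [1, 2] lowers the energy and rotates `W` toward that direction. The inference step size must stay below roughly `2 / (1 + |W|^2)` for the relaxation to remain stable.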

Paper Structure

This paper contains 28 sections, 67 equations, 4 figures.

Figures (4)

  • Figure 1: Architecture of a multi-layer predictive coding network (shown here with two value and error neurons in each layer). The value neurons $\mu$ project both to the error neurons of the layer below (representing the top-down connections) and to the error neurons at the current layer (representing the current activity). The error neurons receive inhibitory top-down inputs from the value neurons of the layer above and excitatory inputs from the value neurons at the same layer. Conversely, the value neurons receive excitatory projections from the error neurons of the layer below and inhibitory projections from the error neurons at the current layer. Crucially, for this model with its explicit error neurons, all synaptic plasticity rules are purely Hebbian.
  • Figure 2: The canonical microcircuit proposed by Bastos et al. mapped onto the laminar connectivity of a cortical region (which comprises 6 layers). Here, for simplicity, we group layers L2 and L3 together into a broad 'superficial' layer and L5 and L6 together into a 'deep' layer. We ignore L1 entirely, since it contains few neurons and they are not involved in the Bastos microcircuit. Bold lines are included in the canonical microcircuit of Bastos et al. Dashed lines are connections which are known to exist in the cortex but are not explained by the model. Red text denotes the values which are computed in each part of the canonical microcircuit.
  • Figure 3: Summary of the input-output relationships for each paradigm of predictive coding. Specifically: a.) what the input to the network is, and b.) what the network is trained to predict.
  • Figure 4: Schematic architectures for a.) the standard, or generative, predictive coding setup, and b.) the reverse, or discriminative, architecture trained for supervised classification on MNIST digits. In the generative model, the image input (in this case an MNIST digit) is presented to the bottom layer of the network, and the top layer is fixed to the label value (5). Predictions (in black) are passed down and prediction errors (in red) are passed upwards until the network equilibrates. In the discriminative mode, the input image is presented to the top of the network and the label is presented at the bottom. Thus the network aims to 'generate' the label from the image. The top-down flow of predictions becomes analogous to the forward pass in an artificial neural network, and the bottom-up prediction errors become equivalent to the backpropagated gradients.
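Figure 4's claim that the prediction errors of the discriminative network become equivalent to backpropagated gradients can be checked numerically in a minimal sketch. It assumes `tanh` layers, a squared-error loss, unit precisions, and a simplification sometimes called the fixed-prediction assumption: predictions, and the derivatives that carry errors between layers, are held at their feedforward values while the hidden value neurons relax. Layer 0 holds the clamped image (the top of Figure 4's discriminative network) and the last layer the clamped label. All helper names, weights, and inputs are hypothetical. Under these assumptions the equilibrium errors equal $-\partial L/\partial h_l$ at every layer; without the fixed-prediction simplification the correspondence is only approximate.

```python
import math


def matvec(W, v):
    """W @ v for W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matTvec(W, v):
    """W^T @ v."""
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(len(W[0]))]


def forward(weights, x0):
    """Feedforward pass h_{l+1} = tanh(W_l h_l); also returns f'(W_l h_l) for each layer."""
    hs, dfs = [list(x0)], []
    for W in weights:
        a = matvec(W, hs[-1])
        hs.append([math.tanh(ai) for ai in a])
        dfs.append([1.0 - math.tanh(ai) ** 2 for ai in a])
    return hs, dfs


def backprop_deltas(weights, x0, y):
    """-dL/dh_l for layers 1..L, with L = 0.5 * ||y - h_L||^2."""
    hs, dfs = forward(weights, x0)
    deltas = [[yi - hi for yi, hi in zip(y, hs[-1])]]
    for l in range(len(weights) - 1, 0, -1):
        deltas.insert(0, matTvec(weights[l], [g * d for g, d in zip(dfs[l], deltas[0])]))
    return deltas


def pc_errors_fpa(weights, x0, y, lr=0.1, steps=2000):
    """Relax hidden value neurons with predictions frozen at their feedforward values."""
    hs, dfs = forward(weights, x0)
    mus = hs[1:]                                   # fixed predictions for layers 1..L
    xs = [list(h) for h in hs[1:-1]] + [list(y)]   # hidden values start at forward pass; output clamped
    L = len(weights)
    for _ in range(steps):
        errs = [[x - m for x, m in zip(xs[l], mus[l])] for l in range(L)]
        for l in range(L - 1):                     # hidden layers only; the output stays clamped
            # dF/dx = own error minus the error fed back from the layer it predicts
            fb = matTvec(weights[l + 1], [g * e for g, e in zip(dfs[l + 1], errs[l + 1])])
            xs[l] = [x - lr * (e - f) for x, e, f in zip(xs[l], errs[l], fb)]
    return [[x - m for x, m in zip(xs[l], mus[l])] for l in range(L)]


weights = [
    [[0.5, -0.3, 0.8], [0.1, 0.4, -0.6], [-0.7, 0.2, 0.3], [0.9, -0.5, 0.1]],
    [[0.2, -0.1, 0.4, 0.3], [-0.5, 0.3, 0.2, 0.1], [0.6, 0.2, -0.3, 0.4], [0.1, -0.7, 0.5, -0.2]],
    [[0.3, -0.8, 0.5, 0.2], [-0.4, 0.6, 0.1, -0.9]],
]
x0, y = [1.0, -0.5, 0.25], [0.3, -0.2]
pc, bp = pc_errors_fpa(weights, x0, y), backprop_deltas(weights, x0, y)
print(max(abs(a - b) for la, lb in zip(pc, bp) for a, b in zip(la, lb)))
```

Because each hidden value neuron's presynaptic activity stays at its feedforward value here, a Hebbian update built from these errors, $(f' \odot e_{l+1})\,h_l^\top$, also equals the negative backprop weight gradient for $W_l$.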