Towards the Training of Deeper Predictive Coding Neural Networks

Chang Qi, Matteo Forasassi, Thomas Lukasiewicz, Tommaso Salvatori

TL;DR

This paper tackles the poor scalability of predictive coding networks to deep architectures by diagnosing energy propagation as the root cause of degraded learning and introducing precision-based mechanisms to balance layer-wise errors. The authors propose time-dependent precision schedules (notably Spiking Precision), a forward-update weight rule, residual-connection buffering with auxiliary neurons, and BatchNorm Freezing to stabilize iterative inference. Together, these algorithmic and architectural changes enable PC and Incremental PC to achieve performance on par with backpropagation on deep networks such as VGG up to 15 layers and ResNet18 on Tiny ImageNet, highlighting potential for energy-efficient, local learning in deep models. The work demonstrates that carefully modulating energy flow and update dynamics can close the gap between PC methods and standard backpropagation in real-world deep learning tasks.

Abstract

Predictive coding networks are neural models that perform inference through an iterative energy minimization process, whose operations are local in space and time. While effective in shallow architectures, they suffer significant performance degradation beyond five to seven layers. In this work, we show that this degradation is caused by exponentially imbalanced errors between layers during weight updates, and by predictions from the previous layers not being effective in guiding updates in deeper layers. Furthermore, when training models with skip connections, the energy propagated by the residuals reaches higher layers faster than that propagated by the main pathway, affecting test accuracy. We address the first issue by introducing a novel precision-weighted optimization of latent variables that balances error distributions during the relaxation phase, the second issue by proposing a novel weight update mechanism that reduces error accumulation in deeper layers, and the third one by using auxiliary neurons that slow down the propagation of the energy in the residual connections. Empirically, our methods achieve performance comparable to backpropagation on deep models such as ResNets, opening new possibilities for predictive coding in complex tasks.
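To make the relaxation phase concrete, the following is a minimal sketch of precision-weighted inference on a toy chain of scalar neurons with linear predictions. It is not the paper's implementation: the function names, the weights, the step size, and the static per-layer precision values are all illustrative assumptions, and the time-dependent schedules proposed in the paper (such as Spiking Precision) are not reproduced here. The assumed energy is $F = \sum_l \tfrac{1}{2}\pi_l (x_l - w_l x_{l-1})^2$, minimized by gradient descent on the hidden activities while input and output stay clamped.

```python
def layer_errors(x, w):
    """Prediction errors e_l = x_l - w_l * x_{l-1} for layers l = 1..L."""
    return [x[l] - w[l - 1] * x[l - 1] for l in range(1, len(x))]


def layer_energies(x, w, precision):
    """Per-layer energy 0.5 * pi_l * e_l^2 (assumed quadratic form)."""
    return [0.5 * p * e * e for p, e in zip(precision, layer_errors(x, w))]


def relax(x, w, precision, steps=50, lr=0.1):
    """Gradient descent on hidden activities; x[0] and x[-1] stay clamped."""
    x = list(x)
    hidden = range(1, len(x) - 1)
    for _ in range(steps):
        e = layer_errors(x, w)  # e[l - 1] is the error at layer l
        # dF/dx_l = pi_l * e_l - pi_{l+1} * w_{l+1} * e_{l+1}
        grads = [precision[l - 1] * e[l - 1] - precision[l] * w[l] * e[l]
                 for l in hidden]
        for l, g in zip(hidden, grads):
            x[l] -= lr * g
    return x


# Hypothetical toy setup: 4 layers, forward-pass initialization, clamped target.
weights = [0.5, 0.8, 1.2, 0.7]
x_init = [1.0]
for w_l in weights:
    x_init.append(w_l * x_init[-1])
x_init[-1] = 1.0  # clamp the output neuron to the target

uniform = [1.0, 1.0, 1.0, 1.0]
weighted = [2.0, 1.5, 1.0, 1.0]  # illustrative values, not from the paper

x_uniform = relax(x_init, weights, uniform)
x_weighted = relax(x_init, weights, weighted)

print("initial :", layer_energies(x_init, weights, uniform))
print("uniform :", layer_energies(x_uniform, weights, uniform))
print("weighted:", layer_energies(x_weighted, weights, uniform))
```

With forward-pass initialization, all of the energy initially sits in the last layer, and relaxation spreads it toward earlier layers. Changing the per-layer precision changes how the errors are distributed across layers at the end of relaxation, which is the quantity the precision-weighted optimization in the paper is designed to balance.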

Paper Structure

This paper contains 47 sections, 7 equations, 7 figures, 20 tables, 1 algorithm.

Figures (7)

  • Figure 1: (a) Evolution of predictive coding models over multiple time iterations. The green diamond $\tilde{\varepsilon}^{l+1}_T$ refers to the information needed to compute the proposed forward updates. The rest of the figure represents the standard components and mechanisms of a predictive coding network. (b) Visualization of the proposed precision-weighting strategies, where the height of the bar is proportional to the precision at different time steps.
  • Figure 2: Normalized layer-wise energy distribution and accuracy comparison between BP and PCNs in a VGG10 on the CIFAR10 dataset. Colored curves represent the total energy of the individual layers of the model (or, the squared error of every layer for BP). The vertical lines represent the train and test accuracies of the model.
  • Figure 3: Test accuracies of various algorithms on the CIFAR10 dataset, evaluated on models of different depths. From the second plot onward, each pair of bars compares the performance of the algorithm with and without center nudging (CN).
  • Figure 4: (a) A sketch of a residual block (left) and our proposed variation (right) with auxiliary neural activities, which prevent the error signal from traveling from $\mathbf{x}^{l+2}$ to $\mathbf{x}^{l-1}$ in one timestep. (b) A bar plot showing the gap in test accuracy on the CIFAR10 dataset between models with and without added neural activities on a ResNet18. Both plots refer to the best test accuracies reached by PC and iPC with the novel methods presented above. (c) Test accuracy of models with the standard formulation of BN, without BN, and with our proposed BF.
  • Figure 5: Layer-wise Energy Distribution and Accuracy Comparison between PC and Decaying Precision/Spiking Precision with Forward Update in VGG5, VGG7 and VGG10 on the CIFAR10 dataset. The colored lines represent the total energy of the individual layers of the model. The vertical lines represent the train and test accuracies of the model.
  • ...and 2 more figures