Accelerated Predictive Coding Networks via Direct Kolen-Pollack Feedback Alignment

Davide Casnici; Martin Lefebvre; Justin Dauwels; Charlotte Frenkel

TL;DR

This work tackles the biological plausibility and hardware efficiency concerns of backpropagation (BP) by extending predictive coding (PC) into Direct Kolen-Pollack Predictive Coding (DKP-PC). By introducing learnable direct feedback from the output layer to all hidden layers, it removes both the error delay and the exponential error decay that plague PC, reducing the error-propagation time complexity from $O(L)$ to $O(1)$ while preserving locality. DKP-PC combines direct feedback alignment and Kolen-Pollack (KP)-inspired learning, so that a single effective inference step achieves training performance that matches or exceeds standard PC and rivals BP on a range of networks and datasets, including VGG-like CNNs on Tiny ImageNet. The approach also demonstrates substantial gains in training speed and energy efficiency, highlighting its potential for neuromorphic hardware and on-chip learning, with opportunities for further optimization via custom kernels and feedback-weight sparsity/quantization.

Abstract

Predictive coding (PC) is a biologically inspired algorithm for training neural networks that relies only on local updates, allowing parallel learning across layers. However, practical implementations face two key limitations: error signals must still propagate from the output to early layers through multiple inference-phase steps, and feedback decays exponentially during this process, leading to vanishing updates in early layers. We propose direct Kolen-Pollack predictive coding (DKP-PC), which simultaneously addresses both feedback delay and exponential decay, yielding a more efficient and scalable variant of PC while preserving update locality. Leveraging direct feedback alignment and direct Kolen-Pollack algorithms, DKP-PC introduces learnable feedback connections from the output layer to all hidden layers, establishing a direct pathway for error transmission. This yields an algorithm that reduces the theoretical error propagation time complexity from $O(L)$, with $L$ being the network depth, to $O(1)$, removing depth-dependent delay in error signals. Moreover, empirical results demonstrate that DKP-PC achieves performance at least comparable to, and often exceeding, that of standard PC, while offering improved latency and computational performance, supporting its potential for custom hardware-efficient implementations.

Paper Structure (21 sections, 2 theorems, 67 equations, 6 figures, 4 tables, 1 algorithm)

Introduction
Background
Backpropagation
Direct Kolen-Pollack Feedback Alignment
Predictive Coding
Methodology
Feedback error delay and decay
Direct Kolen-Pollack Predictive Coding
Results
Conclusion and Future Work
Appendix
Convergence of Feedback Matrices under the Direct Kolen–Pollack Algorithm
Error propagation delay
Error exponential decay
Theoretical and Empirical Analysis of DKP-PC Integration
...and 6 more sections

Key Result

Theorem 1.1

Consider a forward-initialized PC network with discrete-time updates. Assuming an incorrect prediction, the neural activity $\phi_\ell$ at layer $\ell$ requires at least $\hat{t} = L - \ell$ inference-phase steps before it deviates from equilibrium and begins to evolve according to the neural-activity update equation.
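The delay stated in the theorem can be observed numerically. The sketch below is a minimal illustration, not the authors' implementation: it assumes a standard squared-error PC energy, a tanh nonlinearity, synchronous gradient-descent updates of the hidden activities with step size `gamma`, and a clamped target that differs from the forward prediction. The function name `first_change_steps`, the layer width, and the detection tolerance are all hypothetical choices made for this example.

```python
import numpy as np


def first_change_steps(num_layers=5, width=8, gamma=0.1, steps=10, seed=0):
    """Return {layer: first inference step at which its activity moves}.

    Assumed toy setting: energy = sum_l 0.5 * ||x_l - W_l tanh(x_{l-1})||^2,
    input layer 0 and output layer L clamped, hidden layers updated
    synchronously by gradient descent on the energy.
    """
    rng = np.random.default_rng(seed)
    L = num_layers
    # W[l] maps layer l-1 to layer l for l = 1..L (index 0 is unused).
    W = [None] + [rng.normal(0.0, 1.0 / np.sqrt(width), (width, width))
                  for _ in range(L)]

    # Forward initialization: every hidden prediction error starts at zero.
    x = [rng.normal(size=width)]
    for l in range(1, L + 1):
        x.append(W[l] @ np.tanh(x[l - 1]))

    # Clamp the output to a target that differs from the prediction.
    x[L] = x[L] + 1.0

    first = {}
    for t in range(1, steps + 1):
        eps = [None] + [x[l] - W[l] @ np.tanh(x[l - 1])
                        for l in range(1, L + 1)]
        new_x = list(x)
        for l in range(1, L):  # hidden layers only
            dtanh = 1.0 - np.tanh(x[l]) ** 2
            grad = eps[l] - dtanh * (W[l + 1].T @ eps[l + 1])
            new_x[l] = x[l] - gamma * grad
            # Tolerance guards against floating-point noise (assumption).
            if l not in first and np.max(np.abs(new_x[l] - x[l])) > 1e-12:
                first[l] = t
        x = new_x
    return first


if __name__ == "__main__":
    for layer, step in sorted(first_change_steps().items()):
        print(f"layer {layer} first moves at inference step {step}")
```

Under these assumptions, layer $\ell$ first moves at step $L - \ell$: the layer just below the output reacts immediately, and each earlier layer waits one additional step for a nonzero error to reach it.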

Figures (6)

Figure 1: DKP-PC embeds DKP within the PC framework to address the error feedback delay and exponential decay issues of PC. Blue arrows represent forward connections, red arrows represent feedback connections. Neural activities are shown as gray circles, with clamped values in darker gray; $x_0$ denotes the input and $y$ the target. $\mathcal{L}$ is the loss function, $\delta_\ell$ are the BP errors, $\tilde{\delta}_\ell$ their approximations, and $\epsilon_\ell$ the PC error neurons, represented as triangles. (A) BP propagates the global error sequentially. (B) DFA and DKP propagate the error directly from the output to each layer. (C) PC minimizes local errors through an inference phase, followed by a learning phase that updates weights. (D) DKP-PC employs direct feedback to deliver instantaneous error signals to every layer, accelerating error propagation during the PC inference phase while preserving the locality of weight updates.
Figure 2: Error propagation in PC (A) and DKP-PC (B) during the inference phase of a VGG-9 network trained on a single CIFAR-10 batch, at different magnitudes of the neural activity learning rate $\gamma$. In (A), PC exhibits both an error decay problem, where the error magnitude decreases exponentially with network depth, and an error delay problem, as the error signal flows through the network sequentially, undermining the theoretical parallelism. White colour represents values equal to zero or below the numerical precision. In (B), DKP-PC mitigates both issues, generating a more uniform error signal across all layers at the start of neural activity optimization.
Figure 3: Forward-weight gradient alignment across layers of a VGG-9-like CNN trained for 50 epochs on CIFAR-100. Each curve shows the cosine similarity between the instantaneous forward-weight gradient produced by the DKP or DKP-PC algorithm and the one computed with BP. All gradients exclude weight decay and momentum and are smoothed using an exponential moving average with a window of 100 batches. DKP (brown) displays positive but slow alignment with BP, which progressively deteriorates with increasing distance from the output layer. The DKP-PC (yellow) gradient is computed as the sum of the gradients resulting from the DKP and PC stages. It achieves consistently faster, higher, and more stable alignment across all layers compared to standard DKP. The light blue curve shows that disabling the PC forward-weight update in DKP-PC causes alignment to collapse in all layers, confirming its role in injecting alignment information into the forward weights. The blue curve, obtained by disabling the feedback-weight update in DKP-PC, shows worse alignment, demonstrating that the alignment and regularization terms introduced by the PC stage also improve the update of the feedback matrices.
Figure 4: Energy evolution of four-layer MLP networks on a Fashion-MNIST batch for three different neural activity learning rate magnitudes. DKP-PC and its incremental variant are shown in blue and light blue, respectively, while standard PC and iPC are represented in brown and yellow. Both DKP-PC variants start from higher energy values due to the immediate error term at every layer, and converge to levels similar to those of the standard PC and iPC networks. Although DKP-PC exhibits slower convergence due to the additional terms in the neural activity dynamics, its incremental version equals the convergence speed of iPC across all evaluated learning rates, suggesting that updating forward parameters during inference effectively compensates for the additional complexity introduced by DKP.
Figure 5: Test accuracy distributions over 30 trials are shown as a function of the total number of neural activity optimization steps. Blue boxplots correspond to a four-layer MLP trained on Fashion-MNIST with DKP-PC, while light blue boxplots show the same architecture trained with iDKP-PC. In line with PC theory, both methods display a positive correlation between the number of optimization steps and the final test accuracy.
...and 1 more figure
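The captions of Figures 2 and 4 describe direct feedback producing an immediate error term at every layer. The toy sketch below illustrates that idea only; it is not the DKP-PC update rule from the paper. It assumes fixed random feedback matrices `B` (the paper's feedback is learnable), a hypothetical nudge size `eta`, an assumed sign convention for the output error, and the same squared-error tanh network as a stand-in PC model. The function name `initial_error_norms` is likewise made up for this example.

```python
import numpy as np


def initial_error_norms(num_layers=5, width=8, eta=0.1,
                        direct_feedback=True, seed=0):
    """Return the prediction-error norm of layers 1..L before any inference step.

    With direct_feedback=True, each hidden activity is nudged by a random
    projection of the output error (DFA-style, assumed for illustration).
    """
    rng = np.random.default_rng(seed)
    L = num_layers
    scale = 1.0 / np.sqrt(width)
    # Forward weights W[l]: layer l-1 -> layer l (index 0 unused).
    W = [None] + [rng.normal(0.0, scale, (width, width)) for _ in range(L)]
    # Hypothetical feedback matrices B[l]: output error -> hidden layer l.
    B = [None] + [rng.normal(0.0, scale, (width, width)) for _ in range(L - 1)]

    # Forward initialization, then clamp the output to a shifted target.
    x = [rng.normal(size=width)]
    for l in range(1, L + 1):
        x.append(W[l] @ np.tanh(x[l - 1]))
    target = x[L] + 1.0
    out_err = target - x[L]  # sign convention is an assumption
    x[L] = target

    if direct_feedback:
        # Deliver the output error to every hidden layer in one shot.
        for l in range(1, L):
            x[l] = x[l] + eta * (B[l] @ out_err)

    return [float(np.linalg.norm(x[l] - W[l] @ np.tanh(x[l - 1])))
            for l in range(1, L + 1)]


if __name__ == "__main__":
    print("no feedback:    ", initial_error_norms(direct_feedback=False))
    print("direct feedback:", initial_error_norms(direct_feedback=True))
```

Without the direct projection, only the output layer carries a nonzero error at the start of inference; with it, every layer does, so no layer has to wait for the error to travel down the network step by step.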

Theorems & Definitions (4)

Theorem 1.1: Error propagation delay
proof
Theorem 1.2: Error exponential decay
proof
