Extraction Propagation
Stephen Pasteris, Chris Hicks, Vasilios Mavroudis
TL;DR
Extraction Propagation introduces exnets: DAG-structured ensembles of small neural networks whose parameters are updated from forward-generated extractions rather than by end-to-end backpropagation. The core method, Xprop, performs an up pass to compute primary extractions, a down pass to compute complementary extractions, and then gradient-based updates to all component networks. A theoretical analysis, assuming convergence, shows vertex optimality and identical local predictions. The paper also presents architectural templates (tree-structured, multi-layer, and attention-based exnets), enhancements such as supernodes, and an Xprop* variant intended to stabilize training. If implemented and validated, this approach could offer modular, scalable, and potentially more robust training dynamics for large neural systems by decoupling local learning from global gradient flow.
Abstract
Running backpropagation end to end on large neural networks is fraught with difficulties such as vanishing gradients and degradation. In this paper we present an alternative architecture composed of many small neural networks that interact with one another. Instead of propagating gradients back through the architecture, we propagate vector-valued messages computed via forward passes, which are then used to update the parameters. The architecture's performance is currently conjectural, as we have yet to implement it; we do, however, support it with some theory. A previous version of this paper was entitled "Fusion encoder networks" and detailed a slightly different architecture.
