Extraction Propagation

Stephen Pasteris, Chris Hicks, Vasilios Mavroudis

TL;DR

Extraction Propagation introduces exnets: DAG-structured ensembles of small neural networks that update their parameters from forward-computed extractions rather than end-to-end backpropagation. The core method, Xprop, performs an up pass to compute primary extractions, a down pass to compute complementary extractions, and then gradient-based updates to every component network. A theoretical analysis of the converged state establishes vertex optimality and identical local predictions. The paper also describes architectural templates (tree-structured, multi-layer, and attention-based exnets), enhancements such as supernodes, and an Xprop* variant intended to stabilize training. The architecture has not yet been implemented; if validated, it could offer modular, scalable, and potentially more robust training for large neural systems by decoupling local learning from global gradient flow.

Abstract

Running backpropagation end to end on large neural networks is fraught with difficulties like vanishing gradients and degradation. In this paper we present an alternative architecture composed of many small neural networks that interact with one another. Instead of propagating gradients back through the architecture we propagate vector-valued messages computed via forward passes, which are then used to update the parameters. Currently the performance is conjectured as we are yet to implement the architecture. However, we do back it up with some theory. A previous version of this paper was entitled "Fusion encoder networks" and detailed a slightly different architecture.

Paper Structure

This paper contains 18 sections, 63 equations, and 2 figures.

Figures (2)

  • Figure 1: The primary architecture when $\mathcal{I}=[8]$ and the exnet is a balanced tree. The subscript of $t$ has been dropped from all vectors.
  • Figure 2: Extraction computation and parameter updates at a vertex $v$ with a single parent $z$, noting that $\mu'_{t}(v)=\mu^\dag_{t}(z,v)$. The subscript $t$ has been dropped from all extractions. The left-hand side depicts the vertices and extractions involved. The right-hand side depicts the neural networks involved and how the extractions are computed. The neural networks are updated by backpropagation from the (gradient of the loss of the) prediction denoted by the green arrow. Blue and red indicate primary and complementary extractions, respectively.