Neural Networks as Local-to-Global Computations

Vicente Bosca; Robert Ghrist

Abstract

We construct a cellular sheaf from any feedforward ReLU neural network by placing one vertex for each intermediate quantity in the forward pass and encoding each computational step (affine transformation, activation, output) as a restriction map on an edge. The restricted coboundary operator on the free coordinates is unitriangular, so its determinant is $1$ and the restricted Laplacian is positive definite for every activation pattern. It follows that the relative cohomology vanishes and the forward pass output is the unique harmonic extension of the boundary data. The sheaf heat equation converges exponentially to this output despite the state-dependent switching introduced by piecewise linear activations. Unlike the forward pass, the heat equation propagates information bidirectionally across layers. This enables pinned neurons that impose constraints in both directions, training through local discrepancy minimization without a backward pass, and per-edge diagnostics that decompose network behavior by layer and operation type. We validate the framework experimentally on small synthetic tasks, confirming the convergence theorems and demonstrating that sheaf-based training, while not yet competitive with stochastic gradient descent, obeys quantitative scaling laws predicted by the theory.
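The claim that diffusion recovers the forward pass can be illustrated with a minimal sketch. It assumes a simplified setting that is not taken from the paper: a one-hidden-layer network with identity final activation, biases folded into the affine discrepancies rather than into extended stalks, and explicit Euler steps on the sum of squared per-edge discrepancies. All names (`W1`, `b1`, `unit_scale`, `dt`, and so on) are hypothetical, and the weight rescaling is only there to keep the fixed step size stable.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit_scale(W, target=0.8):
    # Assumption for this sketch: rescale to spectral norm `target`
    # so that a fixed Euler step size stays stable.
    return target * W / np.linalg.norm(W, 2)

n0, n1, n2 = 3, 4, 2
W1, b1 = unit_scale(rng.normal(size=(n1, n0))), rng.normal(size=n1)
W2, b2 = unit_scale(rng.normal(size=(n2, n1))), rng.normal(size=n2)
x = rng.normal(size=n0)  # pinned input (boundary data)

# Reference forward pass (identity final activation assumed).
z1_fp = W1 @ x + b1
a1_fp = np.maximum(z1_fp, 0.0)
z2_fp = W2 @ a1_fp + b2

# Free coordinates start from an arbitrary random state.
z1 = rng.normal(size=n1)
a1 = rng.normal(size=n1)
z2 = rng.normal(size=n2)

dt = 0.1
for _ in range(5000):
    R = (z1 > 0).astype(float)     # diagonal of the state-dependent ReLU mask
    e_aff1 = z1 - (W1 @ x + b1)    # discrepancy on the first affine edge
    e_act = a1 - R * z1            # discrepancy on the activation edge
    e_aff2 = z2 - (W2 @ a1 + b2)   # discrepancy on the second affine edge
    # Each variable descends the discrepancies of its adjacent edges only.
    z1 = z1 - dt * (e_aff1 - R * e_act)
    a1 = a1 - dt * (e_act - W2.T @ e_aff2)
    z2 = z2 - dt * e_aff2

print("max deviation from forward pass:",
      max(np.abs(z1 - z1_fp).max(),
          np.abs(a1 - a1_fp).max(),
          np.abs(z2 - z2_fp).max()))
```

In this sketch the update for `a1` receives a term from the edge behind it (`e_act`) and from the edge ahead of it (`W2.T @ e_aff2`), which is one way to read the bidirectional propagation described above.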

Paper Structure

This paper contains 44 sections, 6 theorems, 50 equations, 21 figures, 3 tables.

Introduction
Background
Feedforward ReLU Networks
Cellular Sheaves and Sheaf Diffusion
The Neural Sheaf Construction
From Networks to Sheaves
The Forward Pass as Harmonic Extension
Convergence Analysis
Convergence for ReLU Networks
Convergence with Final Activations
Extensions
Pinned Neurons
Training via Joint Dynamics
Evolving restriction maps
The joint dynamics
...and 29 more sections

Key Result

Theorem 2.2

Solutions to the sheaf heat equation converge exponentially to the orthogonal projection of $x(0)$ onto $H^0(G; \mathcal{F})$. The rate of convergence is governed by the smallest positive eigenvalue of $L_\mathcal{F}$.
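Theorem 2.2 can be checked numerically on a toy example. The sketch below assumes the heat equation has the form $\dot{x} = -L_\mathcal{F}\,x$ with unit diffusivity and uses a hypothetical sheaf on a three-vertex path graph with random restriction maps; none of these specifics come from the paper. It uses the standard identification of $H^0(G; \mathcal{F})$ with the kernel of the sheaf Laplacian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sheaf: path graph v0 - v1 - v2 with 2-dimensional
# vertex stalks and 1-dimensional edge stalks.
n_vertices, d_vertex, d_edge = 3, 2, 1
edges = [(0, 1), (1, 2)]

# Coboundary: (delta x)_e = F_{v,e} x_v - F_{u,e} x_u for the oriented edge e = (u, v).
delta = np.zeros((len(edges) * d_edge, n_vertices * d_vertex))
for k, (u, v) in enumerate(edges):
    rows = slice(k * d_edge, (k + 1) * d_edge)
    delta[rows, u * d_vertex:(u + 1) * d_vertex] = -rng.normal(size=(d_edge, d_vertex))
    delta[rows, v * d_vertex:(v + 1) * d_vertex] = rng.normal(size=(d_edge, d_vertex))

L = delta.T @ delta  # sheaf Laplacian

# Split the spectrum into the kernel (global sections) and the positive part.
eigvals, eigvecs = np.linalg.eigh(L)
tol = 1e-10
kernel = eigvecs[:, eigvals < tol]
lambda_min_pos = eigvals[eigvals >= tol].min()

x0 = rng.normal(size=n_vertices * d_vertex)
projection = kernel @ (kernel.T @ x0)  # orthogonal projection of x(0) onto ker L

def heat_solution(t):
    # Exact solution x(t) = exp(-t L) x0, computed from the eigendecomposition.
    return eigvecs @ (np.exp(-t * eigvals) * (eigvecs.T @ x0))

for t in (0.0, 1.0, 5.0, 20.0):
    err = np.linalg.norm(heat_solution(t) - projection)
    bound = np.exp(-lambda_min_pos * t) * np.linalg.norm(x0 - projection)
    print(f"t={t:5.1f}  error={err:.3e}  bound={bound:.3e}")
```

The printed error stays below the bound $e^{-\lambda t}\,\|x(0) - \Pi x(0)\|$, where $\lambda$ is the smallest positive eigenvalue and $\Pi$ the projection onto the kernel, which is the rate statement of the theorem in this linear setting.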

Figures (21)

Figure 1: Neural sheaf encoding a ReLU network with $k$ hidden layers. Red nodes are stubborn (boundary conditions); green and yellow nodes are dynamic, with yellow nodes having a fixed component. Restriction maps $\overline{W}^{(\ell)}$ encode weights and biases, $R^{z^{(\ell)}}$ encodes ReLU activation, $P_{n_\ell}$ projects onto the first $n_\ell$ coordinates, and $\phi$ is the final activation function.
Figure 2: The neural network--sheaf correspondence. Red nodes represent fixed data (boundary conditions); green and yellow nodes are computed, with yellow nodes having a fixed component. (a) The feedforward network computes postactivations $\mathbf{a}^{(1)} = \mathrm{ReLU}(W^{(1)}\mathbf{x} + b^{(1)})$ and output $\hat{\mathbf{y}} = \phi(W^{(2)}\mathbf{a}^{(1)} + b^{(2)})$ via the forward pass. (b) In the sheaf formulation, only $\overline{\mathbf{x}}$ (input extended with ones) is pinned. Restriction maps encode weights and biases ($\overline{W}^{(1)}, \overline{W}^{(2)}$), ReLU activations ($R^{z^{(1)}}$), and final activation ($\phi$). The heat equation drives any initialization to the correct forward pass output.
Figure 3: Pinning a neuron. (a) Fixing the third hidden neuron to the value $\mathbf{p}$ in the feedforward network affects only the output neuron through the red edge. (b) In the sheaf formulation, the pinned value $\mathbf{p}$ is imposed through the restriction map $P_{\{3\}}$, which projects onto the third component of $a^{(1)}$. The disturbance propagates to both adjacent stalks $z^{(1)}$ and $z^{(2)}$ (red edges).
Figure 4: Two training configurations for a one-hidden-layer network. Red nodes are pinned boundary conditions, green nodes are dynamic pre-activation variables, and yellow nodes are dynamic post-activation variables with some part fixed. Green arrows represent trainable weight matrices. Left: Regression with identity activation composed with the gradient of a nonlinear potential, $\nabla U$, enabling losses beyond L2, such as L1, Lp, or Huber. Right: Classification with Softmax/Sigmoid activation and cross-entropy loss, where the dynamics simplify to allow a sheaf-like construction.
Figure 5: Batch processing for a one-hidden-layer network with batch size $M$. Top: the batch sheaf with matrix-valued stalks, where ReLU acts via coordinatewise masking. Bottom: the equivalent representation as $M$ parallel activation edges, each carrying its own diagonal ReLU matrix $R^{z^{(1),m}}$ and projection $P_{n_1,m}$. The weight edges are shared across all inputs.
...and 16 more figures
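
The bias-extended notation in the captions of Figures 1 and 2 can be unpacked with a short worked equation. It assumes a block form the captions do not state explicitly: $\overline{W}^{(1)}$ is the weight matrix with the bias appended as a final column, and $\overline{\mathbf{x}}$ is the input with a trailing $1$. Under that assumption the affine step becomes a single linear map:

$$\overline{W}^{(1)}\,\overline{\mathbf{x}} = \begin{bmatrix} W^{(1)} & b^{(1)} \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix} = W^{(1)}\mathbf{x} + b^{(1)}, \qquad \mathbf{a}^{(1)} = \mathrm{ReLU}\!\left(\overline{W}^{(1)}\,\overline{\mathbf{x}}\right).$$

On this reading, the trailing $1$ would be the fixed component of the yellow nodes, and $P_{n_\ell}$ would discard it by keeping only the first $n_\ell$ coordinates.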

Theorems & Definitions (17)

Remark 1: Piecewise linear structure
Definition 2.1
Remark 2: Opinion dynamics interpretation
Theorem 2.2: Convergence of sheaf diffusion [hansen2021opinion]
Theorem 2.3: $U$-restricted dynamics [hansen2021opinion]
Remark 3: State-dependent restriction maps
Lemma 1: Unitriangular factorization
proof
Remark 4
Proposition 1: Forward pass as harmonic extension
...and 7 more
