The Geometry of the Set of Equivalent Linear Neural Networks

Jonathan Richard Shewchuk, Sagnik Bhattacharya

TL;DR

This work provides a comprehensive geometric and topological framework for the fiber $\mu^{-1}(W)$ of a linear neural network, i.e., all weight factorizations yielding a fixed linear map $W$. By introducing a rank-based stratification and a rich system of subspaces ($A_{kji}$, $B_{kji}$, prebases, and basis-flow diagrams), the authors characterize tangent and normal spaces to each stratum, connect strata via rank-1 abstract moves, and prove that each stratum is a $C^\infty$ manifold with dimension computable from the rank list. A canonical weight construction and a forward/transpose duality provide a powerful, geometry-driven view of information flow through the network and its impact on optimization paths. The paper also develops a detailed, algorithmic approach to building the stratum DAG, enabling practical exploration of fiber geometry and potential training dynamics improvements. Overall, the rank stratification offers a principled lens for understanding how different network decompositions realize the same linear map and how gradient-based optimization might traverse or avoid spurious critical regions.

Abstract

We characterize the geometry and topology of the set of all weight vectors for which a linear neural network computes the same linear transformation $W$. This set of weight vectors is called the fiber of $W$ (under the matrix multiplication map), and it is embedded in the Euclidean weight space of all possible weight vectors. The fiber is an algebraic variety that is not necessarily a manifold. We describe a natural way to stratify the fiber--that is, to partition the algebraic variety into a finite set of manifolds of varying dimensions called strata. We call this set of strata the rank stratification. We derive the dimensions of these strata and the relationships by which they adjoin each other. Although the strata are disjoint, their closures are not. Our strata satisfy the frontier condition: if a stratum intersects the closure of another stratum, then the former stratum is a subset of the closure of the latter stratum. Each stratum is a manifold of class $C^\infty$ embedded in weight space, so it has a well-defined tangent space and normal space at every point (weight vector). We show how to determine the subspaces tangent to and normal to a specified stratum at a specified point on the stratum, and we construct elegant bases for those subspaces. To help achieve these goals, we first derive what we call a Fundamental Theorem of Linear Neural Networks, analogous to what Strang calls the Fundamental Theorem of Linear Algebra. We show how to decompose each layer of a linear neural network into a set of subspaces that show how information flows through the neural network. Each stratum of the fiber represents a different pattern by which information flows (or fails to flow) through the neural network. The topology of a stratum depends solely on this decomposition. So does its geometry, up to a linear transformation in weight space.
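As a small illustration of what it means for two weight vectors to lie on the same fiber, the sketch below multiplies the factors of a two-layer linear network and checks that a reparameterized pair of factors computes the same $W$. The matrices `W2`, `W1`, and `M`, and the helper names `matmul` and `mu`, are hypothetical choices made for this example and do not come from the paper. The construction $(W_2 M^{-1}, M W_1)$ for an invertible $M$ is a standard way to move along a fiber, not necessarily how the authors parameterize it.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def mu(factors):
    """Multiply the factors in the order given, e.g. mu([W2, W1]) = W2 W1."""
    result = factors[0]
    for W in factors[1:]:
        result = matmul(result, W)
    return result


# Hypothetical two-layer network: W2 is 1x2 and W1 is 2x2.
W2 = [[1, 2]]
W1 = [[3, 0],
      [1, 1]]

# An invertible 2x2 matrix and its inverse (exact arithmetic via Fraction).
M = [[2, 1],
     [0, 1]]
M_inv = [[Fraction(1, 2), Fraction(-1, 2)],
         [Fraction(0), Fraction(1)]]

# A second point in weight space: (W2 M^{-1}, M W1).
W2_alt = matmul(W2, M_inv)
W1_alt = matmul(M, W1)

W = mu([W2, W1])
W_alt = mu([W2_alt, W1_alt])

# Both weight vectors compute the same linear map, so both lie on the
# fiber of W, even though the individual factors differ.
print(W == W_alt)
```

Exact rational arithmetic is used here so that the equality test is not affected by floating-point rounding; a numerical library would serve equally well with a tolerance-based comparison.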

Paper Structure

This paper contains 50 sections, 60 theorems, 184 equations, 16 figures, and 7 tables.

Key Result

Lemma 1

$A_{kji} = W_{j \sim x} A_{kxi}$ for all $k$, $j$, $i$, and $x$ that satisfy $L \geq k$ and $k + 1 \geq j \geq x \geq i \geq 0$. Furthermore, $B_{kji} = W_{y \sim j}^\top B_{kyi}$ for all $k$, $j$, $i$, and $y$ that satisfy $L \geq k \geq y \geq j \geq i - 1$ and $i \geq 0$.

Figures (16)

  • Figure 1: The fiber $\mu^{-1}([1])$ for the network $W_3 W_2 W_1 = [\theta_3] [\theta_2] [\theta_1] = [1] = W$.
  • Figure 2: At left is the fiber $\mu^{-1}([0 ~~~ 0])$ for the network $W_2 W_1 = [\theta_2] [\theta_1 ~~~ \theta'_1] = [0 ~~~ 0] = W$, partitioned into three strata: $S_{00}$ is the origin; $S_{10}$ (blue) is the $\theta_2$-axis with the origin removed; and $S_{01}$ (pink) is the plane spanned by the $\theta_1$- and $\theta'_1$-axes with the origin removed. At right, the strata are arranged in a stratum dag, which we organize as a two-dimensional table indexed by the ranks of $W_1$ and $W_2$. Each dag vertex specifies the dimension of the stratum (dim), the number of degrees of freedom of motion on the fiber (dof), and the number of rank-increasing degrees of freedom (rdof), which stay on the fiber but move off the stratum and onto a higher-dimensional stratum. Always, dof $=$ dim $+$ rdof. A directed path from one stratum to another implies that the former is a subset of the closure of the latter.
  • Figure 3: At left is the variety of solutions to $W_3 W_2 W_1 = [\theta_3] [\theta_2] [\theta_1] = [0] = W$, partitioned into seven strata: $S_{000}$ is the origin; $S_{001}$, $S_{010}$, and $S_{100}$ are the three coordinate axes with the origin removed; and $S_{011}$, $S_{101}$, and $S_{110}$ are the three coordinate planes with the coordinate axes removed. At right is the stratum dag, organized as a three-dimensional table indexed by the ranks of $W_3$, $W_2$, and $W_1$.
  • Figure 4: Stratum dag representing the stratification of $\mu^{-1}(W)$ for $W = W_2 W_1$, $W_2 \in \mathbb{R}^{5 \times 6}$, $W_1 \in \mathbb{R}^{6 \times 4}$, and $\mathrm{rk}\, W = 1$. The dag edges are omitted, but each stratum $S_{ki}$ has an edge pointing to the stratum $S_{k+1,i}$ immediately above it, and another edge pointing to the stratum $S_{k,i+1}$ immediately to its right. For every pair of strata $S_{ki}$ and $S_{k'i'}$ with $k \leq k'$ and $i \leq i'$, $S_{ki} \subset \bar{S}_{k'i'}$.
  • Figure 5: The tea clipper ship Basis Flow. The top half is a basis flow diagram that illustrates the flow of the prebasis subspaces $a_{kji}$ through the network. Double boxes represent subspaces of dimension 2 and triple boxes represent subspaces of dimension 3. The bottom half shows the relationships between the intervals, the layer sizes, and the matrix ranks. The number of units $d_j$ in unit layer $j$ equals the sum of the multiplicities $\omega_{ts}$ of the intervals that touch layer $j$ (i.e., the dimensions of the prebasis subspaces $a_{tjs}$). Each matrix rank $\mathrm{rk}\, W_{k \sim i}$ is the sum of the multiplicities of the intervals that touch both layers $k$ and $i$.
  • ...and 11 more figures
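The strata in Figures 2 and 3 are labeled by the ranks of the individual factor matrices. A minimal sketch of that labeling for these two toy networks follows. The function names are hypothetical, and the label order (ranks listed from the last factor to the first) is an assumption inferred from the Figure 2 caption, where $S_{10}$ is the $\theta_2$-axis. Each factor here has a single row, so its rank is 1 if any entry is nonzero and 0 otherwise.

```python
from itertools import product


def row_rank(row):
    """Rank of a one-row matrix: 1 if any entry is nonzero, else 0."""
    return 1 if any(x != 0 for x in row) else 0


def stratum_label(factors):
    """Build a label such as 'S_10' from the ranks of the factors as listed."""
    return "S_" + "".join(str(row_rank(f)) for f in factors)


def on_fiber_fig2(theta2, theta1, theta1p):
    """Check W2 W1 = [theta2][theta1 theta1'] = [0 0]."""
    return theta2 * theta1 == 0 and theta2 * theta1p == 0


# Figure 2: three sample points on the fiber of [0 0], listed as [W2, W1].
fig2_points = [
    [[0], [0, 0]],    # the origin
    [[3], [0, 0]],    # on the theta2-axis, origin removed
    [[0], [1, -2]],   # in the (theta1, theta1')-plane, origin removed
]
fig2_labels = [stratum_label(p) for p in fig2_points]

# Figure 3: W3 W2 W1 = [theta3][theta2][theta1] = [0]. A rank pattern is
# realizable on the fiber exactly when at least one scalar factor is zero.
fig3_labels = sorted(
    "S_" + "".join(map(str, ranks))
    for ranks in product((0, 1), repeat=3)
    if 0 in ranks
)

print(fig2_labels)
print(len(fig3_labels), fig3_labels)
```

The enumeration for Figure 3 yields seven rank patterns, matching the seven strata named in the caption: the origin, three punctured axes, and three coordinate planes with the axes removed.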

Theorems & Definitions (117)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • ...and 107 more