
Architecture Induces Structural Invariant Manifolds of Neural Network Training Dynamics

Jiajie Zhao, Tao Luo, Yaoyu Zhang

TL;DR

The paper develops an analytic, geometry-based framework to understand how neural-network architecture shapes training dynamics via Structural Invariant Manifolds (SIMs). By framing SIMs as unions of orbits of the induced vector-field family 𝔉 and invoking geometric control theory (Hermann–Nagano), it shows that nontrivial SIMs arise from architectural symmetries, and that for generic two-layer networks all SIMs are symmetry-induced. This yields a principled view of how neuron condensation and reduced-width equivalences emerge as intrinsic dynamical constraints, independent of data or loss, with implications for global trajectory tracing and potential ties to generalization. The framework unifies previous symmetry/conservation insights and provides a path to quantify how architecture constrains training dynamics and, ultimately, generalization in nonlinear networks.
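The reduced-width equivalence can be illustrated at the level of the network function. The sketch below is a minimal check under assumptions of our own: a bias-free two-layer network $f(\mathbf{x})=\sum_k a_k\tanh(\mathbf{w}_k\cdot\mathbf{x})$, random data, and a hypothetical helper `two_layer`. When two hidden neurons share the same input weight, the network computes the same function as a narrower network in which those neurons are merged and their output weights summed. It covers only the function, not the training dynamics on the condensed set.

```python
import numpy as np

def two_layer(a, W, X):
    """Hypothetical helper: f(x) = sum_k a_k * tanh(w_k . x).

    a: (m,) output weights, W: (m, d) input weights, X: (n, d) inputs.
    Returns the (n,) network outputs. tanh is an assumed activation.
    """
    return np.tanh(X @ W.T) @ a

rng = np.random.default_rng(0)
d, n = 3, 50
X = rng.normal(size=(n, d))

# Width-3 network whose first two hidden neurons share the same input weight.
w_shared = rng.normal(size=d)
W_full = np.stack([w_shared, w_shared, rng.normal(size=d)])
a_full = rng.normal(size=3)

# Width-2 network: merge the duplicated neurons by summing their output weights.
W_reduced = W_full[[0, 2]]
a_reduced = np.array([a_full[0] + a_full[1], a_full[2]])

f_full = two_layer(a_full, W_full, X)
f_reduced = two_layer(a_reduced, W_reduced, X)
print(np.allclose(f_full, f_reduced))  # True
```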

Abstract

While architecture is recognized as key to the performance of deep neural networks, its precise effect on training dynamics has remained unclear because of the confounding influence of data and loss functions. This paper proposes an analytic framework, based on geometric control theory, to characterize the dynamical properties intrinsic to a model's parameterization. We prove that the Structural Invariant Manifolds (SIMs) of an analytic model $F(\boldsymbol{\theta})(\mathbf{x})$, i.e., submanifolds that confine gradient-flow trajectories independently of data and loss, are unions of orbits of the vector field family $\{\nabla_{\boldsymbol{\theta}} F(\cdot)(\mathbf{x}) \mid \mathbf{x}\in\mathbb{R}^d\}$. We then prove that a model's symmetries, e.g., the permutation symmetry of neural networks, induce SIMs. Applying this, we characterize the hierarchy of symmetry-induced SIMs in fully-connected networks, on which the dynamics exhibit neuron condensation and equivalence to reduced-width networks. For two-layer networks, we prove that all SIMs are symmetry-induced, closing the gap between known symmetries and all possible invariants. Overall, by establishing a framework for analyzing architecture-induced SIMs, our work lays the groundwork for deeper analysis of neural network training dynamics and generalization.
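To make "independent of data and loss" concrete, here is a minimal numerical sketch for the simplest symmetry-induced SIM of a two-layer network: the set where two hidden neurons have identical parameters ($a_1=a_2$, $\mathbf{w}_1=\mathbf{w}_2$). The model $f(\mathbf{x})=\sum_k a_k\tanh(\mathbf{w}_k\cdot\mathbf{x})$, the helper names, and the use of small-step gradient descent as a stand-in for gradient flow are all our assumptions. Because swapping the two neurons leaves $F$ unchanged, their gradients coincide on that set for every input, so the parameters stay condensed whichever loss and dataset are used. The script checks this for squared and logistic losses on random data.

```python
import numpy as np

def forward(a, W, X):
    """Assumed model f(x) = sum_k a_k * tanh(w_k . x) on a batch X of shape (n, d)."""
    return np.tanh(X @ W.T) @ a

def loss_grad(a, W, X, dloss):
    """Gradient of (1/n) * sum_i loss(f(x_i), y_i) with respect to (a, W).

    dloss has shape (n,): derivative of the per-sample loss w.r.t. f(x_i).
    """
    n = len(X)
    H = np.tanh(X @ W.T)                                   # (n, m) hidden activations
    ga = H.T @ dloss / n                                   # dL/da_k
    gW = ((dloss[:, None] * (1 - H ** 2)) * a).T @ X / n   # dL/dw_k, shape (m, d)
    return ga, gW

def squared_dloss(f, y):
    return f - y                          # d/df of 0.5 * (f - y)^2

def logistic_dloss(f, y):
    return -y / (1.0 + np.exp(y * f))     # d/df of log(1 + exp(-y f)), y in {-1, +1}

def train(a, W, X, y, dloss_fn, lr=0.01, steps=250):
    """Small-step gradient descent, used here as a stand-in for gradient flow."""
    for _ in range(steps):
        ga, gW = loss_grad(a, W, X, dloss_fn(forward(a, W, X), y))
        a, W = a - lr * ga, W - lr * gW
    return a, W

rng = np.random.default_rng(1)
d, m, n = 4, 5, 64

# Initial point on the condensed set: neurons 0 and 1 share (a_k, w_k).
W0 = rng.normal(size=(m, d))
W0[1] = W0[0]
a0 = rng.normal(size=m)
a0[1] = a0[0]

gaps = {}
for name, dloss_fn in [("squared", squared_dloss), ("logistic", logistic_dloss)]:
    X = rng.normal(size=(n, d))           # fresh random data for each loss
    y = rng.normal(size=n) if name == "squared" else rng.choice([-1.0, 1.0], size=n)
    a, W = train(a0.copy(), W0.copy(), X, y, dloss_fn)
    gaps[name] = np.abs(W[0] - W[1]).max() + abs(a[0] - a[1])
    print(name, gaps[name])  # theory predicts 0; floating point may leave round-off
```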

Paper Structure

This paper contains 22 sections, 22 theorems, 29 equations, and 1 figure.

Key Result

Theorem 2.1

(Hermann–Nagano Theorem, Theorem 6 in Section 2 of jurdjevic1997geometric) Let $\mathcal{M}$ be an analytic manifold and $\mathcal{F}$ a family of analytic vector fields on $\mathcal{M}$. Then:
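In its standard form, the conclusion of the Hermann–Nagano theorem reads as follows. The notation $\mathrm{Lie}(\mathcal{F})$ for the Lie closure (Definition 2.7), $\mathrm{Lie}_q(\mathcal{F})$ for its evaluation at a point, and $\mathcal{O}_p(\mathcal{F})$ for the orbit through $p$ is ours and may differ from the paper's.

```latex
\begin{enumerate}
  \item Each orbit $\mathcal{O}_p(\mathcal{F})$ of $\mathcal{F}$ is an analytic
        (immersed) submanifold of $\mathcal{M}$.
  \item For every $q \in \mathcal{O}_p(\mathcal{F})$,
  \[
    T_q\,\mathcal{O}_p(\mathcal{F}) \;=\; \mathrm{Lie}_q(\mathcal{F})
    \;:=\; \{\, X(q) : X \in \mathrm{Lie}(\mathcal{F}) \,\},
  \]
  so in particular $\dim \mathcal{O}_p(\mathcal{F}) = \dim \mathrm{Lie}_q(\mathcal{F})$.
\end{enumerate}
```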

Figures (1)

  • Figure 1: Flowchart of the paper's logical structure, illustrating the progression from the section on SIMs (sec:SIM) through the section on symmetry (sec:symmetry) to the section on orbits (sec:orbit). Grey blocks represent foundational results from prior work. Red blocks denote the paper's main theorems. Green blocks represent propositions, blue blocks represent lemmas, and yellow blocks represent corollaries.

Theorems & Definitions (78)

  • Definition 2.1: analytic parametric model
  • Definition 2.2: vector field induced invariant set (manifold)
  • Definition 2.3: multi-layer fully-connected neural network
  • Definition 2.4: two-layer neural network
  • Definition 2.5: symmetry group
  • Definition 2.6: orbit, page 33 of jurdjevic1997geometric
  • Definition 2.7: Lie closure
  • Theorem 2.1
  • Corollary 2.1
  • Definition 3.1: structural invariant manifold (SIM)
  • ...and 68 more
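Definitions 2.6 and 2.7 and Theorem 2.1 tie the dimension of an orbit to the span of the vector fields and their Lie brackets. As a rough numerical illustration (our own sketch, not the paper's procedure), one can evaluate the family $\{\nabla_{\boldsymbol{\theta}} F(\cdot)(\mathbf{x})\}$ at a parameter point for many sampled inputs and compute the rank of the stacked vectors. Lie brackets are not included, so this rank is only a lower bound on the orbit dimension at that point. The two-layer tanh model, the hypothetical helpers `param_gradient` and `family_rank`, and the sizes are assumptions. At a generic point the rank is expected to reach the full parameter count $P=m(d+1)$, while on the condensed set where two neurons coincide it cannot exceed $P-(d+1)$, the dimension of that invariant set.

```python
import numpy as np

def param_gradient(a, W, x):
    """Hypothetical helper: grad_theta F(theta)(x) for F = sum_k a_k tanh(w_k . x).

    theta is flattened as (a_1..a_m, w_1, ..., w_m); returns shape (m + m*d,).
    """
    t = np.tanh(W @ x)                                   # (m,) hidden activations
    g_a = t                                              # dF/da_k
    g_W = (a * (1 - t ** 2))[:, None] * x[None, :]       # dF/dw_k, shape (m, d)
    return np.concatenate([g_a, g_W.ravel()])

def family_rank(a, W, xs):
    """Rank of the sampled vector fields evaluated at theta = (a, W).

    Brackets are omitted, so this only lower-bounds the orbit dimension.
    """
    G = np.stack([param_gradient(a, W, x) for x in xs])
    return np.linalg.matrix_rank(G)

rng = np.random.default_rng(2)
m, d = 3, 2
P = m * (1 + d)                   # total number of parameters
xs = rng.normal(size=(400, d))    # sampled inputs indexing the vector-field family

# Generic point versus a point on the condensed set (neurons 0 and 1 coincide).
a_gen, W_gen = rng.normal(size=m), rng.normal(size=(m, d))
a_con, W_con = a_gen.copy(), W_gen.copy()
a_con[1], W_con[1] = a_con[0], W_con[0]

r_gen = family_rank(a_gen, W_gen, xs)
r_con = family_rank(a_con, W_con, xs)
print(P, r_gen, r_con)
```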