Algebraic Complexity and Neurovariety of Linear Convolutional Networks

Vahid Shahverdi

TL;DR

This work analyzes the algebraic structure and optimization complexity of one-dimensional linear convolutional networks (1D-LCNs) by embedding their parameter space into algebraic varieties. It develops a recursive polynomial framework and a resultant-based method to capture the Zariski closure of the neuromanifold as a complex neurovariety, linking the network factorization to Segre varieties. The main finding is that the number of complex critical points in training equals the generic Euclidean distance degree of the associated Segre variety, $\mathrm{gEDdeg}(\mathcal{S}_{\mathbf{k}})$, which can substantially exceed the corresponding count for fully connected linear networks. The paper also shows that this ED-degree grows with depth through a closed-form functional $C_{\mathbf{k}}$, and provides an algorithm to compute the vanishing ideal of the neurovariety via s-decomposition and polynomial resultants. Overall, it illuminates the substantial algebraic complexity of training 1D-LCNs and explains why depth can amplify optimization difficulty beyond parameter-count considerations.
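For readers unfamiliar with the term, the generic Euclidean distance degree mentioned above can be stated as follows. This is the standard definition from metric algebraic geometry, written here as a sketch; the symbols $X$, $u$, $Q$, and $X_{\mathrm{reg}}$ are notation assumed for this note and may differ from the paper's own conventions.

$$
\mathrm{gEDdeg}(X) \;=\; \#\Bigl\{\, x \in X_{\mathrm{reg}} \;:\; x \text{ is a critical point of } d_{u,Q}(x) = (x-u)^{\top} Q\,(x-u) \text{ restricted to } X \,\Bigr\},
$$

where $X \subset \mathbb{C}^n$ is an algebraic variety, $X_{\mathrm{reg}}$ is its smooth locus, $u \in \mathbb{C}^n$ is a generic data point, and $Q$ is a generic symmetric matrix. The count is taken over complex critical points, so it is an upper bound on the number of real critical points that a squared-error training problem of this shape can have on the smooth part of the variety.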

Abstract

In this paper, we study linear convolutional networks with one-dimensional filters and arbitrary strides. The neuromanifold of such a network is a semialgebraic set, represented by a space of polynomials admitting specific factorizations. Introducing a recursive algorithm, we generate polynomial equations whose common zero locus corresponds to the Zariski closure of the corresponding neuromanifold. Furthermore, we explore the algebraic complexity of training these networks employing tools from metric algebraic geometry. Our findings reveal that the number of all complex critical points in the optimization of such a network is equal to the generic Euclidean distance degree of a Segre variety. Notably, this count significantly surpasses the number of critical points encountered in the training of a fully connected linear network with the same number of parameters.
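The correspondence between network factorizations and polynomial factorizations can be illustrated with a small numerical sketch. It assumes the usual identification of a filter with the coefficient vector of a polynomial, no padding, and the cross-correlation convention for each layer; the function names `upsample`, `end_to_end_filter`, and `strided_layer` are hypothetical helpers introduced here and do not come from the paper. Under these assumptions, composing layers with filters $w_1, w_2, \dots$ and strides $s_1, s_2, \dots$ gives a single filter whose polynomial is $P_1(x)\,P_2(x^{s_1})\,P_3(x^{s_1 s_2})\cdots$.

```python
import numpy as np


def upsample(w, factor):
    """Insert factor-1 zeros between entries: coefficients of P(x**factor)."""
    w = np.asarray(w, dtype=float)
    out = np.zeros((len(w) - 1) * factor + 1)
    out[::factor] = w
    return out


def end_to_end_filter(filters, strides):
    """Coefficients of P_1(x) * P_2(x**s_1) * P_3(x**(s_1*s_2)) * ...

    The last stride does not change the filter; it only contributes to the
    end-to-end stride, which is the product of all strides.
    """
    result = np.array([1.0])
    step = 1
    for w, s in zip(filters, strides):
        # Polynomial multiplication is convolution of coefficient vectors.
        result = np.convolve(result, upsample(w, step))
        step *= s
    return result


def strided_layer(w, stride, x):
    """One linear convolutional layer without padding (cross-correlation)."""
    w = np.asarray(w, dtype=float)
    n_out = (len(x) - len(w)) // stride + 1
    return np.array([w @ x[i * stride : i * stride + len(w)] for i in range(n_out)])


# Two layers with filter sizes (2, 2) and strides (2, 1).
rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=2), rng.normal(size=2)
x = rng.normal(size=10)

v = end_to_end_filter([w1, w2], [2, 1])
two_layers = strided_layer(w2, 1, strided_layer(w1, 2, x))
one_layer = strided_layer(v, 2, x)  # end-to-end stride is 2 * 1

print(np.allclose(two_layers, one_layer))
# The four coefficients satisfy the 2x2 determinant relation v0*v3 = v1*v2.
print(abs(v[0] * v[3] - v[1] * v[2]) < 1e-12)
```

In this smallest case the end-to-end filter is $(a_0 b_0,\ a_1 b_0,\ a_0 b_1,\ a_1 b_1)$, so its entries satisfy $v_0 v_3 = v_1 v_2$, which is the defining equation of the Segre embedding of $\mathbb{P}^1 \times \mathbb{P}^1$. This is only a toy instance of the kind of polynomial equation the abstract refers to, not the paper's recursive algorithm.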

Paper Structure

This paper contains 16 sections, 9 theorems, 43 equations, 1 figure, 3 tables, and 1 algorithm.

Key Result

Proposition 2.2

If $\pi_1$ is the map defined in eq:polynomials and $\pi_1^\mathbb{C}$ is its complex counterpart, then:

Figures (1)

  • Figure 1: The top row displays the value of $C_{2,3,4,5}$ for a reduced $4$-layer architecture. Subsequent rows showcase the computation of various $C_{\mathbf{k}}$ values for architectures derived by merging two layers at the parent node while preserving the dimension of the neuromanifold.

Theorems & Definitions (19)

  • Definition 2.1
  • Proposition 2.2: kohn2023function, Proposition 3.1
  • Definition 2.3
  • Lemma 2.4
  • proof
  • Definition 2.5
  • Definition 3.1
  • Proposition 3.2
  • proof
  • Corollary 3.3
  • ...and 9 more