Function Space and Critical Points of Linear Convolutional Networks

Kathlén Kohn, Guido Montúfar, Vahid Shahverdi, Matthew Trager

TL;DR

This work analyzes the geometry of function spaces realized by one-dimensional linear convolutional networks (LCNs) with arbitrary strides. By encoding end-to-end filters as sparse factorizations of homogeneous polynomials, the authors characterize the function space $\mathcal{M}_{\mathbf{k},\mathbf{s}}$ and its Zariski closure, derive its dimension $\dim\mathcal{M}_{\mathbf{k},\mathbf{s}} = \sum_{i=1}^L k_i - (L-1)$, and delineate thick/thin, Zariski-closed/non-closed, and smooth/singular regimes. They provide a complete description of the critical points of the parameterization map via polynomial factorization and hyperroot structure, show that for reduced architectures with all strides $>1$ and generic data the nonzero squared-loss critical points are interior smooth points (i.e., pure) of the function space, and develop a projective/birational framework that connects multi-layer LCNs to stride-one sub-architectures. The results reveal a rich algebraic structure behind LCNs, with implications for architecture design and optimization behavior that differ substantially from stride-one or dense linear networks. Overall, the paper advances a precise, algebraic understanding of how architecture shapes the geometry of representable functions and the optimization landscape.

Abstract

We study the geometry of linear networks with one-dimensional convolutional layers. The function spaces of these networks can be identified with semi-algebraic families of polynomials admitting sparse factorizations. We analyze the impact of the network's architecture on the function space's dimension, boundary, and singular points. We also describe the critical points of the network's parameterization map. Furthermore, we study the optimization problem of training a network with the squared error loss. We prove that for architectures where all strides are larger than one and generic data, the non-zero critical points of that optimization problem are smooth interior points of the function space. This property is known to be false for dense linear networks and linear convolutional networks with stride one.
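
As a minimal sketch of the polynomial encoding described above, the following NumPy snippet composes strided one-dimensional convolutions and compares the result with a single strided convolution whose filter is a product of sparse polynomials. The helper names (`strided_conv`, `upsample`, `end_to_end_filter`), the cross-correlation indexing convention, and the absence of padding are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def strided_conv(x, w, s):
    """Valid 1D cross-correlation with stride s: y[i] = sum_j w[j] * x[i*s + j]."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    k = len(w)
    n_out = (len(x) - k) // s + 1
    return np.array([np.dot(w, x[i * s : i * s + k]) for i in range(n_out)])

def upsample(w, s):
    """Insert s - 1 zeros between entries, i.e. substitute x -> x^s in the polynomial."""
    w = np.asarray(w, dtype=float)
    out = np.zeros((len(w) - 1) * s + 1)
    out[::s] = w
    return out

def end_to_end_filter(filters, strides):
    """Return (filter, stride) of the single convolution equal to the composition.

    Layer l contributes the sparse factor p_l(x^t), where t is the product of
    the strides of the earlier layers; np.convolve multiplies the polynomials.
    """
    w = np.array([1.0])
    t = 1  # cumulative stride of the layers processed so far
    for w_l, s_l in zip(filters, strides):
        w = np.convolve(w, upsample(w_l, t))
        t *= s_l
    return w, t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(20)
    w1, w2 = rng.standard_normal(3), rng.standard_normal(2)
    w, t = end_to_end_filter([w1, w2], [2, 1])
    layered = strided_conv(strided_conv(x, w1, 2), w2, 1)
    direct = strided_conv(x, w, t)
    print(len(w), np.allclose(layered, direct))
```

For filter sizes $(3,2)$ and strides $(2,1)$ the end-to-end filter has five entries, which is consistent with the ambient space $\mathbb{R}^5$ quoted for $\mathcal{M}_{(3,2),(2,1)}$ in the figure caption on this page.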

Paper Structure (7 sections, 35 theorems, 41 equations, 1 figure, 2 tables)

Key Result

Proposition 2.2

The function space of the LCN architecture $(\bm k, \bm s)$ can be identified with the following subset of $\mathbb{R}^k$: Here, $\pi_s$ is the map from eq:polynomials. Equivalently, $\mathcal{M}_{\bm k, \bm s}$ is the image of the parameterization map
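
The following sketch models a parameterization map of this kind numerically and compares the rank of its Jacobian at a random point with the dimension formula $\sum_{i=1}^L k_i - (L-1)$ stated in the TL;DR. It assumes that the map sends the layer filters to the coefficient vector of a product of sparse polynomials; the function names and this concrete form of the map are illustrative assumptions, not the paper's definitions. Because the map is linear in each filter separately, the Jacobian columns can be computed exactly by substituting unit vectors.

```python
import numpy as np

def upsample(w, s):
    """Insert s - 1 zeros between entries (substitute x -> x^s)."""
    w = np.asarray(w, dtype=float)
    out = np.zeros((len(w) - 1) * s + 1)
    out[::s] = w
    return out

def end_to_end_filter(filters, strides):
    """Assumed parameterization map: product of the sparse factors p_l(x^t)."""
    w = np.array([1.0])
    t = 1  # product of the strides of the earlier layers
    for w_l, s_l in zip(filters, strides):
        w = np.convolve(w, upsample(w_l, t))
        t *= s_l
    return w

def jacobian(filters, strides):
    """Exact Jacobian: the map is linear in each filter, so the partial
    derivative w.r.t. entry j of filter l is the output obtained after
    replacing filter l by the j-th unit vector."""
    cols = []
    for l, w_l in enumerate(filters):
        for j in range(len(w_l)):
            e = np.zeros(len(w_l))
            e[j] = 1.0
            perturbed = list(filters)
            perturbed[l] = e
            cols.append(end_to_end_filter(perturbed, strides))
    return np.stack(cols, axis=1)

def generic_rank(sizes, strides, seed=0):
    """Jacobian rank at a random (hence, with high probability, generic) point."""
    rng = np.random.default_rng(seed)
    filters = [rng.standard_normal(k) for k in sizes]
    return np.linalg.matrix_rank(jacobian(filters, strides), tol=1e-9)

def expected_dimension(sizes):
    """Dimension formula quoted in the TL;DR: sum of filter sizes minus (L - 1)."""
    return sum(sizes) - (len(sizes) - 1)

if __name__ == "__main__":
    for sizes, strides in [((3, 2), (2, 1)), ((2, 2, 2), (1, 2, 1)), ((3, 3), (1, 1))]:
        print(sizes, strides, generic_rank(sizes, strides), expected_dimension(sizes))
```

The rank deficit of $L-1$ reflects the freedom to rescale individual layers while compensating in another layer, which leaves the end-to-end filter unchanged.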

Figures (1)

  • Figure 1: Left: Slice of the semi-algebraic set $AD^2+B^2E-BCD=0$, $C^2-4AE\ge 0$, obtained by setting $A=1$ and $C=-1$. This set corresponds to the function space $\mathcal{M}_{(3,2),(2,1)}\subseteq \mathbb{R}^5$ in Example \ref{ex:reduced-(3,2)}. Right: The same set intersected with $B^4-4AB(BC-AD)\ge 0$, $D^4-4DE(CD-BE)\ge 0$ and ($AE\le 0$ or $AC\le 0$). This intersection corresponds to the function space $\mathcal{M}_{(2,2,2),(1,2,1)}$ discussed in Example \ref{ex:k222s121}. The reduced boundary points and the stride-one boundary points are depicted as a blue point and a black dashed curve, respectively; see Theorem \ref{thm:boundaryProperties}.
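
The equation and inequality in the left panel of Figure 1 can be checked numerically with a short sketch. It assumes that $(A,B,C,D,E)$ are the entries of the end-to-end filter of the architecture with filter sizes $(3,2)$ and strides $(2,1)$, and that this filter is the coefficient vector of $(a+bx+cx^2)(d+ex^2)$. Both the coordinate convention and the function names are assumptions for illustration.

```python
import numpy as np

def filter_32_21(w1, w2):
    """End-to-end filter for sizes (3, 2) and strides (2, 1) (assumed convention).

    np.convolve multiplies the polynomials (a + b x + c x^2) and (d + e x^2).
    """
    a, b, c = w1
    d, e = w2
    return np.convolve([a, b, c], [d, 0.0, e])

def figure_constraints(w):
    """Return the left-hand sides A D^2 + B^2 E - B C D and C^2 - 4 A E."""
    A, B, C, D, E = w
    equation = A * D**2 + B**2 * E - B * C * D
    discriminant = C**2 - 4 * A * E
    return equation, discriminant

if __name__ == "__main__":
    # Hand-checkable case: (1 + 2x + 3x^2)(4 + 5x^2) has coefficients
    # (4, 8, 17, 10, 15); the equation evaluates to 400 + 960 - 1360 = 0
    # and the discriminant to 289 - 240 = 49 = (c*d - a*e)^2.
    print(figure_constraints(filter_32_21((1.0, 2.0, 3.0), (4.0, 5.0))))

    rng = np.random.default_rng(0)
    for _ in range(5):
        w = filter_32_21(rng.standard_normal(3), rng.standard_normal(2))
        eq, disc = figure_constraints(w)
        print(abs(eq) < 1e-9, disc >= -1e-9)
```

Under these assumptions the discriminant equals $(cd-ae)^2$, which explains why the inequality $C^2-4AE\ge 0$ holds for every choice of real filters.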

Theorems & Definitions (83)

  • Definition 2.1
  • Proposition 2.2: LCN
  • Example 2.3
  • Theorem 2.4
  • Definition 2.5
  • Example 2.6
  • Theorem 2.7
  • Remark 2.8
  • Theorem 2.9
  • Theorem 2.10
  • ...and 73 more