Function Space and Critical Points of Linear Convolutional Networks
Kathlén Kohn, Guido Montúfar, Vahid Shahverdi, Matthew Trager
TL;DR
This work analyzes the geometry of the function spaces realized by one-dimensional linear convolutional networks (LCNs) with arbitrary strides. By encoding end-to-end filters as sparse factorizations of homogeneous polynomials, the authors characterize the function space $\mathcal{M}_{\mathbf{k},\mathbf{s}}$ and its Zariski closure, derive its dimension $\dim\mathcal{M}_{\mathbf{k},\mathbf{s}} = \sum_{i=1}^L k_i - (L-1)$, and delineate when it is thick or thin, Zariski closed or not, and smooth or singular. They give a complete description of the critical points of the parameterization map in terms of polynomial factorization and hyperroot structure. They also show that for reduced architectures with all strides $>1$ and generic data, the nonzero critical points of the squared loss are smooth interior points of the function space. Finally, they develop a projective/birational framework that connects multi-layer LCNs to stride-one sub-architectures. The results reveal a rich algebraic structure behind LCNs, with implications for architecture design and for optimization behavior, which differs substantially from that of stride-one or dense linear networks. Overall, the paper gives a precise algebraic account of how architecture shapes the geometry of representable functions and the optimization landscape.
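The following NumPy sketch illustrates the filter/polynomial correspondence and the dimension formula for one small architecture. It assumes an indexing convention that may differ from the paper's: no padding, a layer with filter $w$ and stride $s$ computes $y_i = \sum_j w_j x_{s i + j}$, and filter taps are stored as ascending polynomial coefficients in $t$. Under that assumption, composing layers gives a single convolution whose filter is $W_1(t)\,W_2(t^{s_1})\,W_3(t^{s_1 s_2})$, and the generic rank of the Jacobian of the parameterization should equal $\sum_i k_i - (L-1)$. The helpers `dilate`, `end_to_end_filter`, `conv_matrix`, and `jacobian` are hypothetical names introduced only for this illustration.

```python
import numpy as np


def dilate(w, d):
    """Coefficients of W(t**d): insert d - 1 zeros between consecutive taps of w."""
    w = np.asarray(w, dtype=float)
    out = np.zeros((len(w) - 1) * d + 1)
    out[::d] = w
    return out


def end_to_end_filter(filters, strides):
    """Product W_1(t) * W_2(t**s_1) * W_3(t**(s_1*s_2)) * ... as a coefficient vector.

    Assumed convention: layer i's filter is dilated by the product of the strides
    of the layers before it; the last stride only affects the overall stride.
    """
    u = np.array([1.0])
    dilation = 1
    for w, s in zip(filters, strides):
        u = np.convolve(u, dilate(w, dilation))
        dilation *= s
    return u


def conv_matrix(w, s, n_in):
    """Matrix of the map x -> y with y_i = sum_j w_j * x_{s*i + j} (no padding)."""
    k = len(w)
    n_out = (n_in - k) // s + 1
    M = np.zeros((n_out, n_in))
    for i in range(n_out):
        M[i, s * i : s * i + k] = w
    return M


def jacobian(filters, strides):
    """Jacobian of the multilinear map (w_1, ..., w_L) -> end-to-end filter."""
    cols = []
    for i, w in enumerate(filters):
        for j in range(len(w)):
            e = np.zeros(len(w))
            e[j] = 1.0
            # Multilinearity: d u / d w_i[j] is the product with w_i replaced by e_j.
            cols.append(end_to_end_filter(filters[:i] + [e] + filters[i + 1:], strides))
    return np.column_stack(cols)


rng = np.random.default_rng(0)
ks, ss = [3, 2, 2], [2, 3, 1]
filters = [rng.standard_normal(k) for k in ks]
u = end_to_end_filter(filters, ss)

# Check against composing the strided layers as explicit matrices.
S = int(np.prod(ss))   # overall stride of the composed map
n = len(u) + S * 3     # input length chosen so that every layer size fits exactly
composed = np.eye(n)
for w, s in zip(filters, ss):
    composed = conv_matrix(w, s, composed.shape[0]) @ composed
assert np.allclose(composed, conv_matrix(u, S, n))

dim_formula = sum(ks) - (len(ks) - 1)
rank = np.linalg.matrix_rank(jacobian(filters, ss))
print(len(u), dim_formula, rank)  # expected: 11 5 5
```

For $\mathbf{k} = (3, 2, 2)$ and $\mathbf{s} = (2, 3, 1)$, the end-to-end filter has 11 taps while the generic Jacobian rank is 5, so in this example the function space is a lower-dimensional subset of the 11-dimensional filter space. With all strides equal to one, the filter size and the dimension formula would coincide.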
Abstract
We study the geometry of linear networks with one-dimensional convolutional layers. The function spaces of these networks can be identified with semi-algebraic families of polynomials admitting sparse factorizations. We analyze the impact of the network's architecture on the function space's dimension, boundary, and singular points. We also describe the critical points of the network's parameterization map. Furthermore, we study the optimization problem of training a network with the squared error loss. We prove that, for architectures in which all strides are larger than one and for generic data, the non-zero critical points of that optimization problem are smooth interior points of the function space. This property is known to be false for dense linear networks and for linear convolutional networks with stride one.
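To make "sparse factorizations" concrete, here is a hedged NumPy sketch for the smallest strided case. It assumes $\mathbf{k} = (2, 2)$, a first stride of 2, and the no-padding indexing convention $y_i = \sum_j w_j x_{s i + j}$, which is an assumption of this sketch rather than necessarily the paper's convention. Under it, the end-to-end filter is the coefficient vector of $(a_0 + a_1 t)(b_0 + b_1 t^2)$, and a direct computation shows that a real filter $(u_0, u_1, u_2, u_3)$ has this form exactly when $u_0 u_3 = u_1 u_2$, i.e. when the matrix with rows $(u_0, u_1)$ and $(u_2, u_3)$ has rank at most one. The functions `sparse_product`, `in_function_space`, and `sparse_factor` are hypothetical helpers for illustration.

```python
import numpy as np


def sparse_product(a, b):
    """End-to-end filter (a0 + a1 t)(b0 + b1 t^2), assuming k = (2, 2) and first stride 2."""
    return np.convolve(a, [b[0], 0.0, b[1]])


def in_function_space(u, tol=1e-12):
    """Membership test: u0*u3 == u1*u2, i.e. [[u0, u1], [u2, u3]] has rank <= 1."""
    u0, u1, u2, u3 = u
    return abs(u0 * u3 - u1 * u2) <= tol * max(1.0, np.max(np.abs(u)) ** 2)


def sparse_factor(u):
    """Recover real filters (a, b) with sparse_product(a, b) == u, or None if impossible."""
    u = np.asarray(u, dtype=float)
    if not in_function_space(u):
        return None
    low, high = u[:2], u[2:]
    if not np.any(low):
        # u = t^2 * (u2 + u3 t): take a = (u2, u3) and b = (0, 1).
        return high.copy(), np.array([0.0, 1.0])
    # Rank <= 1 and low != 0, so high = lam * low for a real scalar lam.
    lam = np.dot(high, low) / np.dot(low, low)
    return low.copy(), np.array([1.0, lam])


rng = np.random.default_rng(1)
a, b = rng.standard_normal(2), rng.standard_normal(2)
u = sparse_product(a, b)
print(in_function_space(u))                    # True
a2, b2 = sparse_factor(u)
print(np.allclose(sparse_product(a2, b2), u))  # True
print(in_function_space([1.0, 0.0, 0.0, 1.0])) # False: 1 + t^3 has no such factorization
```

In this small case the function space is a 3-dimensional quadric hypersurface in the 4-dimensional filter space ($3 = 2 + 2 - 1$), so a single polynomial equation cuts it out. In general these function spaces are only semi-algebraic; for example, with stride one and $\mathbf{k} = (2, 2)$, a quadratic filter $(u_0, u_1, u_2)$ is realizable exactly when $u_1^2 - 4 u_0 u_2 \ge 0$, which is an inequality.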
