On the Geometry and Optimization of Polynomial Convolutional Networks

Vahid Shahverdi, Giovanni Luca Marchetti, Kathlén Kohn

TL;DR

This work analyzes CNNs with monomial activations through the lens of algebraic geometry, showing that after removing filter-scaling symmetries the parameterization is regular and generically one-to-one (birational) with finite fibers. It identifies the neuromanifold as closely related to Segre--Veronese varieties, deriving its dimension $\text{dim}(\text{Neuromanifold}) = |\boldsymbol{k}| - L + 1$ and degree $\text{deg}(\text{Neuromanifold}) = (|\boldsymbol{k}|-L)!\prod_{j=0}^{L-1} \frac{r^{(L-j-1)(k_j-1)}}{(k_j-1)!}$ for $r>1$, and characterizing singularities as nodal points arising from subnetworks. The authors connect optimization to distance-minimization on the neuromanifold and compute the generic Euclidean distance degree, yielding a dataset-independent count of complex critical points for large generic datasets. These results illuminate the expressivity and learning dynamics of polynomial CNNs and suggest pathways for extending algebraic-geometric methods to broader network architectures and activation functions.
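
As a quick sanity check, the dimension and degree formulas stated in the TL;DR can be evaluated numerically. The sketch below assumes that $\boldsymbol{k} = (k_0, \dots, k_{L-1})$ lists the filter sizes of the $L$ layers, that $|\boldsymbol{k}| = k_0 + \cdots + k_{L-1}$, and that $r$ is the activation degree; the function names are hypothetical and are not taken from the paper.

```python
from math import factorial, prod


def neuromanifold_dimension(k):
    """dim = |k| - L + 1, assuming |k| is the sum of the filter sizes."""
    return sum(k) - len(k) + 1


def neuromanifold_degree(k, r):
    """deg = (|k|-L)! * prod_j r^((L-j-1)(k_j-1)) / (k_j-1)!, stated for r > 1."""
    if r <= 1:
        raise ValueError("the degree formula is stated for r > 1")
    L = len(k)
    # (|k|-L)! / prod_j (k_j-1)! is a multinomial coefficient, hence an integer.
    multinomial = factorial(sum(k) - L) // prod(factorial(kj - 1) for kj in k)
    # Collect all powers of r into a single exponent.
    exponent = sum((L - j - 1) * (k[j] - 1) for j in range(L))
    return multinomial * r**exponent


# Two layers with filters of size 2 and quadratic activation.
print(neuromanifold_dimension([2, 2]))  # 3
print(neuromanifold_degree([2, 2], 2))  # 4
```

Under these assumptions, the case $(k_0,k_1)=(2,2)$ with $r=2$ gives dimension $3$ and degree $4$. The dimension does not depend on $r$, while the degree grows with the activation degree.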

Abstract

We study convolutional neural networks with monomial activation functions. Specifically, we prove that their parameterization map is regular and is an isomorphism almost everywhere, up to rescaling the filters. By leveraging tools from algebraic geometry, we explore the geometric properties of the image of this map in function space, typically referred to as the neuromanifold. In particular, we compute the dimension and the degree of the neuromanifold, which measure the expressivity of the model, and describe its singularities. Moreover, for a generic large dataset, we derive an explicit formula that quantifies the number of critical points arising in the optimization of a regression loss.

Paper Structure (24 sections, 16 theorems, 31 equations, 3 figures, 1 table)


Key Result

Theorem 3.1

The generic Euclidean distance degree of the Segre--Veronese variety admits an explicit closed-form expression, in which $|\mathbf{p}| = p_1 + \cdots + p_{k}$.

Figures (3)

  • Figure 1: Illustration of a Segre--Veronese variety parametrizing CNNs.
  • Figure 2: Distance function from an anchor to a curve, visualized as a color gradient. The critical values are denoted by dotted lines.
  • Figure 3: Visualization of two-dimensional charts of neuromanifolds (over $\mathbb{R}$) corresponding to $(k_0,k_1)=(2,2)$ projected orthogonally to $\mathbb{R}^3$, with varying activation degree.

Theorems & Definitions (46)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem 3.1: kozhasov2023minimal
  • Lemma 4.1
  • proof
  • Corollary 4.2
  • proof
  • Remark 4.1
  • ...and 36 more