Geometry of Linear Neural Networks: Equivariance and Invariance under Permutation Groups

Kathlén Kohn, Anna-Laura Sattelberger, Vahid Shahverdi

TL;DR

The paper studies the geometry of the function space of linear neural networks under permutation symmetries, modeling this space as a determinantal variety and studying its invariant and equivariant subvarieties. It gives a complete algebraic description of the invariant maps for arbitrary permutation groups and of the equivariant maps for cyclic groups, showing that these spaces decompose into direct products of determinantal varieties with a rich structure of irreducible components. Key contributions include a weight-sharing encoder design that parameterizes the invariant functions; a decomposition of the equivariant functions into several (often many) irreducible components, each parameterized by its own autoencoder; and a concrete link between squared-error loss and Euclidean distance minimization via the Eckart–Young theorem, including the notion of a squared-error degree. The results give principled constraints for designing invariant or equivariant linear networks, offer insights for potential nonlinear extensions, and are demonstrated on MNIST, with implications for reducing training cost and for graph-based, symmetry-aware models.

Abstract

The set of functions parameterized by a linear fully-connected neural network is a determinantal variety. We investigate the subvariety of functions that are equivariant or invariant under the action of a permutation group. Examples of such group actions are translations or $90^\circ$ rotations on images. We describe such equivariant or invariant subvarieties as direct products of determinantal varieties, from which we deduce their dimension, degree, Euclidean distance degree, and their singularities. We fully characterize invariance for arbitrary permutation groups, and equivariance for cyclic groups. We draw conclusions for the parameterization and the design of equivariant and invariant linear networks in terms of sparsity and weight-sharing properties. We prove that all invariant linear functions can be parameterized by a single linear autoencoder with a weight-sharing property imposed by the cycle decomposition of the considered permutation. The space of rank-bounded equivariant functions has several irreducible components, so it cannot be parameterized by a single network, but each irreducible component can. Finally, we show that minimizing the squared-error loss on our invariant or equivariant networks reduces to minimizing the Euclidean distance from determinantal varieties via the Eckart–Young theorem.
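The weight-sharing statement for invariant functions can be checked on the permutation $\sigma = (1\,3\,4)(2\,5)$ from Figure 2. In our reading, the encoder ties together all weights leaving input neurons that lie in the same cycle of $\sigma$, while the decoder is unconstrained; the end-to-end matrix $W$ then has equal columns along each cycle, so $WP_\sigma = W$ for the permutation matrix $P_\sigma$. This is a minimal NumPy sketch under those assumptions: positions are 0-indexed, and the convention $P_\sigma e_i = e_{\sigma(i)}$, the helper names (`perm_matrix`, `cycle_decomposition`, `weight_shared_encoder`) and the layer sizes are our own hypothetical choices.

```python
import numpy as np

def perm_matrix(sigma):
    """Permutation matrix P with P e_i = e_{sigma[i]} (0-indexed positions)."""
    n = len(sigma)
    P = np.zeros((n, n))
    for i, j in enumerate(sigma):
        P[j, i] = 1.0
    return P

def cycle_decomposition(sigma):
    """Cycles of sigma as lists of 0-indexed positions."""
    seen, cycles = set(), []
    for start in range(len(sigma)):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = sigma[i]
        cycles.append(cycle)
    return cycles

def weight_shared_encoder(free, cycles, n):
    """Encoder A (r x n): all inputs in one cycle share the same column of weights."""
    A = np.zeros((free.shape[0], n))
    for k, cycle in enumerate(cycles):
        A[:, cycle] = free[:, [k]]  # broadcast the k-th free column over the cycle
    return A

# sigma = (1 3 4)(2 5) from Figure 2, written 0-indexed: 0->2, 2->3, 3->0, 1->4, 4->1
sigma = [2, 4, 3, 0, 1]
cycles = cycle_decomposition(sigma)  # [[0, 2, 3], [1, 4]]
n, m, r = len(sigma), 3, len(cycles)  # input size, output size, bottleneck width (our choice)

rng = np.random.default_rng(0)
A = weight_shared_encoder(rng.standard_normal((r, len(cycles))), cycles, n)
B = rng.standard_normal((m, r))  # decoder: no weight-sharing constraints
W = B @ A                        # end-to-end linear map of the autoencoder

P = perm_matrix(sigma)
x = rng.standard_normal(n)
print(np.allclose(W @ (P @ x), W @ x))  # True: permuting the input leaves the output unchanged
```

With the bottleneck width equal to the number of cycles, as here, every $\sigma$-invariant $W$ arises as `B @ A` (take the free encoder weights to be the identity and `B` to hold the shared column values); a narrower bottleneck yields the rank-bounded invariant maps.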

Paper Structure (22 sections, 31 theorems, 102 equations, 8 figures, 1 table)

Key Result

Lemma 2.1

If $\operatorname{rank}(XX^\top) = n$, then
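The statement of Lemma 2.1 is cut off above. As a hedged sketch of the kind of reduction it refers to (our own formulation and notation, not a quote of the lemma): with inputs $X \in \mathbb{R}^{n \times d}$ and targets $Y \in \mathbb{R}^{m \times d}$ stored as columns, $\operatorname{rank}(XX^\top) = n$, $P = (XX^\top)^{1/2}$ and $U = YX^\top(XX^\top)^{-1}$, one has $\|WX - Y\|_F^2 = \|WP - UP\|_F^2 + \|Y\|_F^2 - \|UP\|_F^2$. Minimizing the loss over rank-bounded $W$ is then a Euclidean distance problem from $UP$, which the Eckart–Young theorem solves by a truncated SVD. The NumPy code below checks both facts on random data; the helper names `sym_sqrt` and `best_rank_r` and all sizes are our own choices.

```python
import numpy as np

def sym_sqrt(G):
    """Symmetric square root of a symmetric positive-definite matrix."""
    evals, evecs = np.linalg.eigh(G)
    return (evecs * np.sqrt(evals)) @ evecs.T

def best_rank_r(M, r):
    """Eckart–Young: best rank-r approximation of M in Frobenius norm via truncated SVD."""
    left, s, right_t = np.linalg.svd(M, full_matrices=False)
    return (left[:, :r] * s[:r]) @ right_t[:r, :]

rng = np.random.default_rng(1)
n, m, d, r = 5, 4, 50, 2          # input dim, output dim, number of samples, rank bound
X = rng.standard_normal((n, d))   # inputs as columns; X X^T has full rank n
Y = rng.standard_normal((m, d))   # targets as columns

G = X @ X.T
P = sym_sqrt(G)                   # P = (X X^T)^{1/2}
U = Y @ X.T @ np.linalg.inv(G)    # unconstrained least-squares solution
const = np.sum(Y ** 2) - np.sum((U @ P) ** 2)

def loss(W):
    """Squared-error loss ||W X - Y||_F^2."""
    return np.sum((W @ X - Y) ** 2)

# The loss equals a squared Euclidean distance from UP, up to a W-independent constant.
W = rng.standard_normal((m, n))
print(np.isclose(loss(W), np.sum((W @ P - U @ P) ** 2) + const))  # True

# P is invertible, so rank(W P) = rank(W): the rank-r minimizer is [UP]_r P^{-1}.
W_star = best_rank_r(U @ P, r) @ np.linalg.inv(P)
```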

Figures (8)

  • Figure 1: A fully-connected network of depth $2$.
  • Figure 2: The $\sigma$-weight-sharing property imposed on the encoder by $\sigma=(1 \, 3 \,4)(2 \, 5)$.
  • Figure 3: Weight-sharing of the encoder and decoder matrices from the equivariant weight-sharing example. Edges of the same color share the same weight, up to a sign change if one of the edges is dashed. To avoid an overload of colors, only the weight-sharing of the encoder is visualized; the decoder follows the same rules but would require seven additional color shades. Due to the zero blocks in the block decomposition of the equivariant matrices, the $4$th and $5$th input and output neurons are inactive.
  • Figure 4: Top row: Nine samples from the MNIST test dataset, each shifted horizontally at random by up to six pixels. Middle row: Output of a linear autoencoder designed to be equivariant under horizontal translations; its architecture is determined by the integer vector $\mathbf{r}$ defined in the main text. Bottom row: Output of a dense linear autoencoder with bottleneck $r=99$ and no equivariance imposed.
  • Figure 5: The error incurred by the block $M_i$, $i = 0, \ldots, 14$, when setting $\operatorname{rank}(M_i) = 0$.
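For cyclic groups, here is a hedged illustration of why the rank-bounded equivariant maps split into blocks. Take a permutation made of $k$ disjoint cycles of the same length $n$ (for instance, cyclically shifting every row of an image). A linear map commutes with it exactly when every $n \times n$ block between two cycles is circulant, and the discrete Fourier transform then turns the map into $n$ independent $k \times k$ frequency blocks $M_j$, so its rank is the sum of the block ranks. The sketch uses NumPy and a complex Fourier basis; the paper's own decomposition is over the reals and may differ in details. The helper names (`block_circulant`, `fourier_blocks`, `random_rank`) and the chosen block ranks are hypothetical. Since frequencies $j$ and $n-j$ carry conjugate blocks, only $n/2 + 1$ of them are free for even $n$; reading the blocks $M_0, \ldots, M_{14}$ of Figure 5 as this count for rows of length $n = 28$ is an assumption on our part.

```python
import numpy as np

def shift_matrix(n):
    """Cyclic shift S with S e_i = e_{(i+1) mod n}."""
    return np.roll(np.eye(n), 1, axis=0)

def circulant(c):
    """Circulant matrix whose j-th column is c rolled down by j."""
    return np.column_stack([np.roll(c, j) for j in range(len(c))])

def block_circulant(coeffs):
    """coeffs has shape (k, k, n); block (a, b) is the circulant with first column coeffs[a, b]."""
    k = coeffs.shape[0]
    return np.block([[circulant(coeffs[a, b]) for b in range(k)] for a in range(k)])

def fourier_blocks(coeffs):
    """Frequency blocks M_j[a, b] = DFT(coeffs[a, b])[j]; returns shape (n, k, k)."""
    return np.moveaxis(np.fft.fft(coeffs, axis=-1), -1, 0)

def random_rank(k, q, rng, real):
    """Random k x k matrix of rank q (real, or complex if real=False)."""
    L = rng.standard_normal((k, q)) + (0 if real else 1j * rng.standard_normal((k, q)))
    R = rng.standard_normal((q, k)) + (0 if real else 1j * rng.standard_normal((q, k)))
    return L @ R

rng = np.random.default_rng(2)
n, k = 6, 2                        # cycle length, number of cycles
ranks = {0: 1, 1: 2, 2: 0, 3: 1}   # chosen rank of M_j for j = 0, ..., n // 2
M = np.zeros((n, k, k), dtype=complex)
for j, q in ranks.items():
    M[j] = random_rank(k, q, rng, real=(j == 0 or 2 * j == n))
    M[(n - j) % n] = np.conj(M[j])  # conjugate symmetry keeps the weights real

coeffs = np.fft.ifft(np.moveaxis(M, 0, -1), axis=-1).real  # back to real circulant weights
W = block_circulant(coeffs)
S = np.kron(np.eye(k), shift_matrix(n))  # shift every cycle at once

print(np.allclose(S @ W, W @ S))            # True: W is equivariant
print(np.linalg.matrix_rank(W, tol=1e-8))   # 6 = 1 + 2 + 0 + 1 + 0 + 2
```

Each choice of per-block rank bounds gives a direct product of determinantal varieties, one factor per frequency block; different maximal choices with the same total rank give different pieces, which is the intuition behind the several irreducible components mentioned in the abstract.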

Theorems & Definitions (61)

  • Lemma 2.1
  • Proof of Lemma 2.1
  • Theorem 2.2: Eckart–Young
  • Definition 2.3
  • Lemma 2.4
  • Proof
  • Lemma 2.6
  • Proof
  • Lemma 2.7
  • Proof