Separation Power of Equivariant Neural Networks

Marco Pacini; Xiaowen Dong; Bruno Lepri; Gabriele Santin

TL;DR

This work analyzes the separation power of finite-group equivariant neural networks, reframing input distinguishability as a zero-locus problem via the twin network trick. It proves that any continuous, non-polynomial activation yields maximal and equivalent separation power, while depth increases separation only up to a finite threshold, and width or invariant hidden features do not affect separation. It also introduces a hierarchical view of separation power tied to representation type and subgroup structure, showing minimal representations yield lower capacity and the regular representation yields maximal separation. The results are connected to practical models, demonstrating how IGNs can match WL power under suitable choices, and how CNNs’ separation depends on filter size, with clear guidance for architecture design under symmetry constraints.

Abstract

The separation power of a machine learning model refers to its ability to distinguish between different inputs and is often used as a proxy for its expressivity. Indeed, knowing the separation power of a family of models is a necessary condition to obtain fine-grained universality results. In this paper, we analyze the separation power of equivariant neural networks, such as convolutional and permutation-invariant networks. We first present a complete characterization of the inputs that are indistinguishable by models derived from a given architecture. From these results, we derive how separability is influenced by hyperparameters and architectural choices, such as activation functions, depth, hidden layer width, and representation types. Notably, all non-polynomial activations, including ReLU and sigmoid, are equivalent in expressivity and reach maximum separation power. Depth improves separation power up to a threshold, after which further increases have no effect. Adding invariant features to hidden representations does not impact separation power. Finally, block decomposition of hidden representations affects separability, with minimal components forming a hierarchy in separation power that provides a straightforward method for comparing the separation power of models.

Paper Structure (38 sections, 37 theorems, 163 equations, 1 figure)

Introduction
Contributions.
Related Work
The Relevance of Separability
Separation-constrained Universality
The Effect of Hyperparameters on Separability and Universality
Preliminaries
Groups and Equivariance
Equivariant Neural Networks
Main Results
The Twin Network Trick
The Characterization Theorem
The Role of Activations
The Role of Depth
The Role of Intermediate Representations
...and 23 more sections

Key Result

Proposition 1

Let $\mathbb{R}^X$ be a permutation representation of $G$ with orbit decomposition $X_1 \sqcup \cdots \sqcup X_n$ (see Definition def:orbits in Appendix section:group-action), and let $Y \subseteq X$. Define $\mathbbm{1}_Y = \sum_{y \in Y} e_y \in \mathbb{R}^X$. The invariant subspace of $\mathbb{R}^X=\
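The notation above (orbit decomposition of $X$ and the indicator vector $\mathbbm{1}_Y$) can be made concrete with a small Python sketch. Everything in it is an illustrative assumption rather than code from the paper: $X = \{0, \dots, 4\}$, the group is generated by a single hypothetical permutation, and the helper names `orbits`, `indicator`, and `act` are invented here. The sketch only checks the standard fact that the indicator vector of an orbit is fixed by the group action, while the indicator of a subset that is not a union of orbits need not be.

```python
import numpy as np

def orbits(generators, n):
    """Orbits of {0, ..., n-1} under the group generated by `generators`.

    Each generator is a sequence p where p[i] is the image of i.
    """
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        orbit, frontier = {start}, [start]
        while frontier:
            x = frontier.pop()
            for p in generators:
                y = p[x]
                if y not in orbit:
                    orbit.add(y)
                    frontier.append(y)
        seen |= orbit
        result.append(sorted(orbit))
    return result

def indicator(subset, n):
    """Indicator vector 1_Y = sum of e_y over y in Y, as a vector in R^n."""
    v = np.zeros(n)
    v[list(subset)] = 1.0
    return v

def act(p, v):
    """Permutation action on R^X: the basis vector e_x is sent to e_{p[x]}."""
    out = np.empty_like(v)
    out[np.asarray(p)] = v
    return out

# Hypothetical example: the permutation (0 1 2)(3 4) acting on X = {0, ..., 4}.
n = 5
g = [1, 2, 0, 4, 3]
orbit_list = orbits([g], n)          # two orbits: {0, 1, 2} and {3, 4}

# Orbit indicators are fixed by the action of g.
orbit_fixed = [np.array_equal(act(g, indicator(o, n)), indicator(o, n))
               for o in orbit_list]

# Y = {0, 3} is not a union of orbits, so its indicator is moved by g.
y_fixed = np.array_equal(act(g, indicator([0, 3], n)), indicator([0, 3], n))
```

Only forward images of the generators are followed when growing an orbit; for a finite permutation group this is enough, because each inverse is a power of the corresponding generator.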

Figures (1)

Figure 1: The twin network trick illustrated. Evaluating two copies of $\eta$ on $\alpha$ and $\beta$, and subtracting the resulting outputs, is equivalent to evaluating the twin network $\overline \eta$ on $(\alpha, \beta)$.
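The construction in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `make_eta` builds a hypothetical toy permutation-invariant network (a shared per-coordinate affine map with ReLU, followed by sum pooling), the weights are chosen by hand, and `twin` wraps any network $\eta$ into $\overline{\eta}(\alpha, \beta) = \eta(\alpha) - \eta(\beta)$. Two inputs are then not separated by $\eta$ exactly when $(\alpha, \beta)$ is a zero of $\overline{\eta}$.

```python
import numpy as np

def make_eta(w1, b1, w2):
    """Toy permutation-invariant network eta: R^n -> R (illustrative only)."""
    w1, b1, w2 = (np.asarray(a, dtype=float) for a in (w1, b1, w2))

    def eta(x):
        x = np.asarray(x, dtype=float)
        # The same affine map and ReLU are applied to every coordinate
        # (permutation equivariant); sum pooling makes the output invariant.
        hidden = np.maximum(x[:, None] * w1 + b1, 0.0)   # shape (n, width)
        pooled = hidden.sum(axis=0)                      # shape (width,)
        return float(pooled @ w2)

    return eta

def twin(eta):
    """Twin network: evaluate eta on both inputs and subtract the outputs."""
    def eta_bar(alpha, beta):
        return eta(alpha) - eta(beta)
    return eta_bar

# Hand-picked weights (an assumption): this eta computes sum_i |x_i|.
eta = make_eta(w1=[1.0, -1.0], b1=[0.0, 0.0], w2=[1.0, 1.0])
eta_bar = twin(eta)

same_orbit = eta_bar([1, 2, 3], [3, 1, 2])   # permuted inputs: 6 - 6
different = eta_bar([1, 2, 3], [1, 2, 4])    # 6 - 7
```

With these weights, `same_orbit` is `0.0` because the two inputs differ only by a permutation, and `different` is `-1.0`, so this particular network separates the second pair. A zero of one twin network only says that this single network fails to separate the pair; separation power concerns pairs on which every network of the architecture vanishes.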

Theorems & Definitions (96)

Definition 1
Definition 2: Point-wise Activation
Definition 3: Neural Networks and Neural Spaces
Example 1: Equivariant Neural Networks
Proposition 1
Example 2: Invariant Graph Networks
Example 3: Circular Convolutional Neural Networks
Proposition 2
Theorem 1: Informal
Theorem 2
...and 86 more
