Quantitative Approximation Rates for Group Equivariant Learning

Jonathan W. Siegel; Snir Hordan; Hannah Lawrence; Ali Syed; Nadav Dym

TL;DR

The paper shows that equally sized ReLU MLPs and equivariant architectures are equally expressive over equivariant functions, so hard-coding equivariance does not cost expressivity or approximation power in these models.

Abstract

The universal approximation theorem establishes that neural networks can approximate any continuous function on a compact set. Later works in approximation theory provide quantitative approximation rates for ReLU networks on the class of $\alpha$-Hölder functions $f: [0,1]^N \to \mathbb{R}$. The goal of this paper is to provide similar quantitative approximation results in the context of group equivariant learning, where the learned $\alpha$-Hölder function is known to obey certain group symmetries. While there has been much interest in the literature in understanding the universal approximation properties of equivariant models, very few quantitative approximation results are known for equivariant models. In this paper, we bridge this gap by deriving quantitative approximation rates for several prominent group-equivariant and invariant architectures. The architectures that we consider include: the permutation-invariant Deep Sets architecture; the permutation-equivariant Sumformer and Transformer architectures; joint invariance to permutations and rigid motions using invariant networks based on frame averaging; and general bi-Lipschitz invariant models. Overall, we show that equally-sized ReLU MLPs and equivariant architectures are equally expressive over equivariant functions. Thus, hard-coding equivariance does not result in a loss of expressivity or approximation power in these models.
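To make the permutation symmetries named in the abstract concrete, the following is a minimal sketch in plain Python. It assumes the usual sum-pooling forms: a Deep Sets style invariant model $\rho(\sum_i \phi(x_i))$ and a Sumformer-style equivariant layer whose $i$-th output is $\psi(x_i, \sum_j \phi(x_j))$. The helper names (`make_mlp`, `sum_pool`, `deep_sets`, `sumformer`), the layer widths, and the random weights are illustrative assumptions and are not taken from the paper.

```python
import random

def relu(v):
    # Elementwise ReLU on a list of floats.
    return [max(0.0, t) for t in v]

def linear(W, b, v):
    # Affine map W v + b, with W stored as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def make_mlp(sizes, rng):
    # Random ReLU MLP with the given layer widths (hypothetical initialisation).
    layers = []
    for m, k in zip(sizes[:-1], sizes[1:]):
        W = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(k)]
        b = [rng.uniform(-1, 1) for _ in range(k)]
        layers.append((W, b))
    return layers

def mlp(layers, v):
    # ReLU after every layer except the last.
    for i, (W, b) in enumerate(layers):
        v = linear(W, b, v)
        if i < len(layers) - 1:
            v = relu(v)
    return v

def sum_pool(phi, points):
    # Sum of phi(x_i) over all points; the sum ignores the order of the points.
    feats = [mlp(phi, p) for p in points]
    return [sum(col) for col in zip(*feats)]

def deep_sets(phi, rho, points):
    # Permutation-invariant model: rho(sum_i phi(x_i)).
    return mlp(rho, sum_pool(phi, points))

def sumformer(phi, psi, points):
    # Permutation-equivariant layer: output i is psi(x_i, sum_j phi(x_j)).
    pooled = sum_pool(phi, points)
    return [mlp(psi, list(p) + pooled) for p in points]

# Small demo: n = 5 points in R^3 (sizes chosen only for illustration).
rng = random.Random(0)
d, n, width = 3, 5, 4
phi = make_mlp([d, 8, width], rng)
rho = make_mlp([width, 8, 1], rng)
psi = make_mlp([d + width, 8, 2], rng)
points = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
perm = [2, 0, 4, 1, 3]
permuted = [points[i] for i in perm]
```

Reordering the points leaves `deep_sets(phi, rho, points)` unchanged up to floating-point rounding, while the rows of `sumformer(phi, psi, points)` are reordered in the same way as the inputs. This only illustrates the symmetry; it says nothing about the approximation rates themselves.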

Paper Structure

This paper contains 23 sections, 21 theorems, 210 equations, and 1 table.

Introduction
Main Results
Related Work
Approximation rates for permutation invariant models
Intrinsic dimension and neural network approximation
Bi-Lipschitz invariant models and frame averaging
Generalization perspective
Paper Structure
Preliminaries
Approximation Rates for Permutation-Equivariant Models
Approximation Rates for the Deep Sets Permutation Invariant Model
Approximation Rates for Permutation Equivariant Functions: Sumformer
Definition of the Transformer Architecture
Approximation Rates for Non-Invariant Functions
Approximation Rates for Functions Invariant to Permutations and Rigid Motions
...and 8 more sections

Key Result

Proposition 1

[Proof in appendix] Let $d,n$ be natural numbers and let $G$ be a group acting on $\mathbb{R}^{d\times n}$, and assume that one of the following holds: Then for any $G$ invariant set $K\subseteq \mathbb{R}^{d\times n}$ for which $K/G$ is compact,

Theorems & Definitions (26)

Proposition 1
Theorem 2
Corollary 3
Corollary 4: Characterization of Permutation-Equivariant Functions
Corollary 5: Universal Approximation of Permutation Equivariant Functions
Definition 1: Attention Head (Def. 2.1 in Sumformer)
Definition 2: Attention Layer (Def. 2.2 in Sumformer)
Definition 3: Transformer Block (Def. 2.3 in Sumformer)
Definition 4: Transformer Network (Def. 2.4 in Sumformer)
Corollary 6: Transformer (Vaswani et al., 2017) Approximation Rates
...and 16 more
