Quantitative Approximation Rates for Group Equivariant Learning

Jonathan W. Siegel, Snir Hordan, Hannah Lawrence, Ali Syed, Nadav Dym

TL;DR

Equally-sized ReLU MLPs and equivariant architectures are shown to be equally expressive over equivariant functions, so hard-coding equivariance does not reduce expressivity or approximation power in these models.

Abstract

The universal approximation theorem establishes that neural networks can approximate any continuous function on a compact set. Later works in approximation theory provide quantitative approximation rates for ReLU networks on the class of $\alpha$-Hölder functions $f: [0,1]^N \to \mathbb{R}$. The goal of this paper is to provide similar quantitative approximation results in the context of group equivariant learning, where the learned $\alpha$-Hölder function is known to obey certain group symmetries. While there has been much interest in the literature in understanding the universal approximation properties of equivariant models, very few quantitative approximation results are known for equivariant models. In this paper, we bridge this gap by deriving quantitative approximation rates for several prominent group-equivariant and invariant architectures. The architectures that we consider include: the permutation-invariant Deep Sets architecture; the permutation-equivariant Sumformer and Transformer architectures; joint invariance to permutations and rigid motions using invariant networks based on frame averaging; and general bi-Lipschitz invariant models. Overall, we show that equally-sized ReLU MLPs and equivariant architectures are equally expressive over equivariant functions. Thus, hard-coding equivariance does not result in a loss of expressivity or approximation power in these models.
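
To make the symmetry constraints in the abstract concrete, the following is a minimal sketch (not the paper's construction) of a Deep Sets style permutation-invariant model, $\rho(\sum_i \phi(x_i))$, and a sum-then-broadcast permutation-equivariant layer in the spirit of Sumformer, $y_i = \psi(x_i, \sum_j \phi(x_j))$. The helper names `make_mlp`, `sum_pool`, `deep_sets`, and `sum_broadcast_layer`, the layer widths, and the random weights are all illustrative assumptions. The sketch only checks the symmetry properties numerically and says nothing about approximation rates.

```python
import math
import random


def make_mlp(sizes, rng):
    """Return a ReLU MLP with random weights, acting on a list of floats.

    Hypothetical helper: widths and initialization are arbitrary choices.
    """
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
             for _ in range(n_out)]
        b = [rng.gauss(0.0, 0.1) for _ in range(n_out)]
        layers.append((W, b))

    def forward(x):
        for k, (W, b) in enumerate(layers):
            x = [sum(w * v for w, v in zip(row, x)) + bi
                 for row, bi in zip(W, b)]
            if k < len(layers) - 1:  # ReLU on hidden layers only
                x = [max(0.0, v) for v in x]
        return x

    return forward


def sum_pool(vectors):
    """Coordinate-wise sum of a list of equal-length vectors."""
    return [sum(col) for col in zip(*vectors)]


def deep_sets(points, phi, rho):
    """Permutation-invariant model: rho(sum_i phi(x_i))."""
    return rho(sum_pool([phi(p) for p in points]))


def sum_broadcast_layer(points, phi, psi):
    """Permutation-equivariant layer: y_i = psi(x_i, sum_j phi(x_j))."""
    pooled = sum_pool([phi(p) for p in points])
    return [psi(p + pooled) for p in points]  # list concatenation


# Usage: n points in R^d, i.e. the columns of a d-by-n input.
rng = random.Random(0)
d, n, m = 3, 5, 8
phi = make_mlp([d, 16, m], rng)
rho = make_mlp([m, 16, 1], rng)
psi = make_mlp([d + m, 16, d], rng)

points = [[rng.uniform(0.0, 1.0) for _ in range(d)] for _ in range(n)]
perm = list(range(n))
rng.shuffle(perm)
permuted = [points[i] for i in perm]

# Invariance: the output should not depend on the order of the points
# (up to floating-point summation error).
print(deep_sets(points, phi, rho), deep_sets(permuted, phi, rho))
```

Because the only interaction between points is a sum, reordering the input leaves the pooled vector unchanged. This gives invariance for `deep_sets`, and for `sum_broadcast_layer` it means the outputs are reordered in the same way as the inputs.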

Paper Structure

This paper contains 23 sections, 21 theorems, 210 equations, and 1 table.

Key Result

Proposition 1

[Proof in appendix] Let $d,n$ be natural numbers and let $G$ be a group acting on $\mathbb{R}^{d\times n}$, and assume that one of the following holds: Then for any $G$ invariant set $K\subseteq \mathbb{R}^{d\times n}$ for which $K/G$ is compact,

Theorems & Definitions (26)

  • Proposition 1
  • Theorem 2
  • Corollary 3
  • Corollary 4: Characterization of Permutation-Equivariant Functions
  • Corollary 5: Universal Approximation of Permutation Equivariant Functions
  • Definition 1: Attention Head (Def. 2.1 in Sumformer)
  • Definition 2: Attention Layer (Def. 2.2 in Sumformer)
  • Definition 3: Transformer Block (Def. 2.3 in Sumformer)
  • Definition 4: Transformer Network (Def. 2.4 in Sumformer)
  • Corollary 6: Transformer (Vaswani et al., 2017) Approximation Rates
  • ...and 16 more