Weighted variation spaces and approximation by shallow ReLU networks

Ronald DeVore; Robert D. Nowak; Rahul Parhi; Jonathan W. Siegel

TL;DR

Weighted variation spaces give a new, more proper definition of model classes on domains: they are intrinsic to the domain itself, strictly larger than the classical (domain-independent) classes, and maintain the same approximation rates.

Abstract

We investigate the approximation of functions $f$ on a bounded domain $\Omega\subset \mathbb{R}^d$ by the outputs of single-hidden-layer ReLU neural networks of width $n$. This form of nonlinear $n$-term dictionary approximation has been intensely studied because it is the simplest case of neural network approximation (NNA). Several celebrated approximation results for this form of NNA introduce novel model classes of functions on $\Omega$ whose approximation rates do not deteriorate as the input dimension grows. These novel classes include Barron classes and classes based on sparsity or variation, such as the Radon-domain BV classes. The present paper is concerned with the definition of these novel model classes on domains $\Omega$. The current definition of these model classes does not depend on the domain $\Omega$. A new and more proper definition of model classes on domains is given by introducing the concept of weighted variation spaces. These new model classes are intrinsic to the domain itself. Their importance is that they are strictly larger than the classical (domain-independent) classes, yet they are shown to maintain the same NNA rates.
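
To make the approximants in the abstract concrete, here is a minimal Python sketch of a single-hidden-layer ReLU network of width $n$, written as an $n$-term sum of ridge functions. The parameterization by directions, offsets, and outer coefficients, and the names `relu` and `shallow_relu_network`, are assumptions chosen for illustration; they are not notation taken from the paper.

```python
def relu(z):
    """ReLU activation: max(z, 0)."""
    return max(z, 0.0)


def shallow_relu_network(x, directions, offsets, coeffs):
    """Evaluate sum_k a_k * relu(xi_k . x - t_k) at a point x in R^d.

    x          : sequence of d floats (the input point)
    directions : list of n sequences of d floats (inner weights xi_k)
    offsets    : list of n floats (biases t_k)
    coeffs     : list of n floats (outer weights a_k)
    The width n is the common length of the three parameter lists.
    """
    total = 0.0
    for xi, t, a in zip(directions, offsets, coeffs):
        # Inner product xi . x minus the offset, then the ReLU ridge function.
        pre_activation = sum(w * xj for w, xj in zip(xi, x)) - t
        total += a * relu(pre_activation)
    return total


# Width-2 example in d = 1: |x| = relu(x) + relu(-x).
directions = [(1.0,), (-1.0,)]
offsets = [0.0, 0.0]
coeffs = [1.0, 1.0]
print(shallow_relu_network((-0.5,), directions, offsets, coeffs))  # 0.5
```

Each term of the sum plays the role of one dictionary element, so the set of all such functions with $n$ terms is the nonlinear set from which approximants are drawn.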

Paper Structure (13 sections, 11 theorems, 163 equations)

Introduction
Approximation by shallow ReLU networks
Novel (non-classical) model classes
Weighted variation model classes
Approximation in $\Omega=B^2$
The approximation theorem
Weighted variation spaces for $\Omega=Q^2$
Approximation in $L_2(B^d)$
The proof of Theorem \ref{T:approxphi}
Proof of Theorem \ref{T:B^d}
Concluding Remarks
Open Problems
Appendix

Key Result

Lemma 5.1

Suppose that $m\ge 4$ is an even integer, $n=m(m-1)$, and $\phi=\sigma(\cdot;\xi,t)$ is any dictionary element whose line segment $L_\phi$ is in ${\cal L}_{i,j}={\cal L}_{i,j}(m)$ with $\mu_i\neq \mu_j$. Then there is a function $g\in X_n$ such that (i) $\phi(x)=g(x),\ x\notin S_{i,j}$, (ii) $\|\phi

Theorems & Definitions (15)

Lemma 5.1
Theorem 5.2
Remark 5.3
Remark 5.4
Theorem 5.5
Theorem 6.1
Theorem 6.2
Remark 6.3
Lemma 6.4
Lemma 6.5
...and 5 more
