Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform

Tong Mao; Jonathan W. Siegel; Jinchao Xu

TL;DR

This work studies how well shallow ReLU$^k$ networks of width $n$ approximate functions in Sobolev spaces $W^s(L_p(\Omega))$ on a bounded domain $\Omega$, and obtains nearly optimal rates (up to logarithmic factors) in a broad regime. The authors develop a variation-space framework $\mathcal{K}_1(\mathbb{P}_k^d)$ based on ridge-spline dictionaries and prove the embedding $W^s(L_2(\Omega)) \subset \mathcal{K}_1(\mathbb{P}_k^d)$ at the critical smoothness $s=(d+2k+1)/2 = k+(d+1)/2$, using the Radon transform and the Fourier slice theorem. Combining this embedding with recent nonlinear approximation results for variation spaces and with interpolation techniques, they obtain rates of the form $\|f-f_n\|_{L_p(\Omega)} \le C \|f\|_{W^s(L_p(\Omega))} n^{-s/d}$ for $2\le p\le \infty$ and $0<s\le k+(d+1)/2$, together with analogous $L_\infty$ bounds in terms of a $W^s(L_2)$-norm. These rates are optimal up to logarithmic factors. A key insight is that adaptivity lets shallow ReLU$^k$ networks exploit Sobolev smoothness up to $s=k+(d+1)/2$, even though they represent piecewise polynomials of fixed degree $k$. This suggests possible benefits for PDE-related tasks and broadens the understanding of nonlinear approximation by ridge-spline networks.

Abstract

Let $\Omega\subset \mathbb{R}^d$ be a bounded domain. We consider the problem of how efficiently shallow neural networks with the ReLU$^k$ activation function can approximate functions from Sobolev spaces $W^s(L_p(\Omega))$ with error measured in the $L_q(\Omega)$-norm. Utilizing the Radon transform and recent results from discrepancy theory, we provide a simple proof of nearly optimal approximation rates in a variety of cases, including when $q\leq p$, $p\geq 2$, and $s \leq k + (d+1)/2$. The rates we derive are optimal up to logarithmic factors, and significantly generalize existing results. An interesting consequence is that the adaptivity of shallow ReLU$^k$ neural networks enables them to obtain optimal approximation rates for smoothness up to order $s = k + (d+1)/2$, even though they represent piecewise polynomials of fixed degree $k$.
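
To make the object of study concrete, the following is a minimal sketch of evaluating a width-$n$ shallow ReLU$^k$ network, assuming the usual parametrization $f_n(x) = \sum_{i=1}^n a_i \max(0, \omega_i \cdot x + b_i)^k$. The function and parameter names (`relu_k`, `shallow_relu_k_network`, `directions`, `biases`, `coeffs`) are illustrative and not taken from the paper, and the sketch assumes $k \ge 1$.

```python
from typing import Sequence


def relu_k(t: float, k: int) -> float:
    """ReLU^k activation max(0, t)**k; this sketch assumes k >= 1."""
    if k < 1:
        raise ValueError("this sketch assumes k >= 1")
    return max(0.0, t) ** k


def shallow_relu_k_network(
    x: Sequence[float],
    directions: Sequence[Sequence[float]],  # hypothetical name: inner weights w_i in R^d
    biases: Sequence[float],                # hypothetical name: offsets b_i
    coeffs: Sequence[float],                # hypothetical name: outer weights a_i
    k: int,
) -> float:
    """Evaluate f_n(x) = sum_i a_i * max(0, w_i . x + b_i)**k at one point x."""
    total = 0.0
    for a, w, b in zip(coeffs, directions, biases):
        # Each neuron is a ridge function: it depends on x only through w . x + b.
        pre_activation = sum(wj * xj for wj, xj in zip(w, x)) + b
        total += a * relu_k(pre_activation, k)
    return total


# Width n = 2 network on R^2 with k = 2 (piecewise quadratic ridge splines).
value = shallow_relu_k_network(
    x=(3.0, 2.0),
    directions=[(1.0, 0.0), (0.0, 1.0)],
    biases=[0.0, -1.0],
    coeffs=[1.0, 2.0],
    k=2,
)
```

Each term is a piecewise polynomial of degree $k$ along one direction. The adaptivity discussed in the abstract refers to the freedom to choose the directions and offsets depending on the target function, rather than fixing them in advance.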

Paper Structure

This paper contains 6 sections, 4 theorems, 81 equations, and 1 table.

Introduction
The Radon Transform
Embeddings of Sobolev Spaces into ReLU$^k$ Variation Spaces
Approximation Upper Bounds for Sobolev Spaces
Conclusion
Acknowledgements

Key Result

Theorem 1

Let $s = (d+2k+1)/2$. Then we have the embedding $W^s(L_2(\Omega)) \subset \mathcal{K}_1(\mathbb{P}_k^d)$.
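
As a quick check on the exponents, the critical smoothness in Theorem 1 can be rewritten and substituted into the rate $n^{-s/d}$ quoted in the TL;DR. This is only arithmetic on the stated formulas; the numerical case $d=2$, $k=1$ is chosen here for illustration and does not come from the paper.

```latex
\begin{align*}
s &= \frac{d+2k+1}{2} = k + \frac{d+1}{2}, \\
\frac{s}{d} &= \frac{d+2k+1}{2d} = \frac{1}{2} + \frac{2k+1}{2d}, \\
n^{-s/d} &= n^{-\frac{1}{2} - \frac{2k+1}{2d}}. \\
\intertext{For example, with $d=2$ and $k=1$:}
s &= \frac{2+2+1}{2} = \frac{5}{2}, \qquad
\frac{s}{d} = \frac{5}{4}, \qquad
n^{-s/d} = n^{-5/4}.
\end{align*}
```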

Theorems & Definitions (6)

Theorem 1
Corollary 1
Corollary 2
Theorem 2: Fourier Slice Theorem
Proof of Theorem 1 (main embedding theorem)
Proof of the main upper-bounds corollary
