Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform

Tong Mao, Jonathan W. Siegel, Jinchao Xu

TL;DR

This work studies how well shallow ReLU$^k$ networks of width $n$ approximate functions from Sobolev spaces $W^s(L_p(\Omega))$ on a bounded domain $\Omega \subset \mathbb{R}^d$, with error measured in $L_q(\Omega)$, and obtains nearly optimal rates (up to logarithmic factors) in a broad regime. The authors develop a variation-space framework $\mathcal{K}_1(\mathbb{P}_k^d)$ based on ridge-spline dictionaries and prove the embedding $W^s(L_2(\Omega)) \subset \mathcal{K}_1(\mathbb{P}_k^d)$ at the critical smoothness $s=(d+2k+1)/2$, using the Radon transform and the Fourier slice theorem. By combining this embedding with recent nonlinear approximation results for variation spaces and with interpolation techniques, they obtain rates of the form $\|f-f_n\|_{L_p(\Omega)} \le C \|f\|_{W^s(L_p(\Omega))} n^{-s/d}$ for $2\le p\le \infty$ and $0<s\le k+(d+1)/2$, and analogous $L_\infty$ bounds via a $W^s(L_2)$-norm; these rates are optimal up to logarithmic factors. A key insight is that adaptivity enables shallow ReLU$^k$ networks to exploit Sobolev smoothness up to $s=k+(d+1)/2$, which equals the critical value $(d+2k+1)/2$, even though they represent piecewise polynomials of fixed degree $k$. This suggests practical benefits for PDE-related tasks and broadens the understanding of nonlinear approximation by ridge-spline networks.

Abstract

Let $\Omega\subset \mathbb{R}^d$ be a bounded domain. We consider the problem of how efficiently shallow neural networks with the ReLU$^k$ activation function can approximate functions from Sobolev spaces $W^s(L_p(\Omega))$ with error measured in the $L_q(\Omega)$-norm. Utilizing the Radon transform and recent results from discrepancy theory, we provide a simple proof of nearly optimal approximation rates in a variety of cases, including when $q\leq p$, $p\geq 2$, and $s \leq k + (d+1)/2$. The rates we derive are optimal up to logarithmic factors, and significantly generalize existing results. An interesting consequence is that the adaptivity of shallow ReLU$^k$ neural networks enables them to obtain optimal approximation rates for smoothness up to order $s = k + (d+1)/2$, even though they represent piecewise polynomials of fixed degree $k$.
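
To make the object of study concrete, the following is a minimal sketch of a shallow ReLU$^k$ network of width $n$, $f_n(x) = \sum_{i=1}^n a_i \max(0, w_i \cdot x + b_i)^k$. The function names, the random parameters, and the choice of unit-norm directions are illustrative assumptions, not taken from the paper; the sketch assumes $k \ge 1$.

```python
import numpy as np

def relu_k(t, k):
    """ReLU^k activation: max(0, t)**k (assumes integer k >= 1)."""
    return np.maximum(t, 0.0) ** k

def shallow_relu_k(x, W, b, a, k):
    """Evaluate f_n(x) = sum_i a_i * ReLU^k(w_i . x + b_i).

    x: (m, d) evaluation points
    W: (n, d) inner weights (ridge directions)
    b: (n,)   biases
    a: (n,)   outer coefficients
    Returns an array of shape (m,).
    """
    pre = x @ W.T + b          # (m, n) pre-activations
    return relu_k(pre, k) @ a  # (m,) network outputs

# Illustrative (hypothetical) parameters: d = 3 inputs, width n = 16, k = 2.
rng = np.random.default_rng(0)
d, n, k = 3, 16, 2
W = rng.normal(size=(n, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # normalize directions to the unit sphere
b = rng.uniform(-1.0, 1.0, size=n)
a = rng.normal(size=n) / n
x = rng.uniform(-1.0, 1.0, size=(5, d))
y = shallow_relu_k(x, W, b, a, k)
```

Each neuron is a ridge function that is a degree-$k$ polynomial on either side of the hyperplane $w_i \cdot x + b_i = 0$, so the network is a piecewise polynomial of degree $k$. The adaptivity mentioned in the abstract corresponds to choosing $W$ and $b$ depending on the target function rather than fixing them in advance.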

Paper Structure

This paper contains 6 sections, 4 theorems, 81 equations, and 1 table.

Key Result

Theorem 1

Let $s = (d+2k+1)/2$. Then we have the embedding $W^s(L_2(\Omega)) \subset \mathcal{K}_1(\mathbb{P}_k^d)$.
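
As a quick worked check of the exponents, the critical smoothness in Theorem 1 can be related to the rate $n^{-s/d}$ quoted in the TL;DR. The values $d = 3$, $k = 1$ below are an illustrative assumption, and logarithmic factors are ignored.

```latex
% Critical smoothness, written in the two forms used in the summary
s = \frac{d+2k+1}{2} = k + \frac{d+1}{2}

% Exponent of the rate n^{-s/d} at this smoothness
\frac{s}{d} = \frac{d+2k+1}{2d} = \frac{1}{2} + \frac{2k+1}{2d}

% Illustrative case (assumed values): d = 3, k = 1
s = \frac{3+2+1}{2} = 3, \qquad n^{-s/d} = n^{-1}
```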

Theorems & Definitions (6)

  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Theorem 2: Fourier Slice Theorem
  • Proof of Theorem 1 (main embedding theorem)
  • Proof of the main upper-bounds corollary