Some Super-approximation Rates of ReLU Neural Networks for Korobov Functions

Yuwen Li, Guozhi Zhang

TL;DR

The results improve upon classical lowest order $L_\infty$ and $H^1$ norm error bounds and demonstrate that the expressivity of neural networks is largely unaffected by the curse of dimensionality.

Abstract

This paper examines the $L_p$ and $W^1_p$ norm approximation errors of ReLU neural networks for Korobov functions. In terms of network width and depth, we derive nearly optimal super-approximation error bounds of order $2m$ in the $L_p$ norm and order $2m-2$ in the $W^1_p$ norm, for target functions with $L_p$ mixed derivative of order $m$ in each direction. The analysis leverages sparse grid finite elements and the bit extraction technique. Our results improve upon classical lowest order $L_\infty$ and $H^1$ norm error bounds and demonstrate that the expressivity of neural networks is largely unaffected by the curse of dimensionality.

Paper Structure

This paper contains 17 sections, 12 theorems, 165 equations, and 2 figures.

Key Result

theorem 1

Given any $f \in X^m_p(\Omega)$ with $1\leq p<\infty$ and $m\geq 2$, for any $W,L \in \mathbb{N}_{+}$, there is a function $\phi$ implemented by a ReLU DNN with width $W$ and depth $L$ such that where $\widetilde{C}_1=(C_1C_2)^4C_3$, $C_1=112d(2d)^d$, $C_2=320d^2$, and $C_3 = (2d)^{3d}$ if $m=2$; $\widetilde{C}_1$ is a constant depending only on $m$ and $d$ if $m\geq3$.
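To get a feel for how the explicit constants in the $m=2$ case scale with the dimension $d$, the formulas stated in the theorem can be evaluated directly. The following is only a minimal sketch: the function name `theorem1_constants` is hypothetical, and it covers only the $m=2$ case, since the theorem gives no closed form for $m\geq 3$.

```python
def theorem1_constants(d):
    """Evaluate the constants stated in theorem 1 for m = 2.

    Hypothetical helper for illustration; d is the input dimension.
    """
    c1 = 112 * d * (2 * d) ** d        # C_1 = 112 d (2d)^d
    c2 = 320 * d ** 2                  # C_2 = 320 d^2
    c3 = (2 * d) ** (3 * d)            # C_3 = (2d)^{3d}, the m = 2 case
    c_tilde = (c1 * c2) ** 4 * c3      # \tilde{C}_1 = (C_1 C_2)^4 C_3
    return c1, c2, c3, c_tilde


if __name__ == "__main__":
    for d in (1, 2, 3):
        c1, c2, c3, c_tilde = theorem1_constants(d)
        print(f"d={d}: C1={c1}, C2={c2}, C3={c3}, C~1={c_tilde:.3e}")
```

Python integers are arbitrary precision, so the values are exact even when $\widetilde{C}_1$ becomes very large.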

Figures (2)

  • Figure 1: An illustration of the network architecture implementing $\phi_{\bm l}$, where the input $\bm x$ belongs to a region $\Omega_{\bm i,\varepsilon}^{\bm l}$. Red functions represent the transformation from the left input vector to the right output vector. The $d$-dimensional identity mapping can be represented as a ReLU network with a width of $2d$ and a depth of $1$. Finally, the product network $\phi_4$ of $d+1$ variables implies the desired network function.
  • Figure 2: An illustration of $g_1,g_2,\Omega_1,$ and $\Omega_2$, where $K=2$.
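
The identity-mapping remark in the Figure 1 caption rests on the standard fact that $x = \mathrm{ReLU}(x) - \mathrm{ReLU}(-x)$ holds componentwise. A minimal sketch in plain Python is given below; the function names are hypothetical, and treating one hidden layer as "depth 1" is an assumption about the paper's counting convention.

```python
def relu(v):
    """Componentwise ReLU on a list of floats."""
    return [max(0.0, t) for t in v]


def identity_via_relu(x):
    """Represent the d-dimensional identity with one hidden layer of width 2d.

    Hypothetical illustration: the hidden layer holds ReLU(x) and ReLU(-x),
    and the linear output layer subtracts the two halves.
    """
    d = len(x)
    # Hidden layer of width 2d: affine map x -> (x, -x), followed by ReLU.
    hidden = relu(list(x) + [-t for t in x])
    # Linear output layer: x_i = ReLU(x_i) - ReLU(-x_i).
    return [hidden[i] - hidden[d + i] for i in range(d)]


if __name__ == "__main__":
    print(identity_via_relu([1.5, -2.0, 0.0]))
```

Passing inputs through unchanged in this way lets a network carry $\bm x$ alongside intermediate quantities until a later layer needs it.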

Theorems & Definitions (23)

  • definition 1: Korobov space
  • theorem 1
  • theorem 2
  • remark 1
  • lemma 1: Proposition 4.3 from lu_shen_yang_zhang2021deep
  • lemma 2
  • lemma 3: Proposition 4.4 from lu_shen_yang_zhang2021deep
  • lemma 4
  • proof
  • lemma 5: Proposition 4 from YangYangXiang2023
  • ...and 13 more