Nonparametric regression using over-parameterized shallow ReLU neural networks

Yunfei Yang, Ding-Xuan Zhou

TL;DR

This work analyzes nonparametric regression with over-parameterized shallow ReLU networks under weight constraints, proving minimax optimal learning rates for the Hölder class 𝓗^α (α<(d+3)/2) and the variation space 𝓕_σ(1). The network weights are constrained via κ(θ)=∑|a_i|‖w_i‖_2 ≤ M, and a localized complexity framework yields rate guarantees for both constrained and regularized least squares (LS) estimators. The key tool is a novel, width-independent local Rademacher complexity bound 𝓡_n(NN(N,M); δ) ≲ δ^{3/(d+3)} M^{d/(d+3)} n^{−1/2} √(log(nM/δ)). With appropriate width N_n and weight bound M_n, the constrained LS estimator achieves the rate n^{−2α/(d+2α)} (up to log factors) for 𝓗^α and n^{−(d+3)/(2d+3)} for 𝓕_σ(1); analogous rates hold for the regularized LS estimator when λ_n is chosen to balance approximation error and complexity. The proofs decompose the error into an approximation term and a local complexity term, and rely on new oracle inequalities for both the constrained and regularized settings. Practically, the results justify using over-parameterized shallow networks with norm-based regularization, and they provide sharp complexity tools that may extend to broader neural-network regimes.
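
To make the weight constraint concrete, the following is a minimal NumPy sketch of a shallow ReLU network, the norm κ(θ)=∑|a_i|‖w_i‖_2, and a penalized least squares objective. It is an illustration, not the authors' code: the function names, the handling of the bias as an extra input coordinate, the penalty form `lam * kappa`, and the rescaling step are all assumptions made for this example.

```python
import numpy as np

def shallow_relu(x, a, W):
    """f_theta(x) = sum_i a_i * relu(<w_i, (x, 1)>).

    x: (n, d) inputs, a: (N,) outer weights, W: (N, d+1) inner weights.
    Treating the bias as the last coordinate of w_i is an assumption here.
    """
    x_aug = np.hstack([x, np.ones((x.shape[0], 1))])  # append constant 1
    return np.maximum(x_aug @ W.T, 0.0) @ a

def kappa(a, W):
    """Weight norm kappa(theta) = sum_i |a_i| * ||w_i||_2."""
    return float(np.sum(np.abs(a) * np.linalg.norm(W, axis=1)))

def regularized_ls(x, y, a, W, lam):
    """Mean squared error plus lam * kappa(theta) (assumed penalty form)."""
    resid = shallow_relu(x, a, W) - y
    return float(np.mean(resid ** 2) + lam * kappa(a, W))

def rescale_to_constraint(a, W, M):
    """Shrink the outer weights so that kappa(theta) <= M.

    This is a crude feasibility step for illustration, not a projection.
    """
    k = kappa(a, W)
    return (a, W) if k <= M else (a * (M / k), W)
```

Because ReLU is positively homogeneous, replacing (a_i, w_i) by (c·a_i, w_i/c) for any c>0 changes neither the network output nor κ(θ), which is one reason a product-form norm is a natural quantity to constrain. Note also that κ(θ) does not depend on the width N directly, matching the width-independent flavor of the complexity bound quoted above.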

Abstract

It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes, if the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression of estimating an unknown $d$-variate function by using shallow ReLU neural networks. It is assumed that the regression function is from the Hölder space with smoothness $α<(d+3)/2$ or a variation space corresponding to shallow neural networks, which can be viewed as an infinitely wide neural network. In this setting, we prove that least squares estimators based on shallow neural networks with certain norm constraints on the weights are minimax optimal, if the network width is sufficiently large. As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks, which may be of independent interest.
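
As a quick consistency check on the two rates quoted in the TL;DR, formally substituting the boundary smoothness $\alpha=(d+3)/2$ into the Hölder rate reproduces the variation-space rate. This is plain arithmetic on the stated exponents, not an additional result from the paper:

```latex
\[
  \frac{2\alpha}{d+2\alpha}\bigg|_{\alpha=\frac{d+3}{2}}
  = \frac{d+3}{d+(d+3)}
  = \frac{d+3}{2d+3},
  \qquad\text{so}\qquad
  n^{-\frac{2\alpha}{d+2\alpha}} \;\longmapsto\; n^{-\frac{d+3}{2d+3}} .
\]
```

For example, with $d=1$ the exponent is $4/5$, so the variation-space rate reads $n^{-4/5}$.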

Paper Structure (11 sections, 11 theorems, 155 equations)

Key Result

Theorem 1

For any $f\in \mathcal{NN}(N)$,

Theorems & Definitions (13)

  • Theorem 1
  • Theorem 2: yang2024optimal
  • Theorem 3
  • Definition 4: Local complexity
  • Theorem 5
  • Lemma 6
  • Lemma 7
  • Remark 8
  • Theorem 9
  • Lemma 10
  • ...and 3 more