Deep Neural Networks are Adaptive to Function Regularity and Data Distribution in Approximation and Estimation

Hao Liu, Jiahui Cheng, Wenjing Liao

TL;DR

This paper establishes that deep ReLU networks can automatically adapt to varying function regularity and nonuniform data distributions by leveraging a nonlinear tree-based approximation framework. Regularity is captured via the adaptive, multiscale partitioning of the domain and the function class $\mathcal{A}^s_{\theta}$, which generalizes Hölder and piecewise Hölder functions and includes irregularities on measure-zero sets or data concentrated on low-dimensional manifolds. The authors prove universal approximation results and derive generalization error bounds for learning functions in $\mathcal{A}^s_{\theta}$, with rates that reflect intrinsic geometric and regularity properties rather than ambient dimension. Numerical experiments in one dimension corroborate the theory, illustrating how network architectures can adapt to changes in regularity and data distribution. Overall, the work provides a rigorous foundation for the observed adaptivity of deep networks to local smoothness and data geometry, informing future design of architecture and training strategies for nonuniform regression tasks.

Abstract

Deep learning has exhibited remarkable results across diverse areas. To understand its success, substantial research has been directed towards its theoretical foundations. Nevertheless, the majority of these studies examine how well deep neural networks can model functions with uniform regularity. In this paper, we explore a different angle: how deep neural networks can adapt to different regularity in functions across different locations and scales and nonuniform data distributions. More precisely, we focus on a broad class of functions defined by nonlinear tree-based approximation. This class encompasses a range of function types, such as functions with uniform regularity and discontinuous functions. We develop nonparametric approximation and estimation theories for this function class using deep ReLU networks. Our results show that deep neural networks are adaptive to different regularity of functions and nonuniform data distributions at different locations and scales. We apply our results to several function classes, and derive the corresponding approximation and generalization errors. The validity of our results is demonstrated through numerical experiments.

Paper Structure (48 sections, 9 theorems, 166 equations, 7 figures)

Key Result

Lemma 1

Let $R>0$, let $\theta$ be a fixed nonnegative integer, and let $\rho$ be the Lebesgue measure on $X=[0,1]^d$. There exists a constant $R_p>0$ depending on $\theta$, $d$, and $R$ such that, for any function $f$ on $[0,1]^d$ satisfying $\|f\|_{L^{\infty}(X)}\leq R$, the $p_{j,k}$ in (eqpjk) has the form of (eq.p

Figures (7)

  • Figure 1: The dyadic partition of the 2D unit cube $[0,1]^2$ and the associated tree.
  • Figure 2: (a) For a fixed $\eta>0$, the red nodes are those whose refinement quantity exceeds $\eta$: $\delta_{j,k}(f) >\eta$. (b) The master tree is truncated to the smallest subtree containing the red nodes. (c) The outer leaves of the truncated tree are shown as green nodes. (d) The corresponding adaptive partition.
  • Figure 3: (a) Example 2a: a 1D piecewise Hölder function with $K$ discontinuity points; (b) Example 3a: a 2D piecewise domain. The functions in Example 3a are $r$-Hölder in the interior of $\Omega_1,\Omega_2,\Omega_3$.
  • Figure 4: Functions with different numbers of discontinuity points.
  • Figure 5: Trained model with $n_{\rm train} = 16$ (1st column), $64$ (2nd column), $256$ (3rd column) when $\sigma = 0$ (1st row), $0.1$ (2nd row), $0.3$ (3rd row), and $0.5$ (4th row).
  • ...and 2 more figures
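
The thresholding procedure described in the Figure 2 caption can be illustrated with a minimal 1D sketch. It is not the authors' code and rests on several assumptions: piecewise-constant local approximations (the case $\theta=0$), a fine uniform grid standing in for the Lebesgue measure on $[0,1]$, an empirical refinement quantity $\delta_{j,k}(f)$ computed as the $L^2$ distance between the children's cell means and the parent's cell mean, and a finite maximum depth. The names `refinement` and `adaptive_partition` are hypothetical.

```python
import numpy as np


def cell_mask(x, j, k):
    """Boolean mask of grid points inside the dyadic cell [k/2^j, (k+1)/2^j)."""
    return (x >= k / 2**j) & (x < (k + 1) / 2**j)


def refinement(f_vals, x, j, k):
    """Empirical delta_{j,k}: L2 norm of (children means - parent mean) on the cell."""
    parent_mean = f_vals[cell_mask(x, j, k)].mean()
    diff = np.zeros_like(f_vals)
    for child in (2 * k, 2 * k + 1):
        mask = cell_mask(x, j + 1, child)
        diff[mask] = f_vals[mask].mean() - parent_mean
    # Mean over the whole grid approximates the integral over [0, 1].
    return np.sqrt(np.mean(diff**2))


def adaptive_partition(f, eta, max_depth=8, n_grid=4096):
    """Return the outer leaves (j, k) of the truncated tree, sorted left to right."""
    x = (np.arange(n_grid) + 0.5) / n_grid  # midpoints of a uniform grid
    f_vals = f(x)

    # Step (a): mark nodes whose refinement quantity exceeds eta.
    marked = {
        (j, k)
        for j in range(max_depth)
        for k in range(2**j)
        if refinement(f_vals, x, j, k) > eta
    }
    if not marked:
        return [(0, 0)]  # nothing to refine: keep the root cell

    # Step (b): smallest subtree containing the marked nodes (add all ancestors).
    tree = set()
    for j, k in marked:
        while (j, k) not in tree:
            tree.add((j, k))
            if j == 0:
                break
            j, k = j - 1, k // 2

    # Step (c): outer leaves are children of tree nodes that are not in the tree.
    leaves = [
        (j + 1, c)
        for (j, k) in tree
        for c in (2 * k, 2 * k + 1)
        if (j + 1, c) not in tree
    ]
    # Step (d): the leaves' cells form the adaptive partition of [0, 1).
    return sorted(leaves, key=lambda jk: jk[1] / 2 ** jk[0])


if __name__ == "__main__":
    step = lambda x: (x > 0.3).astype(float)  # one discontinuity at 0.3
    for j, k in adaptive_partition(step, eta=0.01):
        print(f"[{k / 2**j:.4f}, {(k + 1) / 2**j:.4f})")
```

For a step function, only cells containing the jump have a nonzero refinement quantity in this sketch, so the partition is coarse away from the discontinuity and finest around it.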

Theorems & Definitions (25)

  • Definition 1: Hölder functions
  • Definition 2: Minkowski dimension
  • Definition 3: (2.19) in binev2007universal
  • Example 1a: Hölder functions
  • Example 2a: Piecewise Hölder functions in 1D
  • Example 3a: Piecewise Hölder functions in multi-dimensions
  • Example 4a: Functions irregular on a set of measure zero
  • Example 5a: Hölder functions with distribution concentrated on a low-dimensional manifold
  • Lemma 1
  • Theorem 1: Approximation
  • ...and 15 more
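
For orientation, the standard textbook forms of the first two notions listed above are given here. They are stated under the assumption $0<r\le 1$ for the Hölder exponent and use the upper (box-counting) Minkowski dimension; the paper's Definitions 1 and 2 may differ in norm, normalization, or in how exponents larger than one are handled. The symbols $C$, $\Omega$, and $N(\varepsilon,\Omega)$ are introduced here for illustration.

```latex
% r-Hölder continuity on [0,1]^d with constant C, for 0 < r <= 1
|f(x) - f(y)| \le C \, \|x - y\|^{r}
\qquad \text{for all } x, y \in [0,1]^d .

% Upper Minkowski dimension of a bounded set \Omega \subset \mathbb{R}^d,
% where N(\varepsilon, \Omega) is the smallest number of balls of
% radius \varepsilon needed to cover \Omega
\overline{\dim}_{M}(\Omega)
  = \limsup_{\varepsilon \to 0^{+}}
    \frac{\log N(\varepsilon, \Omega)}{\log (1/\varepsilon)} .
```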