Deep Neural Networks are Adaptive to Function Regularity and Data Distribution in Approximation and Estimation
Hao Liu, Jiahui Cheng, Wenjing Liao
TL;DR
This paper establishes that deep ReLU networks can adapt automatically to spatially varying function regularity and to nonuniform data distributions, using a nonlinear tree-based approximation framework. Regularity is captured through adaptive, multiscale partitioning of the domain and the function class $\mathcal{A}^s_{\theta}$, which generalizes Hölder and piecewise Hölder functions and covers functions with irregularities on measure-zero sets as well as data concentrated on low-dimensional manifolds. The authors derive approximation and generalization error bounds for functions in $\mathcal{A}^s_{\theta}$, with rates that reflect intrinsic geometric and regularity properties rather than the ambient dimension. Numerical experiments in one dimension corroborate the theory, illustrating how networks adapt to changes in regularity and data distribution. Overall, the work provides a rigorous foundation for the observed adaptivity of deep networks to local smoothness and data geometry, which can inform the design of architectures and training strategies for nonuniform regression tasks.
Abstract
Deep learning has exhibited remarkable results across diverse areas. To understand its success, substantial research has been directed towards its theoretical foundations. Nevertheless, the majority of these studies examine how well deep neural networks can model functions with uniform regularity. In this paper, we explore a different angle: how deep neural networks can adapt to function regularity that varies across locations and scales, and to nonuniform data distributions. More precisely, we focus on a broad class of functions defined by nonlinear tree-based approximation. This class encompasses a range of function types, such as functions with uniform regularity and discontinuous functions. We develop nonparametric approximation and estimation theories for this function class using deep ReLU networks. Our results show that deep neural networks are adaptive to function regularity that differs across locations and scales, and to nonuniform data distributions. We apply our results to several function classes, and derive the corresponding approximation and generalization errors. The validity of our results is demonstrated through numerical experiments.
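
To give a concrete feel for what adaptive, tree-based partitioning means, the following is a minimal one-dimensional sketch. It is not the paper's construction: it assumes piecewise-constant fits, a uniform measure on $[0, 1]$, and a simple greedy rule that splits a dyadic interval whenever the local $L^2$ error of the best constant fit exceeds a tolerance. The names `local_error`, `adaptive_partition`, and `target`, and the values of `tol` and `max_depth`, are hypothetical choices made for illustration only.

```python
import numpy as np

def local_error(f, a, b, n=64):
    """Approximate L2 error of the best constant fit to f on [a, b]."""
    # Midpoint samples of [a, b]; the best constant in L2 is the mean.
    x = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
    y = f(x)
    return np.sqrt(np.mean((y - y.mean()) ** 2) * (b - a))

def adaptive_partition(f, tol, max_depth=12):
    """Greedy dyadic refinement of [0, 1] (illustrative rule, not the paper's)."""
    leaves, stack = [], [(0.0, 1.0, 0)]
    while stack:
        a, b, depth = stack.pop()
        if depth < max_depth and local_error(f, a, b) > tol:
            m = 0.5 * (a + b)
            # Split the interval into its two dyadic children.
            stack.extend([(a, m, depth + 1), (m, b, depth + 1)])
        else:
            leaves.append((a, b))
    return sorted(leaves)

def target(x):
    """Hypothetical test function: smooth pieces with a jump at x = 1/3."""
    return np.where(x < 1 / 3, np.sin(2 * np.pi * x), 2.0 + 0.1 * x)

leaves = adaptive_partition(target, tol=1e-2)
widths = np.array([b - a for a, b in leaves])
jump_width = next(b - a for a, b in leaves if a <= 1 / 3 < b)
print(len(leaves), widths.min(), widths.max(), jump_width)
```

In this toy setting the partition is coarse where the function is smooth and fine around the discontinuity, which is the qualitative behavior the abstract refers to as adapting to regularity at different locations and scales. Under a nonuniform data distribution, one would weight the local error by the data measure instead of interval length.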
