Estimation and inference for Deep Neuronal Networks

Vladimir Spokoiny

TL;DR

The paper develops a finite-sample theory for nonlinear regression and deep neural networks by introducing the effective dimension $p$ and effective sample size $N$ within the stochastically linear smooth (SLS) framework. It introduces calming to embed nonlinear regression into an augmented parameter space, enabling Fisher and Wilks expansions with explicit remainder terms and dimension-free risk bounds. The results cover both parametric and semiparametric (profile) estimation, including penalization analysis and identifiability considerations, and are specialized to nonlinear regression and a shallow DNN with one hidden layer. The work provides sharp, non-asymptotic risk bounds and a principled approach to inference and penalization in high-dimensional nonlinear models, with practical implications for training and evaluating deep networks under finite samples.

Abstract

The nonlinear regression problem is one of the most popular and important statistical tasks. The first methods, such as least squares estimation, go back to Gauss and Legendre. Recent models and developments in statistics and machine learning, such as Deep Neuronal Networks (DNN) or nonlinear PDEs, stimulate new research in this direction, which has to address important challenges of modern statistical inference: huge model complexity and parameter dimension, limited sample size, and lack of convexity and identifiability, among many others. Classical results of nonparametric statistics, stated in terms of rates of convergence, do not really address these issues. This paper offers a general approach to studying a nonlinear regression problem based on the notion of effective dimension. First, a special case of models with stochastically linear structure (SLS) is studied. The results provide finite-sample expansions for the loss of the penalized maximum likelihood estimator (MLE). The leading term of such expansions, as well as the corresponding remainder, is given via the effective dimension and the effective sample size. The obtained expansions can be used to derive sharp risk bounds and for statistical inference. Despite their generality, all the presented bounds are nearly sharp, and the classical asymptotic results can be obtained as simple corollaries. Although the basic SLS assumptions are not fulfilled for nonlinear smooth regression, we explain how stochastic linearity can be achieved by extending the parameter space. The obtained general results are specialized to nonlinear smooth regression and to a DNN with one hidden layer.
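
The abstract leans on the notion of effective dimension without defining it here. As a rough illustration only, the sketch below computes one common version of the idea for ridge-penalized linear regression: the trace of $F (F + \lambda I)^{-1}$ with $F = X^\top X$. This is an assumption chosen for illustration, not the paper's definition, which this summary does not give; the function name `effective_dimension`, the penalty `lam`, and the simulated design `X` are all hypothetical.

```python
import numpy as np


def effective_dimension(X, lam):
    """Ridge-type effective dimension tr(F (F + lam*I)^{-1}), F = X^T X.

    Illustrative assumption: this is the textbook quantity for a linear
    model with a quadratic penalty, not the definition used in the paper.
    Computed from the singular values s_i of X as sum s_i^2 / (s_i^2 + lam).
    """
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s**2 / (s**2 + lam)))


# Simulated design (hypothetical data): 200 observations, 50 parameters.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))

p_unpenalized = effective_dimension(X, 0.0)    # equals the full dimension
p_penalized = effective_dimension(X, 100.0)    # shrinks below the full dimension

# Cross-check against the explicit trace formula.
F = X.T @ X
p_trace = float(np.trace(F @ np.linalg.inv(F + 100.0 * np.eye(50))))
```

Without a penalty the quantity equals the number of parameters, and it decreases as the penalty grows. In this simplified linear setting, that is how a penalized high-dimensional model can behave like a lower-dimensional one.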

Paper Structure

This paper contains 54 sections, 54 theorems, and 36 equations.

Key Result

Proposition 2.1

Suppose that the conditions labeled Eref, EU2ref, and LLref in the paper hold. Let also $D^{2} \leq \varkappa^{2} \mathbbmsl{F}$ and $\omega^{\prime} \, \varkappa^{2} < 1/4$ (see the display labeled dtb3u1DG2d3GP in the paper). Then, on $\varOmega(\mathtt{x})$, it holds

Theorems & Definitions (96)

  • Proposition 2.1
  • proof
  • Theorem 2.2
  • Theorem 2.3
  • Theorem 2.4
  • Theorem 2.5
  • Proposition 2.6
  • Theorem 2.7
  • proof
  • Remark 2.1
  • ...and 86 more