Estimation and inference for Deep Neuronal Networks

Vladimir Spokoiny

TL;DR

The paper develops a finite-sample theory for nonlinear regression and deep neural networks by introducing the effective dimension $p$ and effective sample size $N$ within the stochastically linear smooth (SLS) framework. It introduces calming to embed nonlinear regression into an augmented parameter space, enabling Fisher and Wilks expansions with explicit remainder terms and dimension-free risk bounds. The results cover both parametric and semiparametric (profile) estimation, including penalization analysis and identifiability considerations, and are specialized to nonlinear regression and a shallow DNN with one hidden layer. The work provides sharp, non-asymptotic risk bounds and a principled approach to inference and penalization in high-dimensional nonlinear models, with practical implications for training and evaluating deep networks under finite samples.

Abstract

The nonlinear regression problem is one of the most popular and important statistical tasks. The first methods, such as least squares estimation, go back to Gauss and Legendre. Recent models and developments in statistics and machine learning, such as Deep Neuronal Networks (DNN) or nonlinear PDEs, stimulate new research in this direction, which has to address the important issues and challenges of modern statistical inference: huge model complexity and parameter dimension, limited sample size, and lack of convexity and identifiability, among many others. Classical results of nonparametric statistics in terms of rates of convergence do not really address these issues. This paper offers a general approach to studying a nonlinear regression problem based on the notion of effective dimension. First, a special case of models with stochastically linear structure (SLS) is studied. The results provide finite-sample expansions for the loss of the penalized maximum likelihood estimator (MLE). The leading term of such expansions, as well as the corresponding remainder, is given via the effective dimension and the effective sample size. The obtained expansions can be used to derive sharp risk bounds and for statistical inference. Despite their generality, all the presented bounds are nearly sharp, and the classical asymptotic results can be obtained as simple corollaries. Although the basic SLS assumptions are not fulfilled for nonlinear smooth regression, we explain how stochastic linearity can be achieved by extending the parameter space. The obtained general results are specialized to nonlinear smooth regression and to a DNN with one hidden layer.
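To give the notion of effective dimension some intuition, the sketch below computes a ridge-type quantity $\operatorname{tr}\{F (F + G^{2})^{-1}\}$ for a penalized linear least squares problem. This is an illustration under stated assumptions, not the paper's definition: the choices $F = X^{\top} X$ and $G^{2} = \lambda I$, the function name `effective_dimension`, and the simulated design are all hypothetical and only meant to show how a penalty can make the effective dimension much smaller than the number of parameters.

```python
import numpy as np


def effective_dimension(X, penalty):
    """Ridge-type effective dimension tr(F (F + G^2)^{-1}).

    Assumptions (not from the paper): F = X^T X is the information matrix
    of a linear Gaussian model with unit noise, and G^2 = penalty * I.
    """
    F = X.T @ X
    G2 = penalty * np.eye(F.shape[0])
    # Solve (F + G^2) Z = F instead of forming an explicit inverse;
    # tr((F + G^2)^{-1} F) equals tr(F (F + G^2)^{-1}) by cyclicity.
    return float(np.trace(np.linalg.solve(F + G2, F)))


rng = np.random.default_rng(0)
n, dim = 200, 50
# Column scales decay quickly, so only a few directions carry information.
scales = 1.0 / np.arange(1, dim + 1) ** 2
X = rng.standard_normal((n, dim)) * scales

for penalty in (0.0, 1e-2, 1.0):
    print(f"penalty={penalty:g}  effective dimension={effective_dimension(X, penalty):.2f}")
```

Without a penalty the quantity equals the full parameter dimension; as the penalty grows, directions with little information stop contributing and the value drops well below `dim`.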

Paper Structure (54 sections, 54 theorems, 36 equations)

Introduction
Properties of the MLE $\widetilde{\boldsymbol{\upsilon}}$ for SLS models
Basic conditions
Concentration of the MLE $\widetilde{\boldsymbol{\upsilon}}$. 2S-expansions
Expansions and risk bounds under third-order smoothness
Effective and critical dimension in ML estimation
Penalization bias
Loss and risk of the pMLE. Bias-variance decomposition
Profile semiparametric estimation for SLS models
Full dimensional estimation
Expansions and risk bounds for the profile MLE
Fisher and Wilks expansions
Identifiability and semiparametric effective/critical dimension
Penalization in profile MLE
Separable penalty
...and 39 more sections

Key Result

Proposition 2.1

Suppose Eref, EU2ref, and LLref. Let also $D^{2} \leq \varkappa^{2} \mathbbmsl{F}$ and $\omega^{\prime} \, \varkappa^{2} < 1/4$; see dtb3u1DG2d3GP. Then on $\varOmega(\mathtt{x})$, it holds

Theorems & Definitions (96)

Proposition 2.1
proof
Theorem 2.2
Theorem 2.3
Theorem 2.4
Theorem 2.5
Proposition 2.6
Theorem 2.7
proof
Remark 2.1
...and 86 more
