Statistically guided deep learning

Michael Kohler; Adam Krzyzak

TL;DR

This work tackles nonparametric regression with deep networks by developing a theory-guided, over-parameterized architecture that combines a parallel ensemble of depth-$L$ networks with a linear readout. It introduces data-driven initializations and a principled, adaptive scheme for selecting the learning rate and number of gradient steps, yielding provable $L_2$ convergence rates that match the minimax rate for $(p,C)$-smooth regression up to an arbitrarily small $\epsilon$. The main contributions include a general error bound decomposing into approximation and estimation terms, a practical algorithm for tuning hyperparameters, and empirical evidence showing favorable finite-sample performance on simulated univariate data, often rivaling smoothing splines. The results demonstrate that theoretical analysis can guide the design of deep-learning estimators with improved finite-sample behavior, potentially extending to higher dimensions and more complex function classes.
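
For orientation, the rate claim can be written out under the standard definitions. This is a sketch and not a statement quoted from the paper: the symbols $n$ (sample size), $d$ (input dimension), $m$ (regression function), $m_n$ (the estimate) and the constant $c$ are notation assumed here, and reading "up to an arbitrarily small $\epsilon$" as an additive loss in the exponent is likewise an assumption. A function is usually called $(p,C)$-smooth with $p = q + s$, $q \in \mathbb{N}_0$, $s \in (0,1]$, if all its partial derivatives of order $q$ exist and are Hölder continuous with exponent $s$ and constant $C$. The classical minimax rate for estimating such a function in squared $L_2$ error is $n^{-2p/(2p+d)}$, so the claim would take the form

$$
\mathbf{E} \int |m_n(x) - m(x)|^2 \, \mathbf{P}_X(dx) \;\le\; c \cdot n^{-\frac{2p}{2p+d} + \epsilon}
\quad \text{for every fixed } \epsilon > 0 .
$$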

Abstract

We present a theoretically well-founded deep learning algorithm for nonparametric regression. It uses over-parametrized deep neural networks with logistic activation function, which are fitted to the given data via gradient descent. We propose a special topology of these networks, a special random initialization of the weights, and a data-dependent choice of the learning rate and the number of gradient descent steps. We prove a theoretical bound on the expected $L_2$ error of this estimate, and illustrate its finite sample size performance by applying it to simulated data. Our results show that a theoretical analysis of deep learning which takes into account simultaneously optimization, generalization and approximation can result in a new deep learning estimate which has an improved finite sample performance.
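
The architecture described above can be illustrated with a small NumPy sketch. It is not the authors' algorithm. The uniform initialization range, the final logistic unit in each sub-network, the fixed step size, and all names (`init_subnetwork`, `fit_readout`, `K`, `depth`, `width`) are assumptions made for illustration. As a further simplification, gradient descent is run on the linear readout weights only, whereas the abstract describes fitting the network by gradient descent with a data-dependent learning rate and number of steps.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + np.exp(-z))

def init_subnetwork(rng, depth, width, d_in):
    """Random (W, b) pairs for one sub-network; the uniform range is hypothetical."""
    sizes = [d_in] + [width] * depth + [1]
    return [(rng.uniform(-5, 5, size=(m, k)), rng.uniform(-5, 5, size=k))
            for m, k in zip(sizes[:-1], sizes[1:])]

def subnetwork_forward(params, x):
    """Forward pass through one sub-network; every layer uses the logistic activation."""
    h = x
    for W, b in params:
        h = logistic(h @ W + b)
    return h[:, 0]

def features(ensemble, x):
    """Outputs of all K parallel sub-networks, stacked into an (n, K) matrix."""
    return np.stack([subnetwork_forward(p, x) for p in ensemble], axis=1)

def fit_readout(phi, y, steps, lr):
    """Plain gradient descent on the mean squared error of the linear readout."""
    n = len(y)
    w = np.zeros(phi.shape[1])
    losses = []
    for _ in range(steps):
        resid = phi @ w - y
        losses.append(float(np.mean(resid ** 2)))
        w -= lr * (2.0 / n) * (phi.T @ resid)
    losses.append(float(np.mean((phi @ w - y) ** 2)))
    return w, losses

# Simulated univariate data (illustrative only, not the paper's simulation setup).
rng = np.random.default_rng(0)
n, K, depth, width = 200, 20, 2, 4
x = rng.uniform(-1, 1, size=(n, 1))
y = np.sin(3 * x[:, 0]) + 0.1 * rng.standard_normal(n)

ensemble = [init_subnetwork(rng, depth, width, d_in=1) for _ in range(K)]
phi = features(ensemble, x)

# Every feature lies in (0, 1), so the gradient's Lipschitz constant is at most 2K
# and the step size 0.5 / K keeps the training loss from increasing.
w, losses = fit_readout(phi, y, steps=500, lr=0.5 / K)
```

The step size here is fixed by a crude bound on the feature matrix; the paper instead proposes choosing the learning rate and the number of steps from the data, which this sketch does not attempt to reproduce.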
