Approximation and interpolation of deep neural networks

Vlad-Raul Constantinescu; Ionel Popescu

TL;DR

This work proves that overparameterized neural networks with non-affine activations can interpolate any dataset of size $d$, and, when activations are smooth, that the interpolating parameters form a nonempty $(n-d)$-dimensional manifold in parameter space. It extends universal approximation/density results to deep architectures and analyzes the Hessian structure at interpolation points, showing a $d$-dimensional subspace on which the Hessian is positive definite and an $(n-d)$-dimensional subspace on which it is flat. The authors provide a practical probabilistic method to find interpolation points by randomizing input-to-hidden weights and solving a linear regression in the output layer, with high-probability guarantees under mild conditions and a bound on hidden-layer width $h$. They also treat polynomial activations, offering conditions under which interpolation remains possible and describing how depth can compensate for polynomial complexity. Overall, the paper builds a cohesive theory linking interpolation, density, optimization, and the geometry of global minima across broad activation families.

Abstract

In this paper, we prove that in the overparametrized regime, deep neural networks provide universal approximations and can interpolate any data set, as long as the activation function is locally in $L^1(\mathbb{R})$ and not an affine function. Additionally, if the activation function is smooth and such an interpolating network exists, then the set of parameters which interpolate forms a manifold. Furthermore, we give a characterization of the Hessian of the loss function evaluated at the interpolation points. In the last section, we provide a practical probabilistic method of finding such a point under general conditions on the activation function.
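
The probabilistic method mentioned above (randomize the input-to-hidden weights, then solve a linear regression in the output layer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian sampling of weights and biases, the `tanh` activation, the toy data, and the names `random_feature_interpolant` and `predict` are all assumptions made for illustration. The only width condition used here is the necessary one, $h \ge d$; the paper's own bound on $h$ is not reproduced.

```python
import numpy as np

def random_feature_interpolant(X, y, h, activation=np.tanh, seed=0):
    """Fit a one-hidden-layer network that tries to interpolate (X, y).

    Hidden weights and biases are drawn at random and frozen; only the
    output weights are solved for, by linear least squares.
    """
    rng = np.random.default_rng(seed)
    d, p = X.shape                     # d data points, p input features
    W = rng.standard_normal((p, h))    # random input-to-hidden weights (assumed Gaussian)
    b = rng.standard_normal(h)         # random hidden biases (assumed Gaussian)
    Phi = activation(X @ W + b)        # d x h matrix of hidden-layer outputs
    # Minimum-norm least-squares solution; it interpolates when Phi has rank d.
    v, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return W, b, v, Phi

def predict(X, W, b, v, activation=np.tanh):
    """Evaluate the network x -> activation(x W + b) v."""
    return activation(X @ W + b) @ v

# Hypothetical toy data: d = 10 points in R^3 with arbitrary real targets.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((10, 3))
y = data_rng.standard_normal(10)

W, b, v, Phi = random_feature_interpolant(X, y, h=50)
residual = np.max(np.abs(predict(X, W, b, v) - y))
print("rank of feature matrix:", np.linalg.matrix_rank(Phi))
print("max interpolation error:", residual)
```

Interpolation succeeds exactly when the $d \times h$ hidden-layer matrix `Phi` has rank $d$, which is why the sketch reports the rank alongside the residual. Freezing the hidden layer turns the problem into a linear one, so no gradient descent is needed.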

Paper Structure

This paper contains 13 sections, 14 theorems, 51 equations, 1 figure.

Introduction
Interpolation of deep neural networks
Universal Approximation and Network Density
Numerical Methods and Gradient Descent
Interpolation of deep neural networks
The general case of activation functions
The non-polynomial case and shallow networks
The general non-affine activation functions and deep neural networks
Extensions of interpolation for polynomial activation function
Density of deep neural networks
The Hessian for the global minima
Convergence to the global minima
Extensions and Comments

Key Result

Theorem 2.1

In the framework above, the set $M=L^{-1}(0)$ is generically (that is, possibly after an arbitrarily small change to the data set) a smooth $n-d$ dimensional submanifold (possibly empty) of $\mathbb{R}^n$.
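
The dimension count $n-d$ can be motivated by a short sketch. It assumes, since the framework itself is not reproduced in this summary, that the loss is the squared error over a dataset $(x_i, y_i)_{i=1}^d$ with scalar outputs and that the network $f_\theta$ is smooth in $\theta \in \mathbb{R}^n$. The residual map $F$ and its Jacobian $DF$ are notation introduced here for illustration only.

$$
F:\mathbb{R}^n \to \mathbb{R}^d, \qquad F(\theta) = \big(f_\theta(x_i) - y_i\big)_{i=1}^{d}, \qquad L(\theta) = \|F(\theta)\|^2, \qquad M = L^{-1}(0) = F^{-1}(0).
$$

If $DF(\theta)$ has rank $d$ at every $\theta \in M$, the regular value theorem gives that $M$ is a smooth submanifold of dimension $n-d$. By Sard's theorem this rank condition holds for almost every target vector $(y_1,\dots,y_d)$, which is one way to read "generically" in the statement. Under the same assumptions, the Hessian at a point $\theta \in M$ simplifies because the residuals vanish there:

$$
\nabla^2 L(\theta) = 2\,DF(\theta)^{\top} DF(\theta) + 2\sum_{i=1}^{d} F_i(\theta)\,\nabla^2 F_i(\theta) = 2\,DF(\theta)^{\top} DF(\theta).
$$

This matrix is positive semidefinite of rank $d$, so it has $d$ positive eigenvalues and $n-d$ zero eigenvalues, with kernel equal to the tangent space $T_\theta M$. This agrees with the Hessian structure described in the TL;DR.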

Figures (1)

Figure 1: Example of such a neural network architecture

Theorems & Definitions (25)

Theorem 2.1
Theorem 2.3
proof
Theorem 2.5
proof
Corollary 2.6
proof
Proposition 2.7
proof
Corollary 2.9
...and 15 more
