Approximation and interpolation of deep neural networks

Vlad-Raul Constantinescu, Ionel Popescu

TL;DR

This work proves that overparameterized neural networks with non-affine activations can interpolate any dataset of size $d$ and that, when the activation is smooth, the interpolating parameters form a nonempty $(n-d)$-dimensional manifold in the parameter space $\mathbb{R}^n$. It extends universal approximation/density results to deep architectures and analyzes the Hessian of the loss at interpolation points, showing that it is positive definite on a subspace of dimension $d$ and flat on a subspace of dimension $n-d$. The authors provide a practical probabilistic method to find interpolation points by randomizing the input-to-hidden weights and solving a linear regression in the output layer, with high-probability guarantees under mild conditions and a bound on the hidden-layer width $h$. They also treat polynomial activations, giving conditions under which interpolation remains possible and describing how depth can compensate for a low polynomial degree. Overall, the paper links interpolation, density, optimization, and the geometry of global minima across broad activation families.

Abstract

In this paper, we prove that in the overparametrized regime, deep neural networks provide universal approximations and can interpolate any data set, as long as the activation function is locally in $L^1(\mathbb{R})$ and not an affine function. Additionally, if the activation function is smooth and such an interpolating network exists, then the set of parameters which interpolate forms a manifold. Furthermore, we give a characterization of the Hessian of the loss function evaluated at the interpolation points. In the last section, we provide a practical probabilistic method of finding such a point under general conditions on the activation function.
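The probabilistic method mentioned above (random input-to-hidden weights, then a linear regression in the output layer) can be illustrated with a minimal sketch. This is not the authors' code: the function name `random_feature_interpolant`, the Gaussian sampling of the hidden weights, the `tanh` activation, and the width `hidden_width=32` are all assumptions chosen for illustration, and the paper's actual bound on $h$ is not reproduced here.

```python
import numpy as np


def random_feature_interpolant(X, y, hidden_width, activation=np.tanh, seed=0):
    """Fit a one-hidden-layer network with frozen random hidden weights.

    X has shape (d, k): d data points with k input features (k is a
    hypothetical name for the input dimension). Only the output weights
    are learned, via least squares.
    """
    rng = np.random.default_rng(seed)
    d, k = X.shape
    # Assumption: hidden weights and biases are drawn from a standard normal.
    W = rng.standard_normal((k, hidden_width))
    b = rng.standard_normal(hidden_width)
    H = activation(X @ W + b)  # hidden features, shape (d, hidden_width)
    # Output layer: least-squares solution of H c = y.
    c, *_ = np.linalg.lstsq(H, y, rcond=None)

    def predict(X_new):
        return activation(np.asarray(X_new) @ W + b) @ c

    return predict, H


# Toy data set of size d = 8 in 3 input dimensions (synthetic, illustration only).
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((8, 3))
y = data_rng.standard_normal(8)

predict, H = random_feature_interpolant(X, y, hidden_width=32)
max_residual = float(np.max(np.abs(predict(X) - y)))
hidden_rank = int(np.linalg.matrix_rank(H))
print(f"rank of hidden feature matrix: {hidden_rank}")
print(f"max interpolation residual: {max_residual:.2e}")
```

When the hidden feature matrix has rank $d$, the linear system in the output layer is solvable exactly, so the network interpolates the data; the sketch checks this rank condition numerically rather than proving it.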

Paper Structure (13 sections, 14 theorems, 51 equations, 1 figure)

This paper contains 13 sections, 14 theorems, 51 equations, 1 figure.

Key Result

Theorem 2.1

In the framework above, the set $M=L^{-1}(0)$ is generically (that is, possibly after an arbitrarily small change to the data set) a smooth $n-d$ dimensional submanifold (possibly empty) of $\mathbb{R}^n$.
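The dimension count $n-d$ and the Hessian structure at interpolation points can be motivated by a short calculation. The following is a sketch under the assumption of a squared loss over $d$ data points $(x_i, y_i)$ with scalar outputs; the symbols $f_\theta$, $F$, and $J$ are notation introduced here for illustration, and the paper's framework may differ in details.

$$
F(\theta) = \bigl(f_\theta(x_1)-y_1,\ \dots,\ f_\theta(x_d)-y_d\bigr) \in \mathbb{R}^d, \qquad L(\theta) = \|F(\theta)\|^2, \qquad M = L^{-1}(0) = F^{-1}(0).
$$

$$
\nabla^2 L(\theta) = 2\,J(\theta)^\top J(\theta) + 2\sum_{i=1}^{d} F_i(\theta)\,\nabla^2 F_i(\theta), \qquad J(\theta) = DF(\theta) \in \mathbb{R}^{d\times n}.
$$

At any $\theta \in M$ every $F_i(\theta)$ vanishes, so $\nabla^2 L(\theta) = 2\,J(\theta)^\top J(\theta)$, which is positive semidefinite with rank equal to that of $J(\theta)$. If $J(\theta)$ has full rank $d$ on $M$ (that is, $0$ is a regular value of $F$), the regular value theorem gives that $M$ is a smooth $(n-d)$-dimensional submanifold with tangent space $\ker J(\theta)$, and the Hessian has $d$ positive eigenvalues and $n-d$ zero eigenvalues.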

Figures (1)

  • Figure 1: Example of such a neural network architecture

Theorems & Definitions (25)

  • Theorem 2.1
  • Theorem 2.3
  • proof
  • Theorem 2.5
  • proof
  • Corollary 2.6
  • proof
  • Proposition 2.7
  • proof
  • Corollary 2.9
  • ...and 15 more