
Interpolation with deep neural networks with non-polynomial activations: necessary and sufficient numbers of neurons

Liam Madden

TL;DR

It is proved that $\Theta(\sqrt{nd'})$ neurons are sufficient as long as the activation function is real analytic at a point and not a polynomial there.

Abstract

The minimal number of neurons required for a feedforward neural network to interpolate $n$ generic input-output pairs from $\mathbb{R}^d\times \mathbb{R}^{d'}$ is $\Theta(\sqrt{nd'})$. While previous results have shown that $\Theta(\sqrt{nd'})$ neurons are sufficient, they have been limited to sigmoid, Heaviside, and rectified linear unit (ReLU) as the activation function. Using a different approach, we prove that $\Theta(\sqrt{nd'})$ neurons are sufficient as long as the activation function is real analytic at a point and not a polynomial there. Thus, the only practical activation functions that our result does not apply to are piecewise polynomials. Importantly, this means that activation functions can be freely chosen in a problem-dependent manner without loss of interpolation power.

Paper Structure (10 sections, 11 theorems, 47 equations)


Key Result

Theorem 3.1

Let $n,d,d',L\in\mathbb{N}$ with $L\ge 2$. Then an $(L+1)$-layer FNN with continuously differentiable activations and less than neurons cannot interpolate $n$ generic points in $\mathbb{R}^d\times\mathbb{R}^{d'}$.
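A rough way to see why the neuron count scales like $\sqrt{nd'}$ is parameter counting: interpolating $n$ generic points imposes $nd'$ scalar constraints, while the number of weights and biases in a fully connected network grows at most quadratically in the number of neurons. The Python sketch below illustrates only this heuristic and is not the paper's proof. The helper names (`count_parameters`, `hidden_neurons`) and the example sizes are assumptions chosen for illustration.

```python
import math


def count_parameters(widths):
    """Count weights and biases of a fully connected feedforward network.

    widths = [d, w_1, ..., w_L, d_out]: input dimension, hidden layer
    widths, and output dimension (illustrative convention).
    """
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(widths, widths[1:]))


def hidden_neurons(widths):
    """Total number of neurons in the hidden layers."""
    return sum(widths[1:-1])


# Hypothetical problem sizes, chosen only for illustration.
n, d, d_out = 10_000, 20, 5
constraints = n * d_out            # scalar equations to satisfy
scale = math.sqrt(n * d_out)       # the sqrt(n d') scale from the abstract

for hidden in ([50, 50], [150, 150], [300, 300]):
    widths = [d, *hidden, d_out]
    params = count_parameters(widths)
    verdict = "enough" if params >= constraints else "too few"
    print(f"hidden neurons = {hidden_neurons(widths):4d} "
          f"(sqrt(n d') ~ {scale:.0f}), parameters = {params:6d}, "
          f"constraints = {constraints}: {verdict} parameters")
```

Having at least as many parameters as constraints is only a heuristic necessary condition; whether a particular architecture and activation actually achieves interpolation at this scale is what the sufficiency theorems address.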

Theorems & Definitions (21)

  • Theorem 3.1
  • proof
  • Lemma 3.2: Thm. 5.2 of madden2024memory
  • Theorem 4.1
  • proof
  • Theorem 4.2
  • proof
  • Theorem 4.3
  • proof
  • Theorem 4.4
  • ...and 11 more