Interpolation with deep neural networks with non-polynomial activations: necessary and sufficient numbers of neurons

Liam Madden

TL;DR

It is proved that $\Theta(\sqrt{nd'})$ neurons are sufficient to interpolate $n$ generic input-output pairs as long as the activation function is real analytic at a point and not a polynomial there.

Abstract

The minimal number of neurons required for a feedforward neural network to interpolate $n$ generic input-output pairs from $\mathbb{R}^d\times \mathbb{R}^{d'}$ is $\Theta(\sqrt{nd'})$. While previous results have shown that $\Theta(\sqrt{nd'})$ neurons are sufficient, they have been limited to the sigmoid, Heaviside, and rectified linear unit (ReLU) activation functions. Using a different approach, we prove that $\Theta(\sqrt{nd'})$ neurons are sufficient as long as the activation function is real analytic at a point and not a polynomial there. Thus, the only practical activation functions to which our result does not apply are piecewise polynomials. Importantly, this means that activation functions can be freely chosen in a problem-dependent manner without loss of interpolation power.
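The requirement that the activation not be a polynomial can be seen in a toy setting. The sketch below is an illustration only, not the paper's construction: it uses one hidden layer with $n$ neurons (many more than the $\Theta(\sqrt{nd'})$ that the paper shows suffice), scalar inputs and outputs ($d = d' = 1$), and hypothetical hidden weights and biases chosen for this example. With $\tanh$, which is real analytic and not a polynomial, the $n\times n$ matrix of hidden-neuron outputs is expected to be nonsingular for generic weights, so solving one linear system for the output weights makes the network pass through every data point. With the polynomial activation $t^2$, every neuron computes $(wx+b)^2$, which lies in the span of $1, x, x^2$; the matrix therefore has rank at most 3, and generic targets at more than three points cannot be fit however wide the hidden layer is.

```python
import math


def hidden_features(xs, weights, biases, sigma):
    """Phi[i][j] = sigma(w_j * x_i + b_j): one row per data point, one column per neuron."""
    return [[sigma(w * x + b) for w, b in zip(weights, biases)] for x in xs]


def matrix_rank(rows, rel_tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting (works on a copy)."""
    a = [list(r) for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    scale = max(abs(v) for r in a for v in r) or 1.0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Largest remaining entry in this column is the pivot candidate.
        piv = max(range(rank, n_rows), key=lambda i: abs(a[i][col]))
        if abs(a[piv][col]) <= rel_tol * scale:
            continue  # column is (numerically) dependent on earlier pivot columns
        a[rank], a[piv] = a[piv], a[rank]
        for i in range(rank + 1, n_rows):
            f = a[i][col] / a[rank][col]
            for j in range(col, n_cols):
                a[i][j] -= f * a[rank][j]
        rank += 1
    return rank


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for i in range(col + 1, n):
            f = a[i][col] / a[col][col]
            for j in range(col, n + 1):
                a[i][j] -= f * a[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]       # n = 5 distinct scalar inputs (d = 1)
ys = [0.3, -1.2, 0.7, 2.0, -0.5]       # arbitrary scalar targets (d' = 1)
weights = [3.0, -2.0, 1.5, -3.5, 2.5]  # hypothetical hidden weights, one per neuron
biases = [0.0, 1.0, -1.5, 0.5, -0.5]   # hypothetical hidden biases
n = len(xs)

phi_tanh = hidden_features(xs, weights, biases, math.tanh)
phi_square = hidden_features(xs, weights, biases, lambda t: t * t)

rank_tanh = matrix_rank(phi_tanh)      # expected to be n for generic weights
rank_square = matrix_rank(phi_square)  # at most 3: every column lies in span{1, x, x^2}

# Output weights c with Phi c = y make the tanh network pass through every data point.
c = solve(phi_tanh, ys)
residual = max(abs(sum(p * ci for p, ci in zip(row, c)) - y) for row, y in zip(phi_tanh, ys))
print("rank with tanh:", rank_tanh, "| rank with t^2:", rank_square, "| max residual:", residual)
```

The helpers `hidden_features`, `matrix_rank`, and `solve` are hypothetical names for this sketch; with NumPy available, `numpy.linalg.matrix_rank` and `numpy.linalg.solve` would replace the hand-written elimination.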

Paper Structure (10 sections, 11 theorems, 47 equations)

Introduction
Results
Related work
Organization
Preliminaries
The FNN model
Three layers
Four or more layers
Necessary and sufficient number of neurons
Conclusion

Key Result

Theorem 3.1

Let $n,d,d',L\in\mathbb{N}$ with $L\ge 2$. Then an $(L+1)$-layer FNN with continuously differentiable activations and less than neurons cannot interpolate $n$ generic points in $\mathbb{R}^d\times\mathbb{R}^{d'}$.
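The order $\Theta(\sqrt{nd'})$ can be motivated by a standard dimension-counting argument. For fixed inputs, the network's outputs on the $n$ data points are a $C^1$ function of its $P$ parameters, and a $C^1$ map from $\mathbb{R}^P$ to $\mathbb{R}^{nd'}$ with $P < nd'$ has an image of Lebesgue measure zero, so generic targets require $P \ge nd'$. With $N$ hidden neurons split over two hidden layers, the hidden-to-hidden weights contribute at most about $N^2/4$ parameters and the remaining terms grow only linearly in $N$; when $nd'$ is large compared with $d$ and $d'$, this forces $N$ to be of order $\sqrt{nd'}$. The sketch below assumes two hidden layers and counts weights plus biases (choices made for this illustration, not taken from the paper), then searches for the smallest $N$ whose best split across the two layers reaches $nd'$ parameters. It reproduces only the order of magnitude, not the exact bound of Theorem 3.1.

```python
import math


def param_count(widths):
    """Weights plus biases of a fully connected net with layer widths [d, m_1, ..., m_k, d']."""
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(widths, widths[1:]))


def min_neurons_two_hidden(n, d, d_out):
    """Smallest N = m_1 + m_2 (two hidden layers) for which some split reaches n * d_out parameters.

    Assumes the counting condition P >= n * d_out; this is not the bound stated in Theorem 3.1.
    """
    target = n * d_out
    N = 2
    while True:
        best = max(param_count([d, m1, N - m1, d_out]) for m1 in range(1, N))
        if best >= target:
            return N
        N += 1


for n, d, d_out in [(10_000, 10, 1), (40_000, 10, 1), (10_000, 10, 4)]:
    N = min_neurons_two_hidden(n, d, d_out)
    # The ratio N / sqrt(n * d') stays a little below 2, reflecting the Theta(sqrt(n d')) scaling.
    print(f"n={n}, d={d}, d'={d_out}: N={N}, N/sqrt(nd')={N / math.sqrt(n * d_out):.3f}")
```

For $(n, d, d') = (10^4, 10, 1)$ the search returns an $N$ between 180 and 199, while $\sqrt{nd'} = 100$; the ratio $N/\sqrt{nd'}$ stays a little below 2 because $m_1 m_2 \le N^2/4$.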

Theorems & Definitions (21)

Theorem 3.1
proof
Lemma 3.2: Thm. 5.2 of madden2024memory
Theorem 4.1
proof
Theorem 4.2
proof
Theorem 4.3
proof
Theorem 4.4
...and 11 more
