On the accuracy of interpolation based on single-layer artificial neural networks with a focus on defeating the Runge phenomenon

Ferdinando Auricchio, Maria Roberta Belardo, Gianluca Fabiani, Francesco Calabrò, Ariel F. Pascaner

TL;DR

The paper investigates the interpolation accuracy of shallow, single-hidden-layer ANNs trained via Extreme Learning Machine (ELM) and demonstrates that, with overparameterization, these networks can interpolate diverse functions without succumbing to Runge-type oscillations, even on equispaced or random nodes. By fixing internal random weights and solving a linear least-squares problem for the output weights, the resulting network function $\tilde{u}$ achieves convergence rates comparable to Chebyshev-based polynomial interpolation, and even accurately captures derivatives. Across Runge's function and various analytic/differentiable benchmarks, the interpolation error decays as the number of degrees of freedom grows, largely independent of node placement. The findings suggest that overparameterized, ELM-trained shallow networks provide a stable, efficient alternative to classical global polynomials for univariate interpolation and offer insights for broader scientific machine learning applications.

Abstract

In the present paper, we consider one-hidden-layer ANNs with a feedforward architecture, also referred to as shallow or two-layer networks, so that the structure is determined by the number and types of neurons. The determination of the parameters that define the function, called training, is done by solving the approximation problem, that is, by imposing interpolation through a set of specific nodes. We present the case where the parameters are trained using a procedure referred to as Extreme Learning Machine (ELM), which leads to a linear interpolation problem. Under these hypotheses, the existence of an ANN interpolating function is guaranteed. The focus is then on the accuracy of the interpolation outside the given interpolation nodes when these are equispaced, Chebychev, or randomly selected. The study is motivated by the well-known bell-shaped Runge example, which makes it clear that a global interpolating polynomial is accurate only if it is constructed on suitably chosen nodes, for example the Chebychev ones. In order to evaluate the behavior as the number of interpolation nodes grows, we increase the number of neurons in our network and compare it with the interpolating polynomial. We test using Runge's function and other well-known examples with different regularities. As expected, the accuracy of the approximation with a global polynomial increases only if the Chebychev nodes are considered. Instead, the error for the ANN interpolating function always decays, and in most cases we observe that the convergence follows what is observed in the polynomial case on Chebychev nodes, regardless of the set of nodes used for training.
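
The ELM procedure described in the abstract (random, frozen internal weights; output weights obtained from a linear problem) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `tanh` activation, the uniform sampling range `[-10, 10]` for the internal weights and biases, the form of Runge's function $1/(1+25x^2)$ on $[-1,1]$, and the helper names `elm_fit` and `elm_eval` are all assumptions made for the example.

```python
import numpy as np

def runge(x):
    # Assumed form of Runge's function on [-1, 1].
    return 1.0 / (1.0 + 25.0 * x**2)

def elm_fit(x_nodes, y_nodes, n_neurons, rng):
    # Internal weights and biases are drawn once at random and then frozen.
    # The sampling range is an assumption chosen for this sketch.
    a = rng.uniform(-10.0, 10.0, size=n_neurons)
    b = rng.uniform(-10.0, 10.0, size=n_neurons)
    # Collocation matrix: one row per node, one column per hidden neuron.
    H = np.tanh(np.outer(x_nodes, a) + b)
    # Only the output weights are trained, by linear least squares.
    w, *_ = np.linalg.lstsq(H, y_nodes, rcond=None)
    return a, b, w

def elm_eval(x, a, b, w):
    # Evaluate the network: linear combination of hidden-layer outputs.
    return np.tanh(np.outer(x, a) + b) @ w

rng = np.random.default_rng(0)
M = 20                                   # number of interpolation nodes
x_nodes = np.linspace(-1.0, 1.0, M)      # equispaced nodes
y_nodes = runge(x_nodes)
a, b, w = elm_fit(x_nodes, y_nodes, 2 * M, rng)  # overparametrized: 2M neurons

x_test = np.linspace(-1.0, 1.0, 1001)
max_err = np.max(np.abs(elm_eval(x_test, a, b, w) - runge(x_test)))
node_residual = np.max(np.abs(elm_eval(x_nodes, a, b, w) - y_nodes))
print(f"max error on test grid: {max_err:.3e}")
print(f"max residual at nodes:  {node_residual:.3e}")
```

Replacing `x_nodes` with Chebychev or random nodes, or changing the ratio between neurons and nodes, gives a rough way to explore the comparisons discussed in the abstract; the numbers produced by this sketch are not the paper's results.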

Paper Structure (21 sections, 2 theorems, 20 equations, 12 figures, 6 tables)

This paper contains 21 sections, 2 theorems, 20 equations, 12 figures, 6 tables.

Key Result

Theorem 2.1

Consider a DFFN with the same activation function $\psi$ in all the neurons of all the layers. Moreover, let $\psi \in C(\mathbb{R})$. Then the space $\mathcal{M}^{(\mathcal{L})}(\psi)$ generated by linear combinations of the outputs of the last hidden layer $\underline{x}^{(\mathcal{L})}$, defined by: … is dense in $C(\mathbb{R})$, in the topology of uniform convergence on compacta, if and only if $\psi$ …

Figures (12)

  • Figure 1: Schematic representation of the action of the generic $i$--th neuron of the $l$--th layer. The inputs $x_j^{(l-1)}$ are represented with blue circles, the weights $A_{ij}^{(l)}$ and bias $\beta_i^{(l)}$ are represented by yellow circles, the computed values $z_i^{(l)}$ and $x_i^{(l)}$ are represented with red circles and the functions performed by the neuron (i.e. the interaction scheme $\kappa_i^{(l)}$ and the activation function $\psi_i^{(l)}$) are represented with green squares.
  • Figure 2: Schematic representation of the single-hidden layer ANN with scalar input $x^{(0)}$ and scalar output $x^{(2)}$. Each neuron is represented by a circle divided into the interaction scheme $\kappa_i^{(l)}$ and the activation function $\psi_i^{(l)}$. The input neuron, which does not perform any of these operations, is represented by a black dot.
  • Figure 3: Runge's example: computed error, square case $M=N^{(\mathcal{L})}$, where $M$ is the number of nodes and $N^{(\mathcal{L})}$ is the number of neurons in the hidden layer. The left panel is the case of equispaced nodes, the central panel is the case of Chebychev nodes, and the right panel is the case of randomly generated nodes. Convergence is tested for different choices of the activation function and increasing $M$, and compared with that of polynomial interpolation. In this case, the linear problem \ref{eq:exactness} is square.
  • Figure 4: Runge's example: computed error for the derivative, square case $M=N^{(\mathcal{L})}$, where $M$ is the number of nodes and $N^{(\mathcal{L})}$ is the number of neurons in the hidden layer. The left panel is the case of equispaced nodes, the central panel is the case of Chebychev nodes, and the right panel is the case of randomly generated nodes, for different choices of the activation function. The error is the difference between the exact derivative of the function $f_R$ in \ref{eq:runge} and the derivative of the function obtained by interpolation, the one used in Figure \ref{fig1}.
  • Figure 5: Runge's example: computed error with respect to the number of interpolating nodes $M$, overparametrized case with $M=N^{(\mathcal{L})}/2$, where $N^{(\mathcal{L})}$ is the number of neurons in the hidden layer. The left panel is the case of equispaced nodes, the central panel is the case of Chebychev nodes, and the right panel is the case of randomly generated nodes. The linear problem \ref{eq:exactness} is solved by least squares. The black triangle is the reference convergence (interpolating polynomial on Chebychev nodes), as reported in \ref{eq:errChebRunge}.
  • ...and 7 more figures
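
The polynomial baseline that the captions of Figures 3 and 5 compare against can be illustrated with a short sketch. It assumes the standard form of Runge's function $1/(1+25x^2)$ on $[-1,1]$ and uses NumPy's `Chebyshev.fit` with degree $M-1$ on $M$ distinct nodes as a stand-in for the interpolating polynomial; the helper name `interp_max_error` and the choice $M=21$ are hypothetical, and the output is not taken from the paper.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def runge(x):
    # Assumed form of Runge's function on [-1, 1].
    return 1.0 / (1.0 + 25.0 * x**2)

def interp_max_error(nodes, x_test):
    # A degree M-1 fit through M distinct nodes is the interpolating polynomial.
    # The Chebyshev basis is used only for numerical stability of the fit.
    p = Chebyshev.fit(nodes, runge(nodes), deg=len(nodes) - 1, domain=[-1, 1])
    return np.max(np.abs(p(x_test) - runge(x_test)))

M = 21
k = np.arange(M)
equispaced = np.linspace(-1.0, 1.0, M)
chebychev = np.cos((2 * k + 1) * np.pi / (2 * M))  # Chebychev (first-kind) nodes

x_test = np.linspace(-1.0, 1.0, 2001)
err_equi = interp_max_error(equispaced, x_test)
err_cheb = interp_max_error(chebychev, x_test)
print(f"equispaced nodes: max error = {err_equi:.3e}")
print(f"Chebychev nodes:  max error = {err_cheb:.3e}")
```

On equispaced nodes the error is dominated by oscillations near the endpoints of the interval, which is the Runge phenomenon the captions refer to, while on Chebychev nodes the same polynomial degree gives a small error.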

Theorems & Definitions (2)

  • Theorem 2.1
  • Theorem 2.2