
A Critical Analysis of the Theoretical Framework of the Extreme Learning Machine

Irina Perfilieva, Nicolas Madrid, Manuel Ojeda-Aciego, Piotr Artiemjew, Agnieszka Niemczynowicz

TL;DR

The paper critically reexamines the theoretical foundation of the Extreme Learning Machine (ELM) laid out by Huang et al. (2006), arguing that key proofs are mathematically flawed and that exact interpolation cannot be guaranteed under the original randomization assumptions. It constructs a counterexample dataset $S$ (with $N=400$) on which the claimed results fail, and it dissects the erroneous steps in Theorems 2.1 and 2.2, including invalid implications drawn from differentiating relations involving the activation function. It then proposes a corrected theoretical direction (a new theorem with corollaries) that places restrictions on the data and the activation function in order to obtain valid probabilistic guarantees, and it demonstrates that ELM can still exactly learn the counterexample dataset under alternative hyperparameters (e.g., ReLU with $\tilde{N}=8000$ hidden nodes). The findings highlight the need for a rigorous, conditionally valid mathematical framework for ELM and suggest practical modifications to the randomization scheme that preserve its usefulness while ensuring theoretical coherence. Overall, the work clarifies the limits of the original claims and offers a pathway toward a sound, probabilistic foundation for ELM with explicit conditions on the activation function, the data, and the network size.
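For readers unfamiliar with the scheme under discussion, the ELM learning procedure analyzed in the paper can be sketched as follows. This is a minimal NumPy sketch of the standard algorithm, not code from the paper; the function names and the uniform sampling interval for weights and biases are illustrative assumptions.

```python
import numpy as np

def elm_fit(X, T, n_hidden, activation=np.tanh, seed=None):
    """Train a single-hidden-layer feed-forward network the ELM way:
    hidden weights and biases are drawn at random and only the output
    weights are obtained, by a least-squares solve."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # Random input-to-hidden weights and biases (never trained).
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Hidden-layer output matrix H (n_samples x n_hidden).
    H = activation(X @ W + b)
    # Output weights via the Moore-Penrose pseudoinverse: beta = H^+ T.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta, activation=np.tanh):
    return activation(X @ W + b) @ beta
```

Exact learning of the $N$ training samples would require the linear system $H\beta = T$ to be solvable exactly for the randomly drawn hidden parameters; whether this can be guaranteed is precisely the point at issue in the paper.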

Abstract

Despite the number of successful applications of the Extreme Learning Machine (ELM), we show that its underlying foundational principles do not have a rigorous mathematical justification. Specifically, we refute the proofs of two main statements, and we also create a dataset that provides a counterexample to the ELM learning algorithm and explain its design, which leads to many such counterexamples. Finally, we provide alternative statements of the foundations, which justify the efficiency of ELM in some theoretical cases.

Paper Structure

This paper contains 13 sections, 2 theorems, 12 equations, and 5 figures.

Key Result

Theorem 1

For a given standard SLFN with $N$ distinct input-output pairs $(\mathbf{x}_i, \mathbf{t}_i)$ (where $\mathbf{x}_i\in\mathbb{R}^n$ and $\mathbf{t}_i\in\mathbb{R}^m$) and $N$ hidden nodes, where it is true that the interior …

Note: the interior is meant with respect to the standard topology of $\mathbb{R}^{N(n+1)}$, i.e., $\mathbf{x}\in\mathbb{R}^{N(n+1)}$ is in the interior of a set $S\subseteq\mathbb{R}^{N(n+1)}$ if there exists a ball centered at $\mathbf{x}$ that is entirely contained in $S$.
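For context, the exact-interpolation condition around which both the original ELM theorems and the corrected result revolve can be written in the standard form below. This is the usual textbook formulation for an SLFN with $\tilde N$ hidden nodes and activation function $g$, stated here for reference rather than quoted from the paper.

```latex
% Standard SLFN / ELM interpolation system (textbook form, assumed here):
% H is the hidden-layer output matrix, \beta the output weights, T the targets.
\[
  H\beta = T, \qquad
  H =
  \begin{pmatrix}
    g(\mathbf{w}_1\cdot\mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_{\tilde N}\cdot\mathbf{x}_1 + b_{\tilde N})\\
    \vdots & \ddots & \vdots\\
    g(\mathbf{w}_1\cdot\mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_{\tilde N}\cdot\mathbf{x}_N + b_{\tilde N})
  \end{pmatrix}\in\mathbb{R}^{N\times\tilde N},
  \qquad
  \beta\in\mathbb{R}^{\tilde N\times m},\quad T\in\mathbb{R}^{N\times m}.
\]
% Exact learning of the N samples means this system has a solution in \beta;
% when \tilde N = N and H is invertible, \beta = H^{-1}T.
```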

Figures (5)

  • Figure 1: Graphical representation of the structure of a Single-hidden Layer Feed-forward Neural network.
  • Figure 2: Data set $S$ of $N=400$ different training samples for the ELM learning algorithm in two versions: pointwise (left) and piecewise connected (right).
  • Figure 3: Raw data from the set $S$ used for training (blue), and calculated output values (red) resulting from multiple runs of the ELM learning algorithm trained with $\tilde{N}=400$ (left) and $\tilde{N}=50$ (right) hidden nodes. Visually, in both cases the calculated outputs (red) are far from the training values (blue).
  • Figure 4: Two graphs illustrating the results of several runs, with randomly selected weights and biases, of the ELM learning algorithm trained with $\tilde{N}=400$ (left) and $\tilde{N}=50$ (right) hidden nodes on the example with dataset $S$ consisting of $N=400$ training pairs. Visually, the calculated results are close to zero.
  • Figure 5: Two plots illustrating the results of several runs, with randomly selected weights and biases, of the ELM learning algorithm with the ReLU activation function trained on a dataset $S$ consisting of $N=400$ training pairs. With $\tilde{N}=400$ hidden nodes (left), the ELM cannot reproduce all $N=400$ samples of dataset $S$. With $\tilde{N}=8000$ hidden nodes (right), all $N=400$ samples of dataset $S$ were successfully reproduced.
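Figures 3–5 report whether a trained ELM reproduces the $N=400$ training targets exactly as the activation function and the number of hidden nodes $\tilde N$ are varied. The standalone sketch below shows how such a check could be run; the dataset used here is a hypothetical stand-in (the paper's counterexample set $S$ is not reconstructed), and the sampling interval for the random weights and biases is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dataset: an arbitrary 1-D set of N = 400 pairs used purely
# for illustration; it is NOT the counterexample dataset S from the paper.
N = 400
X = np.linspace(0.0, 1.0, N).reshape(-1, 1)
T = np.sign(np.sin(40.0 * np.pi * X))  # hypothetical targets

def relu(z):
    return np.maximum(z, 0.0)

def elm_max_error(n_hidden, activation):
    """Fit an ELM with random hidden parameters and report the largest
    absolute error over the N training samples (0 means exact reproduction)."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = activation(X @ W + b)                    # hidden-layer output matrix
    beta = np.linalg.lstsq(H, T, rcond=None)[0]  # least-squares output weights
    return float(np.max(np.abs(H @ beta - T)))

for n_hidden in (400, 8000):
    print(f"n_hidden = {n_hidden}: max training error = {elm_max_error(n_hidden, relu):.3e}")
```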

Theorems & Definitions (4)

  • Remark 1
  • Example 1
  • Theorem 1
  • Corollary 1