k-Winners-Take-All Ensemble Neural Network

Abien Fred Agarap, Arnulfo P. Azcarraga

TL;DR

The paper improves ensemble neural networks in two ways: it trains the sub-networks concurrently so that they cooperate, and it combines their outputs with a k-Winners-Take-All (kWTA) fusion that induces competition and specialization. The proposed method, kWTA-ENN, concatenates the sub-network outputs, passes them through a fully connected layer, and applies a kWTA activation with $k=0.75$ to produce the final output $o$. Key contributions include formalizing a cooperative ensemble baseline, introducing the kWTA-based fusion, and demonstrating superior test accuracies on MNIST ($98.34\%$), Fashion-MNIST ($88.06\%$), KMNIST ($91.56\%$), and WDBC ($95.97\%$) with statistical significance. The results indicate that competition among sub-networks leads to partial specialization and mutual knowledge sharing, which improves generalization beyond traditional ensembling and offers a practical approach for robust multi-network classifiers.
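The kWTA activation mentioned above can be illustrated with a minimal sketch. It assumes that $k=0.75$ is read as the fraction of units retained (the winners) and that losing units are set to zero; the function name `kwta` and the `ratio` parameter are illustrative choices, not names taken from the paper.

```python
import math


def kwta(values, ratio=0.75):
    """Keep the largest `ratio` fraction of entries and zero out the rest.

    Assumption: `ratio` plays the role of k = 0.75, interpreted as the
    share of units that win. At least one unit is always kept.
    """
    n_winners = max(1, math.floor(ratio * len(values)))
    # Indices ordered from the largest activation to the smallest.
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    winners = set(ranked[:n_winners])
    # Winners pass through unchanged; losers are inhibited (set to 0.0).
    return [v if i in winners else 0.0 for i, v in enumerate(values)]


activations = [0.2, 1.5, -0.3, 0.9]
print(kwta(activations))  # [0.2, 1.5, 0.0, 0.9]
```

With four units and a ratio of 0.75, three units win and only the smallest activation is suppressed.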

Abstract

Ensembling is one approach that improves the performance of a neural network by combining a number of independent neural networks, usually by either averaging or summing up their individual outputs. We modify this ensembling approach by training the sub-networks concurrently instead of independently. This concurrent training leads the sub-networks to cooperate with each other, and we refer to the result as a "cooperative ensemble". Meanwhile, the mixture-of-experts approach improves neural network performance by dividing a given dataset among its sub-networks. It then uses a gating network that assigns a specialization to each of its sub-networks, called "experts". We improve on these ways of combining a group of neural networks by using a k-Winners-Take-All (kWTA) activation function that acts as the combination method for the outputs of the sub-networks in the ensemble. We refer to this proposed model as "kWTA ensemble neural networks" (kWTA-ENN). With the kWTA activation function, the losing neurons of the sub-networks are inhibited while the winning neurons are retained. This results in sub-networks that have some form of specialization but also share knowledge with one another. We compare our approach with the cooperative ensemble and mixture-of-experts, using a feed-forward neural network with one hidden layer of 100 neurons as the sub-network architecture. Our approach yields better performance than the baseline models, reaching the following test accuracies on benchmark datasets: 98.34% on MNIST, 88.06% on Fashion-MNIST, 91.56% on KMNIST, and 95.97% on WDBC.
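To make the difference between the two combination methods concrete, the following is a rough forward-pass sketch, not the authors' implementation. It assumes three sub-networks, ReLU hidden units, random untrained weights, and toy input and class sizes; only the single hidden layer of 100 neurons comes from the abstract. All function names are hypothetical. The cooperative ensemble averages the sub-network outputs, while the kWTA variant concatenates them, applies a fully connected layer, and then applies kWTA with a retained fraction of 0.75 (an assumed reading of $k$).

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer; `weights` holds one row of input weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def kwta(values, ratio=0.75):
    """Keep the top `ratio` fraction of entries, zero the rest (assumed form)."""
    n_winners = max(1, math.floor(ratio * len(values)))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    winners = set(ranked[:n_winners])
    return [v if i in winners else 0.0 for i, v in enumerate(values)]


def init_layer(rng, n_in, n_out):
    """Random small weights and zero biases, for illustration only."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def subnetwork_forward(x, params):
    """One hidden layer followed by an output layer."""
    (w1, b1), (w2, b2) = params
    return linear(relu(linear(x, w1, b1)), w2, b2)


def cooperative_ensemble_forward(x, subnets):
    """Average the outputs of all sub-networks, class by class."""
    outputs = [subnetwork_forward(x, p) for p in subnets]
    return [sum(col) / len(outputs) for col in zip(*outputs)]


def kwta_enn_forward(x, subnets, fusion):
    """Concatenate sub-network outputs, apply a dense layer, then kWTA."""
    outputs = [subnetwork_forward(x, p) for p in subnets]
    concatenated = [v for out in outputs for v in out]
    w, b = fusion
    return kwta(linear(concatenated, w, b))


rng = random.Random(0)
n_features, n_hidden, n_classes, n_subnets = 8, 100, 4, 3  # toy sizes
subnets = [(init_layer(rng, n_features, n_hidden),
            init_layer(rng, n_hidden, n_classes))
           for _ in range(n_subnets)]
fusion = init_layer(rng, n_subnets * n_classes, n_classes)

x = [rng.random() for _ in range(n_features)]
ce_out = cooperative_ensemble_forward(x, subnets)
kwta_out = kwta_enn_forward(x, subnets, fusion)
print(len(ce_out), len(kwta_out))
```

In both variants a single loss computed on the combined output would send gradients to every sub-network at once, which is one way to read "concurrent" training; the training loop itself is omitted here.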

Paper Structure (13 sections, 4 equations, 2 figures, 3 tables, 2 algorithms)


Figures (2)

  • Figure 1: Predictions of each sub-network on a sample MNIST input, with the corresponding final outputs. In the MoE panel, we can infer that sub-networks 2 and 3 specialize in class 1. In the CE panel, all sub-networks output high probabilities for class 1. In the kWTA-ENN panel, all sub-networks contributed, but the kWTA activation function most likely inhibited the neurons for the other classes at inference, hence its higher probability output compared with MoE and CE.
  • Figure 2: Predictions of each sub-network on a sample KMNIST input, with the corresponding final outputs. In the MoE panel, we can infer that sub-network 2 specializes in class 6 ("ma"). In the CE panel, sub-network 3 was assisted by sub-network 2. In the kWTA-ENN panel, all sub-networks contributed, but the kWTA activation function most likely inhibited the neurons for the other classes at inference, hence its higher probability output compared with MoE and CE.