k-Winners-Take-All Ensemble Neural Network

Abien Fred Agarap; Arnulfo P. Azcarraga


TL;DR

The paper improves ensemble neural networks by training the sub-networks concurrently, so that they cooperate, and by combining their outputs with a k-Winners-Take-All (kWTA) fusion that induces competition and specialization. The proposed method, kWTA-ENN, concatenates the sub-network outputs, passes them through a fully connected layer, and applies a kWTA activation with $k=0.75$ to produce the final output $o$. Key contributions include formalizing a cooperative ensemble baseline, introducing the kWTA-based fusion, and reporting higher test accuracies than the baseline models on MNIST ($98.34\%$), Fashion-MNIST ($88.06\%$), KMNIST ($91.56\%$), and WDBC ($95.97\%$), with statistical significance. The results indicate that competition among sub-networks leads to partial specialization and mutual knowledge sharing, which improves generalization beyond traditional ensembling and offers a practical approach for robust multi-network classifiers.
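The fusion step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes that $k=0.75$ denotes the fraction of units retained per example (the summary does not say how $k$ is interpreted), and the names `kwta`, `kwta_enn_forward`, `W`, and `b` are hypothetical.

```python
import numpy as np

def kwta(x, ratio=0.75):
    """Keep the largest ceil(ratio * n) entries of each row and zero the rest.

    Assumption: `ratio` is the fraction of winning units (k = 0.75 in the text).
    Ties at the threshold are all kept, so slightly more than k units may survive.
    """
    n = x.shape[-1]
    k = max(1, int(np.ceil(ratio * n)))
    # After partitioning, index n - k holds the k-th largest value of each row.
    thresh = np.partition(x, n - k, axis=-1)[..., n - k][..., None]
    return np.where(x >= thresh, x, 0.0)

def kwta_enn_forward(sub_outputs, W, b, ratio=0.75):
    """Concatenate sub-network outputs, apply a fully connected layer, then kWTA.

    sub_outputs: list of arrays, each of shape (batch, num_classes)
    W, b: weights and bias of the (hypothetical) fusion layer
    """
    z = np.concatenate(sub_outputs, axis=-1)  # (batch, M * num_classes)
    h = z @ W + b                             # (batch, num_classes)
    return kwta(h, ratio)

# Usage: three sub-networks, ten classes, a batch of two examples.
rng = np.random.default_rng(0)
subs = [rng.normal(size=(2, 10)) for _ in range(3)]
W = rng.normal(size=(30, 10))
b = np.zeros(10)
o = kwta_enn_forward(subs, W, b)
```

With ten output units and a ratio of 0.75, eight units per example pass through unchanged and the two smallest are set to zero. In training, the sub-networks and the fusion layer would be optimized together, which is what makes the ensemble concurrent rather than independent.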

Abstract

Ensembling is one approach that improves the performance of a neural network by combining a number of independent neural networks, usually by either averaging or summing their individual outputs. We modify this approach by training the sub-networks concurrently instead of independently. This concurrent training leads the sub-networks to cooperate with each other, and we refer to them as a "cooperative ensemble". Meanwhile, the mixture-of-experts approach improves neural network performance by dividing a given dataset among its sub-networks. It then uses a gating network that assigns a specialization to each of its sub-networks, called "experts". We improve on these ways of combining a group of neural networks by using a k-Winners-Take-All (kWTA) activation function that acts as the combination method for the outputs of each sub-network in the ensemble. We refer to this proposed model as "kWTA ensemble neural networks" (kWTA-ENN). With the kWTA activation function, the losing neurons of the sub-networks are inhibited while the winning neurons are retained. This results in sub-networks that have some form of specialization but also share knowledge with one another. We compare our approach with the cooperative ensemble and mixture-of-experts, using a feed-forward neural network with one hidden layer of 100 neurons as the sub-network architecture. Our approach outperforms the baseline models, reaching the following test accuracies on benchmark datasets: 98.34% on MNIST, 88.06% on Fashion-MNIST, 91.56% on KMNIST, and 95.97% on WDBC.


Paper Structure (13 sections, 4 equations, 2 figures, 3 tables, 2 algorithms)


Introduction and Related Works
Ensemble of Independent Networks
Mixture of Experts
Cooperative Ensemble Learning
Competitive Ensemble Learning
Experiments
Datasets Description
Experimental Setup
Hardware and Software Configuration
Training Details
Classification Performance
Improving cooperation through competitive learning
Conclusion and Future Works

Figures (2)

Figure 1: Predictions of each sub-network on a sample MNIST image and their respective final outputs. In the MoE panel, we can infer that MoE sub-networks 2 and 3 are specializing on class 1. In the CE panel, all CE sub-networks have high probability outputs for class 1. In the kWTA-ENN panel, all kWTA-ENN sub-networks contributed, but with the kWTA activation function the neurons for other classes were most likely inhibited at inference, hence its higher output probability than MoE and CE.
Figure 2: Predictions of each sub-network on a sample KMNIST image and their respective final outputs. In the MoE panel, we can infer that MoE sub-network 2 is specializing on class 6 ("ma"). In the CE panel, CE sub-network 3 was assisted by sub-network 2. In the kWTA-ENN panel, all kWTA-ENN sub-networks contributed, but with the kWTA activation function the neurons for other classes were most likely inhibited at inference, hence its higher output probability than MoE and CE.
