k-Winners-Take-All Ensemble Neural Network
Abien Fred Agarap, Arnulfo P. Azcarraga
TL;DR
The paper addresses improving ensemble neural networks by enabling concurrent cooperative training and combining sub-network outputs with a kWinners-Take-All (kWTA) fusion that induces competition and specialization. The proposed method, kWTA-ENN, concatenates sub-network outputs, passes them through a fully connected layer, and applies a kWTA activation with $k=0.75$ to produce the final output $o$, while sub-networks are trained concurrently. Key contributions include formalizing a cooperative ensemble baseline, introducing the kWTA-based fusion, and demonstrating superior test accuracies on MNIST ($98.34\%$), Fashion-MNIST ($88.06\%$), KMNIST ($91.56\%$), and WDBC ($95.97\%$) with statistical significance. The results indicate that competition among sub-networks leads to partial specialization and mutual knowledge sharing, improving generalization beyond traditional ensembling and offering a practical approach for robust multi-network classifiers.
Abstract
Ensembling is one approach that improves the performance of a neural network by combining a number of independent neural networks, usually by either averaging or summing up their individual outputs. We modify this ensembling approach by training the sub-networks concurrently instead of independently. This concurrent training of sub-networks leads them to cooperate with each other, and we refer to them as "cooperative ensemble". Meanwhile, the mixture-of-experts approach improves a neural network performance by dividing up a given dataset to its sub-networks. It then uses a gating network that assigns a specialization to each of its sub-networks called "experts". We improve on these aforementioned ways for combining a group of neural networks by using a k-Winners-Take-All (kWTA) activation function, that acts as the combination method for the outputs of each sub-network in the ensemble. We refer to this proposed model as "kWTA ensemble neural networks" (kWTA-ENN). With the kWTA activation function, the losing neurons of the sub-networks are inhibited while the winning neurons are retained. This results in sub-networks having some form of specialization but also sharing knowledge with one another. We compare our approach with the cooperative ensemble and mixture-of-experts, where we used a feed-forward neural network with one hidden layer having 100 neurons as the sub-network architecture. Our approach yields a better performance compared to the baseline models, reaching the following test accuracies on benchmark datasets: 98.34% on MNIST, 88.06% on Fashion-MNIST, 91.56% on KMNIST, and 95.97% on WDBC.
