Unveiling the Power of Sparse Neural Networks for Feature Selection
Zahra Atashgahi, Tennison Liu, Mykola Pechenizkiy, Raymond Veldhuis, Decebal Constantin Mocanu, Mihaela van der Schaar
TL;DR
The paper investigates sparse neural networks (SNNs) trained with dynamic sparse training (DST) for feature selection, focusing on two open choices: which DST algorithm to train with and which metric to use for ranking features. It introduces a neuron-attribution-based feature importance metric (Attr) and systematically compares DST-based SNNs to dense networks across 18 datasets, reporting average reductions of more than 50% in memory and more than 55% in FLOPs with competitive or superior quality of the selected features. The study finds that SET generally outperforms RigL for feature selection and that the proposed Attr metric often surpasses traditional neuron-strength metrics, though dataset characteristics influence the outcome. The work demonstrates practical efficiency gains and provides a foundation for integrating attribution-based signals into DST regrowth; the code is available on GitHub to support reproducibility and further exploration.
Abstract
Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection. Training SNNs with dynamic sparse training (DST) algorithms has shown promising feature selection capabilities while drastically reducing computational overhead. Despite these advances, several critical aspects of feature selection with SNNs remain insufficiently explored. Open questions include which DST algorithm to use for network training, which metric to use for ranking features/neurons, and how these methods perform across diverse datasets relative to dense networks. This paper addresses these gaps by presenting a comprehensive, systematic analysis of feature selection with sparse neural networks. Moreover, we introduce a novel metric, tailored to the characteristics of sparse neural networks, for quantifying feature importance in SNNs. Our findings show that feature selection with SNNs trained with DST algorithms can achieve, on average, more than $50\%$ memory and $55\%$ FLOPs reduction compared to dense networks, while outperforming them in terms of the quality of the selected features. Our code and the supplementary material are available on GitHub (\url{https://github.com/zahraatashgahi/Neuron-Attribution}).
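To make the idea of ranking input features by a neuron-level metric concrete, the following is a minimal sketch of a strength-style ranking on a sparse input layer. It rests on an assumption: the strength of an input neuron is taken to be the sum of the absolute values of its active (unmasked) outgoing weights, as is common in earlier DST-based feature selection work. It is not the attribution-based metric introduced in the paper, and the function names, shapes, and the random weights are hypothetical placeholders rather than anything from the authors' code.

```python
import numpy as np


def neuron_strength(weights: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Score each input feature by the summed magnitude of its active weights.

    weights, mask: arrays of shape (n_features, n_hidden); mask is 1 where a
    connection exists in the sparse layer and 0 where it has been pruned.
    (Assumed definition of "strength"; see the lead-in.)
    """
    return np.abs(weights * mask).sum(axis=1)


def select_top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k highest-scoring features, best first."""
    return np.argsort(-scores, kind="stable")[:k]


# Hypothetical sparse input layer: 8 features, 16 hidden units, ~25% density.
rng = np.random.default_rng(0)
n_features, n_hidden, density = 8, 16, 0.25
weights = rng.normal(size=(n_features, n_hidden))
mask = (rng.random((n_features, n_hidden)) < density).astype(weights.dtype)

scores = neuron_strength(weights, mask)
top3 = select_top_k(scores, k=3)
```

In a real pipeline the weights and mask would come from the input layer of a network trained with a DST algorithm such as SET or RigL, and the selected feature indices would then be passed to a downstream classifier for evaluation.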
