Unveiling the Power of Sparse Neural Networks for Feature Selection

Zahra Atashgahi; Tennison Liu; Mykola Pechenizkiy; Raymond Veldhuis; Decebal Constantin Mocanu; Mihaela van der Schaar


TL;DR

The paper investigates sparse neural networks (SNNs) trained with dynamic sparse training (DST) for feature selection, addressing the choice of DST algorithm and of feature-ranking metric. It introduces a neuron-attribution-based feature importance metric and systematically compares DST-based SNNs to dense networks across 18 datasets, reporting average memory and FLOPs reductions of more than 50% and 55%, respectively, with competitive or superior feature quality. The study finds that SET (Sparse Evolutionary Training) generally outperforms RigL (Rigged Lottery) for feature selection, and that the proposed Attr metric often surpasses traditional neuron-strength metrics, although dataset characteristics influence the outcomes. The work demonstrates practical efficiency gains and provides a foundation for integrating attribution-based signals into DST regrowth, with code available on GitHub to support reproducibility and further exploration.
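
For reference, the neuron-strength baseline mentioned above can be sketched in a few lines of Python. This is a minimal sketch, assuming the common definition of neuron strength as the sum of absolute weights on an input neuron's existing first-layer connections; the function names `neuron_strength` and `select_top_k`, the toy sizes, and the random weights are hypothetical and not taken from the authors' repository.

```python
import numpy as np

# Assumption: neuron strength = sum of |w| over an input neuron's existing
# first-layer connections; names and toy sizes are hypothetical.
def neuron_strength(w_in: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Score each input neuron (row of w_in) by its total absolute outgoing weight."""
    return np.abs(w_in * mask).sum(axis=1)


def select_top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k highest-scoring features, most important first."""
    return np.argsort(scores)[::-1][:k]


rng = np.random.default_rng(0)
n_features, n_hidden, density = 8, 16, 0.25
mask = (rng.random((n_features, n_hidden)) < density).astype(float)  # sparse topology
w_in = rng.normal(size=(n_features, n_hidden)) * mask                 # weights only where mask == 1

scores = neuron_strength(w_in, mask)
top3 = select_top_k(scores, 3)
print("strength:", scores.round(3))
print("top-3 features:", top3)
```

An input neuron with no remaining connections gets a strength of zero, so a sparse topology directly shapes which features can rank highly.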

Abstract

Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection. Leveraging dynamic sparse training (DST) algorithms within SNNs has demonstrated promising feature selection capabilities while drastically reducing computational overhead. Despite these advancements, several critical aspects of feature selection with SNNs remain insufficiently explored: the choice of DST algorithm for network training, the choice of metric for ranking features/neurons, and how these methods perform across diverse datasets compared to dense networks. This paper addresses these gaps by presenting a comprehensive, systematic analysis of feature selection with sparse neural networks. Moreover, we introduce a novel metric that accounts for the characteristics of sparse neural networks and is designed to quantify feature importance within SNNs. Our findings show that feature selection with SNNs trained with DST algorithms can achieve, on average, more than 50% memory and 55% FLOPs reduction compared to dense networks, while outperforming them in terms of the quality of the selected features. Our code and the supplementary material are available on GitHub (https://github.com/zahraatashgahi/Neuron-Attribution).
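
The DST algorithms compared in the paper update the sparse topology during training by periodically dropping and regrowing connections. The sketch below shows one such update on a single weight matrix under simplifying assumptions: it drops the fraction `zeta` of active weights with the smallest magnitude, then regrows the same number of connections either at random (as in SET) or where the gradient magnitude is largest (as in RigL). The helper `prune_and_regrow`, the regrowth initialization scale, and the toy sizes are hypothetical; real SET and RigL implementations differ in details such as update schedules, pruning rules, and per-layer densities.

```python
import numpy as np


def prune_and_regrow(w, mask, zeta, rng, grad=None):
    """One hypothetical DST topology update on a single layer (sketch, not the paper's code).

    Drops the fraction `zeta` of active connections with the smallest |w|, then regrows
    the same number among previously inactive positions: uniformly at random when
    `grad` is None (SET-style), otherwise where |grad| is largest (RigL-style).
    """
    w, mask = w.copy(), mask.copy()
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(mask == 0)  # taken before pruning, so dropped links are not regrown at once
    n_update = min(int(zeta * active.size), inactive.size)
    if n_update == 0:
        return w, mask

    # Prune: the n_update active weights closest to zero.
    drop = active[np.argsort(np.abs(w.flat[active]))[:n_update]]
    mask.flat[drop] = 0
    w.flat[drop] = 0.0

    if grad is None:
        # SET-style: random regrowth with small random initial weights (scale is an assumption).
        grow = rng.choice(inactive, size=n_update, replace=False)
        w.flat[grow] = rng.normal(scale=0.01, size=n_update)
    else:
        # RigL-style: regrow where the dense gradient magnitude is largest, starting from zero.
        grow = inactive[np.argsort(-np.abs(grad.flat[inactive]))[:n_update]]
        w.flat[grow] = 0.0
    mask.flat[grow] = 1
    return w, mask


rng = np.random.default_rng(0)
mask = (rng.random((20, 10)) < 0.2).astype(np.int8)
w = rng.normal(size=mask.shape) * mask
w_set, m_set = prune_and_regrow(w, mask, zeta=0.3, rng=rng)
grad = rng.normal(size=mask.shape)  # stand-in for a dense gradient
w_rigl, m_rigl = prune_and_regrow(w, mask, zeta=0.3, rng=rng, grad=grad)
print(mask.sum(), m_set.sum(), m_rigl.sum())
```

Because the dropped and regrown counts are equal, each update keeps the number of connections, and therefore the memory footprint of the sparse layer, constant.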


Paper Structure (26 sections, 4 equations, 8 figures, 9 tables)

Background & Related Work
Feature Selection
Problem Formulation
Neuron Attribution
Sparse Neural Networks
Dynamic Sparse Training (DST)
DST for Feature Selection
Methodology
Input Neuron Importance
Results
Experimental Setup
Datasets
Baselines
Evaluation
Feature Selection Comparison
...and 11 more sections

Figures (8)

Figure 1: Neuron attribution visualization in a sparse neural network. The contribution of each input feature to any output neuron is measured by neuron attribution methods. Darker colors show a higher contribution of the corresponding input neuron to the output neuron.
Figure 2: Feature selection using a sparse neural network. The importance of each input neuron can be measured using the network's characteristics. Darker colors and larger neurons show a higher importance of the corresponding input neuron.
Figure 3: A toy example of neuron attribution-based importance calculation. Darker colors indicate higher contributions of the corresponding input neuron.
Figure 4: Neuron importance visualization on the MNIST dataset as 2D heat maps. The lighter area in the center of each heat map marks the more important features, which is in line with the pattern of digits in the MNIST dataset.
Figure 5: Accuracy vs. FLOPs comparison among various neural network-based feature selection methods that induce sparsity in the network. The FLOPs values are divided by $10^{12}$.
...and 3 more figures
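
Figures 1 and 3 describe ranking input neurons by their attribution to output neurons. The exact formulation of the paper's Attr metric is not reproduced in this summary, so the sketch below swaps in gradient × input, a standard attribution method, on a one-hidden-layer sparse MLP with ReLU. The attribution of feature i to output k is x_i · ∂y_k/∂x_i, and a feature's importance sums the absolute attributions over outputs and averages them over samples. The function name `grad_x_input_importance`, the network shape, and this aggregation are assumptions for illustration.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def grad_x_input_importance(X, W1, b1, W2, mask1, mask2):
    """Feature importance from gradient x input attribution (assumed stand-in for Attr).

    Network: y = relu(X @ (W1*mask1) + b1) @ (W2*mask2), a one-hidden-layer sparse MLP.
    Attribution of input i to output k for a sample x is x_i * dy_k/dx_i; a feature's
    importance is the sum of |attribution| over outputs, averaged over samples.
    """
    W1s, W2s = W1 * mask1, W2 * mask2                 # only existing connections contribute
    gate = (X @ W1s + b1 > 0).astype(float)           # (n, h): ReLU derivative per sample
    jac = np.einsum("ij,nj,jk->nik", W1s, gate, W2s)  # (n, d, c): dy_k/dx_i per sample
    attr = X[:, :, None] * jac                        # gradient x input
    return np.abs(attr).sum(axis=2).mean(axis=0)      # (d,)


rng = np.random.default_rng(1)
d, h, c, n = 6, 12, 3, 50
mask1 = (rng.random((d, h)) < 0.4).astype(float)
mask1[5, :] = 0.0  # feature 5 has no connections, so it receives no attribution
mask2 = (rng.random((h, c)) < 0.5).astype(float)
W1, W2 = rng.normal(size=(d, h)), rng.normal(size=(h, c))
b1 = np.zeros(h)
X = rng.normal(size=(n, d))

imp = grad_x_input_importance(X, W1, b1, W2, mask1, mask2)
ranking = np.argsort(imp)[::-1]  # most important feature first
print("importance:", imp.round(3))
print("ranking:", ranking)
```

Unlike neuron strength, this score depends on the data and on which hidden units are active, so it reflects how features actually propagate to the outputs; choosing a different attribution method or aggregation would change the scores.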
