Greedy feature selection: Classifier-dependent feature selection via greedy methods

Fabiana Camattari; Sabrina Guastavino; Francesco Marchetti; Michele Piana; Emma Perracchione

TL;DR

This work introduces classifier-dependent greedy feature selection for classification, addressing the limitations of classifier-agnostic feature ranking by enabling wrapper-style, model-specific feature ranking. The authors establish theoretical guarantees on expressiveness via the VC dimension and kernel alignment, and implement a robust stopping rule using the True Skill Statistic. They validate the approach on synthetic data and a solar-physics dataset for geo-effectiveness prediction, showing significant improvements for SVM-based classification when using greedily selected features and highlighting physically meaningful predictors such as $B_z$ and solar wind velocity. The results suggest practical benefits for feature reduction and open avenues for future work including activity-phase analyses and integration with physics-informed neural networks.

Abstract

This study introduces a new approach to feature ranking for classification tasks, referred to in what follows as greedy feature selection. In statistical learning, feature selection is usually performed by methods that are independent of the classifier that then makes the prediction from the reduced set of features. Greedy feature selection instead identifies the most important feature at each step according to the selected classifier. The benefits of such a scheme are investigated theoretically in terms of model capacity indicators, such as the Vapnik-Chervonenkis (VC) dimension or the kernel alignment, and tested numerically on the problem of predicting geo-effective manifestations of the active Sun.
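The idea of picking, at each step, the feature that most helps the chosen classifier can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: a nearest-centroid classifier stands in for the SVM, the toy data has one informative feature and two noise features, the function names (`tss`, `fit_predict_centroid`, `greedy_select`) are hypothetical, and the stopping rule (stop when the validation True Skill Statistic no longer improves by more than `tol`) is only one plausible reading of a TSS-based criterion.

```python
import random


def tss(y_true, y_pred):
    """True Skill Statistic: recall minus false-alarm rate, in [-1, 1]."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return recall - false_alarm


def fit_predict_centroid(X_tr, y_tr, X_te, feats):
    """Stand-in classifier (assumption): nearest class centroid on `feats`."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(X_tr, y_tr) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    preds = []
    for x in X_te:
        dist = {
            label: sum((x[j] - c) ** 2 for j, c in zip(feats, centroids[label]))
            for label in (0, 1)
        }
        preds.append(0 if dist[0] <= dist[1] else 1)
    return preds


def greedy_select(X_tr, y_tr, X_val, y_val, tol=1e-3):
    """Forward selection: add the feature that maximizes validation TSS.

    Stops when the best candidate improves TSS by at most `tol`
    (an assumed stopping rule, chosen for illustration).
    """
    remaining = list(range(len(X_tr[0])))
    selected, best = [], float("-inf")
    while remaining:
        scores = {
            j: tss(y_val, fit_predict_centroid(X_tr, y_tr, X_val, selected + [j]))
            for j in remaining
        }
        j_star = max(remaining, key=lambda j: scores[j])
        if scores[j_star] - best <= tol:
            break
        selected.append(j_star)
        remaining.remove(j_star)
        best = scores[j_star]
    return selected, best


def make_data(n, rng):
    """Toy data: feature 0 depends on the label, features 1 and 2 are noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([4.0 * label + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_valid, y_valid = make_data(200, rng)
selected, best = greedy_select(X_train, y_train, X_valid, y_valid)
print("selected features:", selected, "validation TSS:", round(best, 3))
```

Because the ranking is driven by the classifier's own validation score, swapping in a different classifier can change which features are selected, which is the sense in which the selection is classifier-dependent.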

Paper Structure (12 sections, 5 theorems, 34 equations, 4 tables)

Introduction
Greedy feature ranking schemes
The VC dimension in the greedy framework
SVM in the greedy framework
Stopping criterion
Numerical experiments
Applications to a toy dataset
Applications to solar physics: geo-effectiveness prediction
The dataset and the models
Greedy feature selection approaches
Prediction of geo-effective solar events with greedy-selected features
Conclusions and future work

Key Result

Proposition 1

$X$ is shattered by $\mathcal{F}^{(k)}$ if and only if $\iota_{\alpha}(\pi_{k}(X))$ is shattered by $\mathcal{F}^{(k)}$.

Theorems & Definitions (13)

Definition 1
Definition 2
Remark 1
Proposition 1
proof
Proposition 2
proof
Corollary 1
proof
Theorem 3
...and 3 more
