Greedy feature selection: Classifier-dependent feature selection via greedy methods
Fabiana Camattari, Sabrina Guastavino, Francesco Marchetti, Michele Piana, Emma Perracchione
TL;DR
This work introduces classifier-dependent greedy feature selection for classification, addressing the limitations of classifier-agnostic feature ranking by enabling wrapper-style, model-specific feature ranking. The authors establish theoretical guarantees on expressiveness via the VC dimension and kernel alignment, and implement a robust stopping rule using the True Skill Statistic. They validate the approach on synthetic data and a solar-physics dataset for geo-effectiveness prediction, showing significant improvements for SVM-based classification when using greedily selected features and highlighting physically meaningful predictors such as $B_z$ and solar wind velocity. The results suggest practical benefits for feature reduction and open avenues for future work including activity-phase analyses and integration with physics-informed neural networks.
Abstract
This study introduces a new approach to feature ranking for classification tasks, referred to in what follows as greedy feature selection. In statistical learning, feature selection is usually carried out by methods that are independent of the classifier that then performs the prediction on the reduced set of features. Greedy feature selection, by contrast, identifies the most important feature at each step according to the selected classifier. The benefits of such a scheme are investigated theoretically in terms of model capacity indicators, such as the Vapnik-Chervonenkis (VC) dimension or the kernel alignment, and tested numerically on the problem of predicting geo-effective manifestations of the active Sun.
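To make the idea of a classifier-dependent greedy step concrete, the following is a minimal sketch, not the authors' implementation. It assumes a forward-selection loop in which each candidate feature is scored by the True Skill Statistic (TSS) that the chosen classifier reaches on a held-out set, and it stops when the best candidate no longer improves the score by more than a tolerance. The nearest-centroid classifier is a dependency-free stand-in for the SVM, and the names `tss`, `centroid_predict`, `greedy_select`, the tolerance `tol`, and the toy data are all hypothetical.

```python
import random


def tss(y_true, y_pred):
    """True Skill Statistic: TP/(TP+FN) - FP/(FP+TN), for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return recall - false_alarm


def centroid_predict(X_train, y_train, X_test, feats):
    """Nearest-centroid classifier restricted to the feature indices in feats.

    Stand-in for the classifier of interest (an SVM in the paper).
    """
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    preds = []
    for x in X_test:
        dist = {
            label: sum((x[j] - c[k]) ** 2 for k, j in enumerate(feats))
            for label, c in centroids.items()
        }
        preds.append(min(dist, key=dist.get))
    return preds


def greedy_select(X_train, y_train, X_val, y_val, tol=1e-3):
    """Forward greedy selection: add the feature that maximizes validation TSS.

    Assumed stopping rule: stop when the TSS gain is at most tol.
    """
    remaining = list(range(len(X_train[0])))
    selected, best_score = [], float("-inf")
    while remaining:
        scores = {
            j: tss(y_val, centroid_predict(X_train, y_train, X_val, selected + [j]))
            for j in remaining
        }
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_score <= tol:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = scores[j_best]
    return selected, best_score


def make_data(n, rng):
    """Toy data: feature 0 carries the label, features 1 and 2 are pure noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([4.0 * label + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_tr, y_tr = make_data(200, rng)
X_va, y_va = make_data(200, rng)
selected, score = greedy_select(X_tr, y_tr, X_va, y_va)
print("selected features:", selected, "validation TSS:", round(score, 3))
```

Because every candidate is scored through the classifier itself, swapping `centroid_predict` for another model can change the resulting ranking, which is the sense in which the selection is classifier-dependent.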
