Conformal Recursive Feature Elimination
Marcos López-De-Castro, Alberto García-Galindo, Rubén Armañanzas
TL;DR
The paper addresses feature selection under uncertainty in high-dimensional multiclass settings by integrating conformal prediction with recursive feature elimination. It introduces Conformal Recursive Feature Elimination (CRFE), which uses a multiclass One-vs-All adaptation and a per-feature non-conformity measure to recursively drop the most non-conforming features. The method also includes an automatic $\beta$-based stopping criterion and a new consistency index $I_W$. Empirical results on synthetic and real-world datasets show that CRFE often outperforms classical RFE in conformal-set predictions while yielding more compact and stable feature subsets; it also shows competitive single-prediction performance and improved consistency. The work advances uncertainty-aware feature selection and provides an open-source implementation, with future extensions planned for nonlinear classifiers and class-balanced conformal approaches.
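The recursive loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `feature_nonconformity` and `recursive_elimination` are hypothetical, the One-vs-All "weights" come from class-centroid differences rather than a trained classifier, and the per-feature score is a toy stand-in, not the paper's non-conformity measure. The loop also stops at a fixed subset size instead of using the $\beta$-based criterion, which this summary does not specify.

```python
def feature_nonconformity(X, y, active):
    """Toy per-feature score (an assumption, NOT the paper's measure).

    For each class k, a one-vs-all linear rule on feature j is built from
    centroids. Each sample adds -sign * w * (x_j - mid), so features that
    separate the classes well receive strongly negative (conforming) totals.
    """
    classes = sorted(set(y))
    beta = {}
    for j in active:
        total = 0.0
        for k in classes:
            inside = [row[j] for row, lab in zip(X, y) if lab == k]
            outside = [row[j] for row, lab in zip(X, y) if lab != k]
            mu_in = sum(inside) / len(inside)
            mu_out = sum(outside) / len(outside)
            w = mu_in - mu_out            # hypothetical one-vs-all weight
            mid = (mu_in + mu_out) / 2.0  # decision midpoint for feature j
            for row, lab in zip(X, y):
                sign = 1.0 if lab == k else -1.0
                total += -sign * w * (row[j] - mid)
        beta[j] = total
    return beta


def recursive_elimination(X, y, n_keep):
    """Drop the most non-conforming feature until n_keep features remain."""
    active = list(range(len(X[0])))
    removed = []
    while len(active) > n_keep:
        # Rescore on every round. With a real classifier this is where the
        # model would be refit on the remaining features.
        beta = feature_nonconformity(X, y, active)
        worst = max(active, key=lambda j: beta[j])
        active.remove(worst)
        removed.append(worst)
    return active, removed


# Features 0 and 1 separate the three classes; features 2 and 3 are noise.
X = [
    [0.0, 0.1, 0.5, 0.3],
    [0.2, 0.0, 0.4, 0.6],
    [1.0, 0.1, 0.6, 0.4],
    [1.1, 0.2, 0.3, 0.5],
    [0.1, 1.0, 0.5, 0.5],
    [0.0, 1.2, 0.4, 0.4],
]
y = [0, 0, 1, 1, 2, 2]
kept, dropped = recursive_elimination(X, y, n_keep=2)
print("kept:", kept, "dropped:", sorted(dropped))
```

In this toy scorer a feature's score does not depend on the other active features, so rescoring each round changes nothing. With a fitted multiclass classifier the scores would generally shift after each removal, which is the reason the elimination is recursive.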
Abstract
Unlike traditional statistical methods, Conformal Prediction (CP) makes it possible to determine valid and accurate confidence levels for individual predictions, assuming only that the data are exchangeable. Here we introduce a new feature selection method that takes advantage of the CP framework. Our proposal, named Conformal Recursive Feature Elimination (CRFE), identifies and recursively removes the features that increase the non-conformity of a dataset. We also present an automatic stopping criterion for CRFE, as well as a new index to measure consistency between subsets of features. CRFE selections are compared with those of the classical Recursive Feature Elimination (RFE) method on several multiclass datasets, using multiple partitions of the data. The results show that CRFE clearly outperforms RFE on half of the datasets and achieves similar performance on the rest. The automatic stopping criterion yields subsets of effective and non-redundant features without computing any classification performance.
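To make the CP guarantee mentioned above concrete, here is a minimal sketch of how a prediction set is formed in split (inductive) conformal prediction. The split variant is chosen here only for brevity; this abstract does not say which CP variant CRFE relies on. The function names and the numeric scores are hypothetical.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    cal_scores are non-conformity scores of held-out calibration examples,
    each computed with its true label. Under exchangeability, a new example's
    true-label score falls at or below this value with probability at least
    1 - alpha.
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every label is kept.
        return float("inf")
    return sorted(cal_scores)[rank - 1]


def prediction_set(class_scores, threshold):
    """Keep every candidate label whose non-conformity score is small enough."""
    return {label for label, score in class_scores.items() if score <= threshold}


# Hypothetical calibration scores and candidate-label scores for one test point.
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.90]
threshold = conformal_threshold(cal_scores, alpha=0.25)
labels = prediction_set({"a": 0.15, "b": 0.55, "c": 0.95}, threshold)
print(threshold, sorted(labels))
```

A set containing more than one label signals an uncertain prediction. Smaller sets at the same error level `alpha` indicate a more efficient predictor, which is one way a feature subset can be judged in a conformal setting.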
