Conformal Recursive Feature Elimination

Marcos López-De-Castro, Alberto García-Galindo, Rubén Armañanzas

TL;DR

The paper tackles the challenge of feature selection under uncertainty in high-dimensional multiclass settings by integrating conformal prediction with recursive feature elimination. It introduces Conformal Recursive Feature Elimination (CRFE), which uses a multiclass One-vs-All adaptation and a per-feature non-conformity measure to recursively drop the most non-conforming features, together with an automatic $\beta$-based stopping criterion and a new consistency index $I_W$. Empirical results on synthetic and real-world datasets show that CRFE often outperforms classical RFE in conformal-set predictions while providing more compact and stable feature subsets; it also demonstrates competitive single-prediction performance and improved consistency. The work advances uncertainty-aware feature selection and provides an open-source implementation for broader adoption and future extensions to nonlinear classifiers and class-balanced conformal approaches.

Abstract

Unlike traditional statistical methods, Conformal Prediction (CP) allows for the determination of valid and accurate confidence levels associated with individual predictions based only on exchangeability of the data. We here introduce a new feature selection method that takes advantage of the CP framework. Our proposal, named Conformal Recursive Feature Elimination (CRFE), identifies and recursively removes features that increase the non-conformity of a dataset. We also present an automatic stopping criterion for CRFE, as well as a new index to measure consistency between subsets of features. CRFE selections are compared to the classical Recursive Feature Elimination (RFE) method on several multiclass datasets by using multiple partitions of the data. The results show that CRFE clearly outperforms RFE in half of the datasets, while achieving similar performance in the rest. The automatic stopping criterion provides subsets of effective and non-redundant features without computing any classification performance.
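To make the CP idea in the abstract concrete, the following is a minimal sketch of how an inductive conformal predictor turns non-conformity scores into p-values and a prediction set. It is an illustration under stated assumptions, not the CRFE implementation: the function names, the significance level `epsilon`, and the idea that scores come from a held-out calibration set are all hypothetical choices, and the non-conformity measure itself is left abstract.

```python
import numpy as np


def conformal_p_value(calibration_scores, test_score):
    """Share of calibration scores at least as large as the test score.

    The +1 in numerator and denominator counts the test example itself,
    which is what makes the p-value valid under exchangeability.
    """
    calibration_scores = np.asarray(calibration_scores, dtype=float)
    n_at_least = int(np.sum(calibration_scores >= test_score))
    return (n_at_least + 1) / (calibration_scores.size + 1)


def prediction_set(calibration_scores, candidate_scores, epsilon):
    """Keep every candidate label whose p-value exceeds epsilon.

    candidate_scores maps each label to the non-conformity score the test
    example would have if it carried that label (hypothetical input format).
    """
    return [
        label
        for label, score in candidate_scores.items()
        if conformal_p_value(calibration_scores, score) > epsilon
    ]
```

A label with a large non-conformity score relative to the calibration scores gets a small p-value and is dropped from the set, so smaller sets at a fixed `epsilon` indicate a more informative predictor.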

Paper Structure (24 sections, 24 equations, 9 figures, 6 tables, 1 algorithm)

Figures (9)

  • Figure 1: Set-prediction performance metrics. The results are averaged over 20 train-test iterations for each feature selection method. Standard deviations are provided as upper and lower intervals. Plots a-(a), b-(a), c-(a), and d-(a) show results for RFE, whereas plots a-(b), b-(b), c-(b), and d-(b) present results for CRFE.
  • Figure 2: Single-prediction performance metrics for the synthetic dataset. Standard deviations are provided as upper and lower intervals. Plots a-(a) and b-(a) show accuracy, precision, and recall performance metrics achieved by subsets of features selected by RFE and CRFE, respectively. Plots a-(b),(c),(d) and b-(b),(c),(d) show precision, recall, and F1 score performance by class achieved by RFE and CRFE, respectively.
  • Figure 3: Consistency analysis across the 20 iterations. Plots a-(a), b-(a), c-(a), and d-(a) show consistency results for RFE, whereas plots a-(b), b-(b), c-(b), and d-(b) present consistency results for CRFE. Both the Jaccard index $I_J$ and the newly proposed consistency index $I_W$ are shown.
  • Figure 4: The frequency with which each feature was included in the optimal subset of features selected by CRFE using the $\beta$-based stopping criterion. Note that the maximum corresponds to 50 independent runs. The percentages of features always discarded (not shown) were 0%, 28.2%, 11.7%, and 13.5% of the total sets for the synthetic, coronary artery disease, dermatology, and myocardial datasets, respectively.
  • Figure 5: Single-prediction performance metrics for the coronary artery disease dataset. Standard deviations are provided as upper and lower intervals. Plots a-(a) and b-(a) show accuracy, precision, and recall performance metrics achieved by subsets of features selected by RFE and CRFE, respectively. Plots a-(b),(c),(d) and b-(b),(c),(d) show precision, recall, and F1 score performance by class achieved by RFE and CRFE, respectively.
  • ...and 4 more figures
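
The Jaccard index $I_J$ reported in Figure 3 compares two feature subsets by the ratio of shared features to total distinct features. The sketch below shows that standard definition; averaging over all pairs of runs is an assumed aggregation for illustration and may differ from how the paper summarizes its iterations. The index $I_W$ is not defined in this summary, so it is not sketched here. Function names are hypothetical.

```python
from itertools import combinations


def jaccard_index(subset_a, subset_b):
    """|A ∩ B| / |A ∪ B| for two collections of selected feature ids."""
    a, b = set(subset_a), set(subset_b)
    union = a | b
    if not union:
        # Assumed convention: two empty selections count as identical.
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_jaccard(subsets):
    """Average Jaccard index over all unordered pairs of subsets.

    Assumed aggregation for illustration; requires at least two subsets.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard_index(a, b) for a, b in pairs) / len(pairs)
```

A value of 1 means every run selected exactly the same features, while values near 0 mean the selections barely overlap.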