Validity of Feature Importance in Low-Performing Machine Learning for Tabular Biomedical Data

Youngro Lee, Giacomo Baruzzo, Jeonghwan Kim, Jongmo Seo, Barbara Di Camillo

TL;DR

Even low-performing models can provide reliable feature importance on biomedical datasets: the validity of feature importance is preserved at suboptimal performance levels as long as the degradation stems from insufficient features rather than insufficient samples.

Abstract

In tabular biomedical data analysis, tuning models to high accuracy is considered a prerequisite for discussing feature importance, as medical practitioners expect the validity of feature importance to correlate with performance. In this work, we challenge the prevailing belief, showing that low-performing models may also be used for feature importance. We propose experiments to observe changes in feature rank as performance degrades sequentially. Using three synthetic datasets and six real biomedical datasets, we compare the rank of features from full datasets to those with reduced sample sizes (data cutting) or fewer features (feature cutting). In synthetic datasets, feature cutting does not change feature rank, while data cutting shows higher discrepancies with lower performance. In real datasets, feature cutting shows similar or smaller changes than data cutting, though some datasets exhibit the opposite. When feature interactions are controlled by removing correlations, feature cutting consistently shows better stability. By analyzing the distribution of feature importance values and theoretically examining the probability that the model cannot distinguish feature importance between features, we reveal that models can still distinguish feature importance despite performance degradation through feature cutting, but not through data cutting. We conclude that the validity of feature importance can be maintained even at low performance levels if the data size is adequate, which is a significant factor contributing to suboptimal performance in tabular medical data analysis. This paper demonstrates the potential for utilizing feature importance analysis alongside statistical analysis to compare features relatively, even when classifier performance is not satisfactory.
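
The rank comparison described in the abstract can be illustrated with a minimal sketch. It assumes stability is scored with Spearman's rank correlation coefficient (SRCC, one of the stability indexes named in the figure captions) and that, after feature cutting, ranks are compared only over the features that remain; the paper's exact procedure may differ. The function names and the importance values are hypothetical and chosen for illustration only.

```python
from statistics import mean


def average_ranks(values):
    """Rank values so the largest gets rank 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(importance_a, importance_b):
    """Spearman rank correlation: Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either vector is constant (all ranks tied).
    """
    ra, rb = average_ranks(importance_a), average_ranks(importance_b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def rank_stability(full, degraded):
    """Compare importances (dicts: feature name -> importance) on shared features.

    Restricting to shared features is an assumption made for this sketch.
    """
    shared = [name for name in full if name in degraded]
    return spearman([full[n] for n in shared], [degraded[n] for n in shared])


# Hypothetical importance values, not taken from the paper.
full = {"f1": 0.40, "f2": 0.30, "f3": 0.20, "f4": 0.10}
feature_cut = {"f1": 0.55, "f3": 0.30, "f4": 0.15}           # f2 removed, order kept
data_cut = {"f1": 0.20, "f2": 0.35, "f3": 0.15, "f4": 0.30}  # order scrambled

print(rank_stability(full, feature_cut))
print(rank_stability(full, data_cut))
```

In this toy input the feature-cut model keeps the original ordering of the remaining features, so its score is at the maximum, while the scrambled ordering of the data-cut model yields a much lower score. Real importances would come from a trained classifier rather than hand-written dictionaries.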

Paper Structure (17 sections, 5 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: Graphical explanation of the overall experiments. a) Overall structure showing how the performance and stability indexes are obtained. b) Algorithm for data cutting. c) Schematic for trimming results. d) Schematic of linear regression to compare the two degradation algorithms (Section 3.1). e) Schematic of linear regression to analyze the effect of deleting correlations (Section 3.2). f) Schematic of feature importance distribution analysis (Section 3.3).
  • Figure 2: Comparison between data cutting and feature cutting on four stability indexes in the experiment on synthetic datasets. The x-axis shows the AUC value and the y-axis the stability index. The red vertical line marks AUC = 0.8, the point where the stability index starts to change rapidly.
  • Figure 3: Comparison of data cutting and feature cutting regarding SRCC and CD in the experiment on real datasets 1 and 5. Other stability indexes also show statistical significance, as illustrated in Appendix 7. The x-axis shows the AUC value and the y-axis the stability index.
  • Figure 4: Stability differences between feature cutting and data cutting after deleting features whose correlation exceeds the maximum correlation. The x-axis represents the performance value (AUC), and the y-axis represents the stability index from feature cutting minus the same index from data cutting. Each color represents the maximum correlation within the feature group.
  • Figure 5: Feature importance distribution of generated dataset 1. In the box plots (a, b, c), the x-axis represents feature rank and the y-axis the feature importance value (Gini impurity). Black dots show outliers from the box plots. Feature importance of each feature for a) the entire set of samples and features, b) only 10 samples (AUC=0.68), c) only 10 features (AUC=0.62). d) Ratio of adjacent feature pairs whose feature importance values are distinguished with statistical significance ($p<0.05$).
  • ...and 1 more figure