GRASP: group-Shapley feature selection for patients
Yuheng Luo, Shuyan Li, Zhong Cao
TL;DR
GRASP is an interpretable feature-selection framework for medical prediction that couples SHAP-based attribution with group-$L_{21}$ regularization in a logistic regression objective, optimized with proximal-gradient methods. Group-level penalties are derived from SHAP importances, which yields stable, non-redundant feature sets while preserving predictive performance. On NHANES and UK Biobank mortality data, GRASP produces compact feature sets with high stability, low redundancy, and calibration that aligns with clinical risk thresholds, matching or outperforming existing selectors. The approach improves the interpretability and clinical relevance of predictive models and supports reliable risk stratification in real-world healthcare applications.
Abstract
Feature selection remains a major challenge in medical prediction, where existing approaches such as LASSO often lack robustness and interpretability. We introduce GRASP, a novel framework that couples Shapley-value-driven attribution with group $L_{21}$ regularization to extract compact, non-redundant feature sets. GRASP first distills group-level importance scores from a pretrained tree model via SHAP, then enforces structured sparsity through group-$L_{21}$-regularized logistic regression, yielding stable and interpretable selections. Extensive comparisons with LASSO, SHAP, and deep-learning-based methods show that GRASP consistently delivers comparable or superior predictive accuracy while identifying fewer, less redundant, and more stable features.
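To make the second stage concrete, the following is a minimal sketch of group-$L_{21}$-regularized logistic regression solved by proximal gradient descent, minimizing the mean logistic loss plus $\sum_g \lambda_g \lVert w_g \rVert_2$. It is an illustration under stated assumptions, not the authors' implementation: the group importances are taken as a given input (in GRASP they would come from SHAP on a pretrained tree model), the rule that sets each group's penalty inversely to its normalized importance is our own assumption, and the names `group_soft_threshold`, `fit_group_sparse_logreg`, `lam`, and `eps` are hypothetical.

```python
import numpy as np


def group_soft_threshold(w, groups, thresholds):
    """Proximal operator of a weighted group L2,1 penalty (block soft-thresholding)."""
    out = np.zeros_like(w)
    for idx, t in zip(groups, thresholds):
        norm = np.linalg.norm(w[idx])
        if norm > t:
            # Shrink the whole group toward zero; groups with norm <= t stay exactly zero.
            out[idx] = (1.0 - t / norm) * w[idx]
    return out


def fit_group_sparse_logreg(X, y, groups, importance, lam=0.05, n_iter=500, eps=1e-8):
    """Proximal-gradient fit of logistic regression with per-group L2 penalties.

    groups:     list of integer index arrays, one per feature group
    importance: one non-negative score per group (e.g. aggregated |SHAP| values)
    """
    n, d = X.shape
    importance = np.asarray(importance, dtype=float)
    # ASSUMPTION (not from the paper): penalty is inversely proportional to the
    # group's importance, normalized so the most important group gets about lam.
    penalty = lam / (importance / importance.max() + eps)

    # Step size 1/L, where L bounds the Lipschitz constant of the gradient of the
    # mean logistic loss over the weights and the (unpenalized) intercept.
    X_aug = np.hstack([X, np.ones((n, 1))])
    step = 4.0 * n / np.linalg.norm(X_aug, 2) ** 2

    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30.0, 30.0)  # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        # Gradient step on the smooth loss, then the group-wise proximal step.
        w = group_soft_threshold(w - step * grad_w, groups, step * penalty)
        b -= step * grad_b
    return w, b
```

Groups whose fitted weight block is exactly zero are dropped, so the selected feature set is the union of groups with a nonzero block. Because each penalty acts on a whole group, correlated features assigned to the same group are kept or discarded together.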
