P-CAFE: Personalized Cost-Aware Incremental Feature Selection For Electronic Health Records
Naama Kashani, Mira Cohen, Uri Shaham
TL;DR
P-CAFE introduces a personalized, cost-aware, online feature selection framework for electronic health records (EHRs) that sequentially reveals features across multiple modalities under a budget. Framing feature selection as a Markov Decision Process, it employs an Agent to pick features and a pre-trained Guesser to provide reward signals that balance information gain with cost, with robust optimization used to stabilize the training dynamics. The approach handles multimodal data (text, images, time-series) and patient-specific sparsity, and supports various reinforcement learning (RL) agents, notably Double DQN (DDQN), with design choices like prioritized replay and Huber loss. Empirical results on the MIMIC-III and eICU datasets demonstrate that P-CAFE achieves higher AUC-ROC and AUPRC with lower intersection over union (IoU) and cost, while maintaining interpretability and adaptability to real-world budget constraints. Overall, P-CAFE advances cost-effective, personalized clinical decision support by aligning feature acquisition with predictive value and resource limits.
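To make the Agent/Guesser interaction concrete, the following is a minimal sketch of one budgeted acquisition episode. It is not the authors' implementation: the logistic `guesser_prob`, the reward form (gain in the Guesser's probability of the true label per unit cost), the `cheapest_first` placeholder policy, and all numeric values are assumptions chosen for illustration. In the framework described above, a trained RL agent such as DDQN would take the place of the placeholder policy.

```python
import math

def guesser_prob(values, mask, weights, bias=0.0):
    """Toy stand-in for a pre-trained Guesser (assumption): a logistic
    model that only sees the features revealed so far."""
    z = bias + sum(w * v for w, v, m in zip(weights, values, mask) if m)
    return 1.0 / (1.0 + math.exp(-z))

def cost_aware_reward(p_before, p_after, label, cost):
    """Assumed reward form: change in the Guesser's probability of the
    true label, divided by the cost of the acquired feature."""
    conf_before = p_before if label == 1 else 1.0 - p_before
    conf_after = p_after if label == 1 else 1.0 - p_after
    return (conf_after - conf_before) / cost

def run_episode(values, label, weights, costs, budget, policy):
    """Reveal features one at a time until no unrevealed feature fits
    in the remaining budget. Returns the mask, spend, and rewards."""
    mask = [False] * len(values)
    spent = 0.0
    trajectory = []  # list of (feature index, reward)
    while True:
        affordable = [i for i, m in enumerate(mask)
                      if not m and spent + costs[i] <= budget]
        if not affordable:
            break
        action = policy(mask, affordable)
        p_before = guesser_prob(values, mask, weights)
        mask[action] = True
        spent += costs[action]
        p_after = guesser_prob(values, mask, weights)
        reward = cost_aware_reward(p_before, p_after, label, costs[action])
        trajectory.append((action, reward))
    return mask, spent, trajectory

# Hypothetical patient with three features and per-feature costs.
values = [1.0, 2.0, -1.0]
weights = [0.5, 1.0, 0.3]
costs = [1.0, 3.0, 1.0]
budget = 2.0
label = 1

def cheapest_first(mask, affordable):
    """Placeholder policy (assumption): pick the cheapest affordable
    feature. A learned agent would choose based on the current state."""
    return min(affordable, key=lambda i: costs[i])

mask, spent, trajectory = run_episode(
    values, label, weights, costs, budget, cheapest_first
)
for action, reward in trajectory:
    print(f"acquired feature {action}, reward {reward:+.4f}")
print(f"total cost {spent} of budget {budget}")
```

In this toy run the expensive second feature is never acquired because it does not fit the budget, and the third feature yields a negative reward because it lowers the Guesser's confidence in the true label. A learned agent would be trained to avoid such acquisitions.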
Abstract
Electronic Health Records (EHRs) have revolutionized healthcare by digitizing patient data, improving accessibility, and streamlining clinical workflows. However, extracting meaningful insights from these complex and multimodal datasets remains a significant challenge for researchers. Traditional feature selection methods often struggle with the inherent sparsity and heterogeneity of EHR data, especially when accounting for patient-specific variations and feature costs in clinical applications. To address these challenges, we propose a novel personalized, online, and cost-aware feature selection framework tailored specifically for EHR datasets. The features are acquired in an online fashion for individual patients, incorporating budgetary constraints and feature variability costs. The framework is designed to effectively manage sparse and multimodal data, ensuring robust and scalable performance in diverse healthcare contexts. A primary application of our proposed method is to support physicians' decision making in patient screening scenarios. By guiding physicians toward incremental acquisition of the most informative features within budget constraints, our approach aims to increase diagnostic confidence while optimizing resource utilization.
