P-CAFE: Personalized Cost-Aware Incremental Feature Selection For Electronic Health Records

Naama Kashani, Mira Cohen, Uri Shaham

TL;DR

P-CAFE introduces a personalized, cost-aware, online feature selection framework for electronic health records (EHRs) that sequentially reveals features across multiple modalities under a budget. Framing feature selection as a Markov Decision Process, it employs an Agent to pick features and a pre-trained Guesser to provide reward signals that balance information gain with cost, with robust optimization used to stabilize the training dynamics. The approach handles multimodal data (text, images, time-series) and patient-specific sparsity, and supports various RL agents, notably DDQN, with design choices like prioritized replay and Huber loss. Empirical results on MIMIC-III and eICU demonstrate that P-CAFE achieves higher AUC-ROC and AUPRC with lower IoU and cost, while maintaining interpretability and adaptability to real-world budget constraints. Overall, P-CAFE advances cost-effective, personalized clinical decision-support by aligning feature acquisition with predictive value and resource limits.
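The acquisition loop described above can be illustrated with a minimal sketch. It is not the authors' implementation. The feature names, costs, weights, and the reward shape (confidence gained per unit cost) are all assumptions made for illustration. The learned RL agent is replaced by a greedy stand-in, and the pre-trained Guesser by a fixed logistic model.

```python
import math

# Hypothetical acquisition costs and guesser weights; none come from the paper.
COSTS = {"age": 1.0, "heart_rate": 1.0, "lab_lactate": 5.0, "chest_xray": 10.0}
WEIGHTS = {"age": 0.4, "heart_rate": 0.2, "lab_lactate": 1.0, "chest_xray": 1.5}


def guesser_prob(revealed):
    """Toy stand-in for the pre-trained Guesser: logistic model, missing features count as 0."""
    z = sum(WEIGHTS[name] * value for name, value in revealed.items())
    return 1.0 / (1.0 + math.exp(-z))


def confidence(prob):
    """Confidence of a binary prediction: probability of the more likely class."""
    return max(prob, 1.0 - prob)


def gain_reward(revealed, name, value):
    """Assumed reward shape: confidence gained by revealing a feature, per unit cost."""
    before = confidence(guesser_prob(revealed))
    after = confidence(guesser_prob({**revealed, name: value}))
    return (after - before) / COSTS[name]


def acquire(patient, budget, threshold=0.9):
    """Reveal features one at a time until the guesser is confident or the budget runs out."""
    revealed, order, spent = {}, [], 0.0
    while confidence(guesser_prob(revealed)) < threshold:
        affordable = [f for f in patient
                      if f not in revealed and spent + COSTS[f] <= budget]
        if not affordable:
            break
        # Greedy stand-in for the learned agent. It peeks at the true values,
        # which a trained policy cannot do at decision time.
        best = max(affordable, key=lambda f: gain_reward(revealed, f, patient[f]))
        revealed[best] = patient[best]
        order.append(best)
        spent += COSTS[best]
    return order, spent, guesser_prob(revealed)


# Standardized (z-scored) feature values for one hypothetical patient.
patient = {"age": 1.0, "heart_rate": 0.5, "lab_lactate": 2.0, "chest_xray": 1.0}
order, spent, prob = acquire(patient, budget=7.0)
print(order, spent, round(prob, 3))
```

With a budget of 7, this toy run reveals `age` and then `lab_lactate`, and stops at a total cost of 6 once the guesser's confidence passes the 0.9 threshold. The expensive `chest_xray` is never affordable. In the framework itself the selection policy is learned (for example with DDQN) rather than computed greedily from the true values.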

Abstract

Electronic Health Records (EHRs) have revolutionized healthcare by digitizing patient data, improving accessibility, and streamlining clinical workflows. However, extracting meaningful insights from these complex and multimodal datasets remains a significant challenge for researchers. Traditional feature selection methods often struggle with the inherent sparsity and heterogeneity of EHR data, especially when accounting for patient-specific variations and feature costs in clinical applications. To address these challenges, we propose a novel personalized, online, and cost-aware feature selection framework tailored specifically for EHR datasets. The features are acquired in an online fashion for individual patients, incorporating budgetary constraints and feature variability costs. The framework is designed to effectively manage sparse and multimodal data, ensuring robust and scalable performance in diverse healthcare contexts. A primary application of our proposed method is to support physicians' decision-making in patient screening scenarios. By guiding physicians toward incremental acquisition of the most informative features within budget constraints, our approach aims to increase diagnostic confidence while optimizing resource utilization.

Paper Structure

This paper contains 42 sections, 3 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: P-CAFE framework applied to patient-specific cases of the MIMIC-III Multi-Modal Dataset. The progression through the feature selection stages is online and tailored to each patient, with predictions of in-hospital mortality displayed on the right to reflect personalized outcomes.
  • Figure 2: The P-CAFE architecture. At each step, the agent reveals a feature and updates its internal state accordingly, receiving a Gain-Based reward. This process repeats until the agent attains sufficient confidence to predict the outcome, triggering the guesser to make a prediction based on the revealed features and receive a Guess-Based reward.
  • Figure 3: Performance Comparison of P-CAFE and LSPIN
  • Figure 4: Performance comparison of P-CAFE against CFS, IG, MI, and CST methods, as reported in zuo2021curvature.
  • Figure 5: Clinically Interpretable Feature Acquisition in Diabetes Diagnosis