PRISM: Mitigating EHR Data Sparsity via Learning from Missing Feature Calibrated Prototype Patient Representations
Yinghao Zhu, Zixiang Wang, Long He, Shiyun Xie, Xiaochen Zheng, Liantao Ma, Chengwei Pan
TL;DR
PRISM tackles the sparsity inherent in time-series EHR data by shifting from direct data imputation to learning prototype-based representations of similar patients, guided by a feature confidence learner. The framework calibrates feature reliability via missing-status signals and employs a confidence-aware similarity measure to form prototype cohorts, which are fused with individual patient representations before prediction. On four real-world datasets across in-hospital mortality and 30-day readmission tasks, PRISM achieves statistically significant improvements over state-of-the-art baselines, with notable gains in AUPRC and robust performance under high missingness. The approach offers a scalable, interpretable pathway to leverage sparse EHR data for accurate clinical predictions, and provides publicly available code to foster reproducibility and further research.
Abstract
Electronic Health Records (EHRs) contain a wealth of patient data; however, the sparsity of EHR data often presents significant challenges for predictive modeling. Conventional imputation methods do not adequately distinguish between real and imputed data, which can lead to inaccurate patient representations. To address these issues, we introduce PRISM, a framework that indirectly imputes data by leveraging prototype representations of similar patients, thus ensuring compact representations that preserve patient information. PRISM also includes a feature confidence learner module, which evaluates the reliability of each feature considering its missing status. Additionally, PRISM introduces a new patient similarity metric that accounts for feature confidence, avoiding over-reliance on imprecise imputed values. Our extensive experiments on the MIMIC-III, MIMIC-IV, PhysioNet Challenge 2012, and eICU datasets demonstrate PRISM's superior performance on in-hospital mortality and 30-day readmission prediction tasks, showcasing its effectiveness in handling EHR data sparsity. For the sake of reproducibility and further research, we have publicly released the code at https://github.com/yhzhu99/PRISM.
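To make the idea of a confidence-aware similarity concrete, the following is a minimal illustrative sketch and not the formulation used in PRISM. It assumes a fixed, hand-set confidence for imputed values (PRISM learns feature confidence instead) and swaps in a confidence-weighted cosine similarity as a stand-in metric. The function names and the `imputed_confidence` parameter are hypothetical.

```python
import math


def confidence_from_mask(observed, imputed_confidence=0.2):
    """Map a per-feature observed/missing mask to confidence values.

    Assumption: observed features get confidence 1.0 and imputed ones a
    fixed lower value. This is a hand-set stand-in for a learned module.
    """
    return [1.0 if is_observed else imputed_confidence for is_observed in observed]


def confidence_weighted_similarity(x, y, conf_x, conf_y):
    """Cosine similarity with each feature weighted by both patients' confidences.

    A feature contributes fully only when it is trusted for both patients,
    so imputed values have less influence on the similarity score.
    """
    weights = [cx * cy for cx, cy in zip(conf_x, conf_y)]
    dot = sum(w * a * b for w, a, b in zip(weights, x, y))
    norm_x = math.sqrt(sum(w * a * a for w, a in zip(weights, x)))
    norm_y = math.sqrt(sum(w * b * b for w, b in zip(weights, y)))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # no trusted, non-zero features to compare
    return dot / (norm_x * norm_y)


# Two patients agree on observed features but differ on an imputed one.
patient_a = [1.0, 2.0, 100.0]
patient_b = [1.0, 2.0, -100.0]
conf_a = confidence_from_mask([True, True, False], imputed_confidence=0.0)
conf_b = confidence_from_mask([True, True, True])
similarity = confidence_weighted_similarity(patient_a, patient_b, conf_a, conf_b)
```

In this toy case the third feature is imputed for one patient and given zero confidence, so it is ignored and the two patients are judged similar on their observed features alone. An unweighted cosine similarity would instead be dominated by the unreliable imputed value.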
