Multi-task Heterogeneous Graph Learning on Electronic Health Records
Tsai Hor Chan, Guosheng Yin, Kyongtae Bae, Lequan Yu
TL;DR
MulT-EHR addresses the challenges of heterogeneous, noisy electronic health records by modeling them as a heterogeneous graph and applying a causal denoising module to mitigate confounding. The framework uses a Transformer-based heterogeneous GNN backbone with self-supervised TransE pretraining and a multi-task environment-invariant objective to share knowledge across four clinical prediction tasks. Empirical results on MIMIC-III and MIMIC-IV show consistent improvements over state-of-the-art baselines across mortality, readmission, length of stay, and drug recommendation, with ablations confirming the contribution of each component. The approach demonstrates strong potential for interpretable, generalizable EHR representations and could extend to other domains that involve heterogeneous graphs and multi-task learning.
Abstract
Learning from electronic health records (EHRs) has received growing attention because of its potential to facilitate accurate medical diagnosis. Because EHRs contain rich information describing complex interactions between entities, modeling them with graphs has been shown to be effective in practice. EHRs, however, exhibit a high degree of heterogeneity, sparsity, and complexity, which hampers the performance of most models applied to them. Moreover, existing approaches to EHR modeling often focus on learning representations for a single task, overlooking the multi-task nature of EHR analysis and limiting generalizability across tasks. In view of these limitations, we propose a novel framework for EHR modeling, MulT-EHR (Multi-Task EHR), which leverages a heterogeneous graph to mine the complex relations and model the heterogeneity in EHRs. To mitigate the large degree of noise, we introduce a denoising module based on the causal inference framework to adjust for severe confounding effects and reduce noise in the EHR data. Additionally, since our model adopts a single graph neural network for simultaneous multi-task prediction, we design a multi-task learning module that leverages inter-task knowledge to regularize the training process. Extensive empirical studies on the MIMIC-III and MIMIC-IV datasets show that the proposed method consistently outperforms state-of-the-art designs on four popular EHR analysis tasks: drug recommendation and the prediction of length of stay, mortality, and readmission. Thorough ablation studies demonstrate the robustness of our method to variations in key components and hyperparameters.
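The TL;DR mentions self-supervised TransE pretraining on the heterogeneous EHR graph. As a rough illustration only, the sketch below shows the standard TransE idea: a triple (head, relation, tail) is plausible when head + relation lies close to tail, trained with a margin ranking loss against a corrupted triple. The toy triples, entity names, embedding dimension, and helper functions are all hypothetical and are not taken from the paper, whose actual pretraining setup may differ.

```python
import math
import random

# Hypothetical toy EHR triples (head, relation, tail); names are illustrative only.
TRIPLES = [
    ("patient_1", "has_visit", "visit_1"),
    ("visit_1", "has_diagnosis", "dx_sepsis"),
    ("visit_1", "has_prescription", "drug_vancomycin"),
]


def init_embeddings(names, dim, rng):
    """Assign each name a small random vector (assumed initialization)."""
    return {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in names}


def transe_distance(h, r, t):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(pos_dist, neg_dist, margin=1.0):
    """Margin ranking loss: zero once the true triple beats the corrupted one by `margin`."""
    return max(0.0, margin + pos_dist - neg_dist)


rng = random.Random(0)
entities = sorted({x for h, _, t in TRIPLES for x in (h, t)})
relations = sorted({r for _, r, _ in TRIPLES})
ent = init_embeddings(entities, 4, rng)
rel = init_embeddings(relations, 4, rng)


def triple_loss(triple, corrupt_tail):
    """Loss for one true triple against the same triple with its tail replaced."""
    h, r, t = triple
    pos = transe_distance(ent[h], rel[r], ent[t])
    neg = transe_distance(ent[h], rel[r], ent[corrupt_tail])
    return margin_loss(pos, neg)


# Corrupt the diagnosis triple by swapping in a drug node as the tail.
loss = triple_loss(TRIPLES[1], "drug_vancomycin")
```

In a real pipeline the embeddings would be updated by gradient descent over many sampled corruptions, and the resulting node embeddings could initialize the graph neural network; this sketch only evaluates the loss once.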
