KAT-GNN: A Knowledge-Augmented Temporal Graph Neural Network for Risk Prediction in Electronic Health Records

Kun-Wei Lin, Yu-Chen Kuo, Hsin-Yao Wang, Yi-Ju Tseng

TL;DR

KAT-GNN addresses risk prediction from heterogeneous, irregularly timed EHR data by constructing modality-specific patient graphs augmented with SNOMED CT ontology edges and data-driven co-occurrence priors. A time-aware transformer then models longitudinal dynamics and adaptively fuses the modality representations to produce risk scores. On CAD prediction (CGRD) and in-hospital mortality prediction (MIMIC-III and MIMIC-IV), KAT-GNN achieves state-of-the-art performance on CAD and outperforms established baselines on mortality, and ablations attribute its gains to both knowledge augmentation and temporal modeling. The approach generalizes across tasks and datasets and offers a scalable framework for integrating clinical knowledge with graph-based temporal representations in EHR risk prediction.

Abstract

Clinical risk prediction using electronic health records (EHRs) is vital for timely intervention and clinical decision support. However, modeling heterogeneous EHR data with irregular temporal structure presents significant challenges. We propose KAT-GNN (Knowledge-Augmented Temporal Graph Neural Network), a graph-based framework that integrates clinical knowledge and temporal dynamics for risk prediction. KAT-GNN first constructs modality-specific patient graphs from EHRs. These graphs are then augmented using two knowledge sources: (1) ontology-driven edges derived from SNOMED CT and (2) co-occurrence priors extracted from EHRs. Subsequently, a time-aware transformer captures longitudinal dynamics from the graph-encoded patient representations. KAT-GNN is evaluated on two tasks across three datasets: coronary artery disease (CAD) prediction on the Chang Gung Research Database (CGRD) and in-hospital mortality prediction on MIMIC-III and MIMIC-IV. KAT-GNN achieves state-of-the-art performance in CAD prediction (AUROC: 0.9269 ± 0.0029) and strong results in mortality prediction on MIMIC-III (AUROC: 0.9230 ± 0.0070) and MIMIC-IV (AUROC: 0.8849 ± 0.0089), consistently outperforming established baselines such as GRASP and RETAIN. Ablation studies confirm that both the knowledge-based augmentation and the temporal modeling component contribute significantly to these gains. These findings show that integrating clinical knowledge into graph representations, coupled with a time-aware attention mechanism, is an effective and generalizable approach to risk prediction across diverse clinical tasks and datasets.
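The graph-construction and co-occurrence augmentation steps described above (and laid out in Figure 2) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how co-occurrence priors are scored, so pointwise mutual information (PMI) over visit-level code sets, kept when positive, is used here as a stand-in. The function names, input formats, and toy codes "A", "B", "C" are hypothetical. SNOMED CT-derived pairs would require an ontology lookup that is not shown; they could simply be unioned into the same `prior_pairs` set.

```python
from itertools import combinations
from math import log


def cooccurrence_pmi(visit_code_sets):
    """PMI for every diagnosis pair recorded together in at least one visit.

    visit_code_sets: one set of codes per visit, pooled over a training
    cohort (hypothetical input format; PMI is an assumed scoring choice).
    """
    n_visits = len(visit_code_sets)
    code_counts, pair_counts = {}, {}
    for codes in visit_code_sets:
        for code in codes:
            code_counts[code] = code_counts.get(code, 0) + 1
        for a, b in combinations(sorted(codes), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1
    # PMI(a, b) = log(P(a, b) / (P(a) * P(b))), probabilities estimated per visit
    return {
        (a, b): log(n_ab * n_visits / (code_counts[a] * code_counts[b]))
        for (a, b), n_ab in pair_counts.items()
    }


def build_patient_graph(codes, visit_matrix, prior_pairs):
    """One patient's diagnosis graph following the layout of Figure 2.

    codes: diagnosis codes, one per column of the visit table.
    visit_matrix: one 0/1 row per visit, in chronological order.
    prior_pairs: sorted (code_a, code_b) pairs considered clinically related.
    """
    nodes, edges = set(), set()
    present = set()
    for v, row in enumerate(visit_matrix):
        nodes.add(("visit", v))
        for code, flag in zip(codes, row):
            if flag:
                present.add(code)
                nodes.add(("dx", code))
                # black edges: bipartite diagnosis-visit structure
                edges.add((("dx", code), ("visit", v), "diagnosed_in"))
    # gray dashed edges: consecutive visits model temporal progression
    for v in range(len(visit_matrix) - 1):
        edges.add((("visit", v), ("visit", v + 1), "next_visit"))
    # red edges: knowledge augmentation between related diagnoses
    for a, b in combinations(sorted(present), 2):
        if (a, b) in prior_pairs:
            edges.add((("dx", a), ("dx", b), "knowledge"))
    return nodes, edges


# Toy usage with hypothetical codes
cohort_visits = [{"A", "B"}, {"A", "B", "C"}, {"C"}, {"A", "B"}]
pmi = cooccurrence_pmi(cohort_visits)
prior = {pair for pair, score in pmi.items() if score > 0}  # {("A", "B")}
nodes, edges = build_patient_graph(
    ["A", "B", "C"], [[1, 1, 0], [1, 0, 1], [0, 1, 0]], prior
)
```

Thresholding PMI at zero keeps only pairs that co-occur more often than independence would predict; a real system would likely also require a minimum pair count so that rare codes do not produce spurious edges.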

Paper Structure

This paper contains 35 sections, 14 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Overview of KAT-GNN framework. The KAT-GNN framework comprises four stages: preprocessing, graph construction, edge augmentation, and time-aware graph learning and fusion.
  • Figure 2: Illustration of patient-specific graph construction. The table on the left represents diagnosis records across multiple visits, where a value of 1 indicates that the patient was diagnosed with the corresponding CCS code during that visit, and 0 indicates absence. Each unique diagnosis and visit is represented as a node, with diagnosis nodes shown in blue and visit nodes shown in pink. Black edges connect diagnoses to visits according to the EHR table, forming the fundamental bipartite structure of the graph. Gray dashed edges link consecutive visits to model the patient's temporal progression. Red edges indicate additional semantic connections introduced through knowledge augmentation, derived from external ontologies or co-occurrence statistics, linking clinically related diagnosis nodes to enrich the graph structure.
  • Figure 3: Architecture of local and global time-aware attention. The module processes two distinct inputs: (1) the visit-time node embeddings, which are derived from the visit-time nodes in the graph and represent the temporal positions of visits, and (2) the temporal embeddings, which are generated from each visit's time interval relative to the index date.
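The two-input design in Figure 3 can be illustrated with a dependency-free sketch. It is not the paper's module; several details are assumptions made for illustration: the temporal embedding is a fixed sinusoidal encoding of days to the index date (the paper's may be learned), it is added to the visit-time node embedding, attention is single-head without learned projections, and "local" is interpreted as a hypothetical time window `window_days` around each visit. The captions do not say how the global and local outputs are fused, so both are returned.

```python
import math


def temporal_embedding(days_to_index, dim):
    """Sinusoidal encoding of a visit's interval (in days) to the index date.

    Assumption: a parameter-free stand-in for the paper's temporal embedding.
    """
    if dim % 2:
        raise ValueError("dim must be even")
    emb = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        emb.extend([math.sin(days_to_index * freq), math.cos(days_to_index * freq)])
    return emb


def scaled_dot_attention(x, mask=None):
    """Single-head self-attention with queries = keys = values = x."""
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            if mask is None or mask[i][j]
            else float("-inf")
            for j, k in enumerate(x)
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w * row[c] for w, row in zip(weights, x)) / total for c in range(d)]
        )
    return out


def time_aware_attention(visit_embs, days_to_index, window_days):
    """Return (global, local) contextualized visit representations.

    visit_embs: visit-time node embeddings taken from the graph encoder.
    days_to_index: interval of each visit to the index date (negative = before).
    window_days: hypothetical local window; each visit attends only to visits
    within this many days of itself (always including itself).
    """
    dim = len(visit_embs[0])
    x = [
        [e + t for e, t in zip(emb, temporal_embedding(day, dim))]
        for emb, day in zip(visit_embs, days_to_index)
    ]
    local_mask = [
        [abs(di - dj) <= window_days for dj in days_to_index] for di in days_to_index
    ]
    return scaled_dot_attention(x), scaled_dot_attention(x, local_mask)


# Toy usage: three visits with 4-dimensional embeddings (values are arbitrary)
embs = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2], [-0.3, 0.4, 0.1, 0.0]]
days = [-400, -200, -30]
global_out, local_out = time_aware_attention(embs, days, window_days=180)
```

A full model would add learned query, key, and value projections and multiple heads; they are omitted to keep the sketch short. Because each visit always falls inside its own window, the local softmax never sees an all-masked row.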