Sepsis Prediction Using Graph Convolutional Networks over Patient-Feature-Value Triplets

Bozhi Dan, Di Wu, Ji Xu, Xiang Liu, Yiziting Zhu, Xin Shu, Yujie Li, Bin Yi

TL;DR

This work addresses sepsis prediction from sparse, heterogeneous EHR data by introducing Triplet-GCN, which encodes encounters as patient–feature–value triplets and propagates information over a bipartite patient–feature graph. A two-layer GCN learns patient embeddings, which a compact MLP classifier then scores; type-specific preprocessing and edge-level value preservation maintain measurement provenance. On a multi-center Chinese cohort, Triplet-GCN outperforms strong tabular baselines across discrimination and balanced-error metrics, indicating improved sepsis risk stratification in a model simple enough to deploy. The approach offers a principled, end-to-end blueprint for leveraging relational EHR structure to enhance early warning systems in critical care.

Abstract

In the intensive care setting, sepsis continues to be a major contributor to patient illness and death; however, its timely detection is hindered by the complex, sparse, and heterogeneous nature of electronic health record (EHR) data. We propose Triplet-GCN, a single-branch graph convolutional model that represents each encounter as patient-feature-value triplets, constructs a bipartite EHR graph, and learns patient embeddings via a Graph Convolutional Network (GCN) followed by a lightweight multilayer perceptron (MLP). The pipeline applies type-specific preprocessing -- median imputation and standardization for numeric variables, effect coding for binary features, and mode imputation with low-dimensional embeddings for rare categorical attributes -- and initializes patient nodes with summary statistics, while retaining measurement values on edges to preserve "who measured what and by how much". In a retrospective, multi-center Chinese cohort (N = 648; 70/30 train-test split) drawn from three tertiary hospitals, Triplet-GCN consistently outperforms strong tabular baselines (KNN, SVM, XGBoost, Random Forest) across discrimination and balanced error metrics, yielding a more favorable sensitivity-specificity trade-off and improved overall utility for early warning. These findings indicate that encoding EHR as triplets and propagating information over a patient-feature graph produce more informative patient representations than feature-independent models, offering a simple, end-to-end blueprint for deployable sepsis risk stratification.
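
To make the pipeline described above concrete, the following is a minimal NumPy sketch of its main steps: type-specific preprocessing, triplet construction, a weighted bipartite patient-feature graph, and a two-layer GCN forward pass. It is an illustration under stated assumptions, not the authors' implementation. The toy features (`lactate`, `on_vent`), the use of |value| as the edge weight, the choice of per-patient mean and standard deviation as summary statistics, the zero-initialized feature nodes, and the random untrained weights are all hypothetical choices made here; the categorical-embedding branch and the MLP classifier are omitted.

```python
import numpy as np

def zscore_with_median_impute(col):
    """Numeric variables: median-impute missing values, then standardize."""
    col = np.asarray(col, dtype=float)
    filled = np.where(np.isnan(col), np.nanmedian(col), col)
    std = filled.std()
    return (filled - filled.mean()) / (std if std > 0 else 1.0)

def effect_code(col):
    """Binary variables: map 0/1 to -1/+1 (effect coding)."""
    return 2.0 * np.asarray(col, dtype=float) - 1.0

def build_triplets(table):
    """Flatten a {feature_name: column} table into (patient, feature, value) triplets."""
    names = list(table)
    triplets = [(p, f, float(v))
                for f, name in enumerate(names)
                for p, v in enumerate(table[name])]
    return triplets, names

def bipartite_adjacency(triplets, n_patients, n_features):
    """Symmetric adjacency over patient nodes followed by feature nodes.

    Assumption: the edge weight is |value|, so the measurement magnitude
    stays on the edge. In this toy weighting a value of exactly 0 drops the edge.
    """
    n = n_patients + n_features
    A = np.zeros((n, n))
    for p, f, v in triplets:
        A[p, n_patients + f] = abs(v)
        A[n_patients + f, p] = abs(v)
    return A

def normalize(A):
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_norm, X, W1, W2):
    """Two GCN layers with a ReLU in between."""
    H = np.maximum(A_norm @ X @ W1, 0.0)
    return A_norm @ H @ W2

# Toy cohort of 4 patients with one numeric and one binary feature (hypothetical data).
table = {
    "lactate": zscore_with_median_impute([1.0, np.nan, 3.0, 5.0]),
    "on_vent": effect_code([0, 1, 1, 0]),
}
triplets, names = build_triplets(table)
n_patients, n_features = 4, len(names)
A = bipartite_adjacency(triplets, n_patients, n_features)

# Patient nodes start from summary statistics of their own values (assumed: mean, std);
# feature nodes start at zero.
values = np.column_stack([table[name] for name in names])
X = np.zeros((n_patients + n_features, 2))
X[:n_patients, 0] = values.mean(axis=1)
X[:n_patients, 1] = values.std(axis=1)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))   # untrained weights, for shape illustration only
W2 = rng.normal(size=(4, 2))
patient_emb = gcn_forward(normalize(A), X, W1, W2)[:n_patients]
print(patient_emb.shape)
```

In a trained model, `patient_emb` would be passed to the MLP classifier, and `W1`, `W2` would be learned end to end. Using |value| as the weight keeps the normalization well defined but discards the sign; a practical implementation would more likely carry the signed value as an edge attribute.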

Paper Structure

This paper contains 15 sections, 11 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: The GCN model framework.