
Predicting clinical outcomes from patient care pathways represented with temporal knowledge graphs

Jong Ho Jhee, Alberto Megina, Pacôme Constant Dit Beaufils, Matilde Karakachoff, Richard Redon, Alban Gaignard, Adrien Coulet

TL;DR

The paper investigates whether knowledge graph representations of patient care pathways can improve clinical outcome prediction for ruptured intracranial aneurysms. It compares tabular baselines with graph-based embeddings (TransE, RDF2Vec, RGCN+Lit) across SPHN and CARE-SM schemas and various time modeling choices, finding that RGCN+Lit on SPHN yields the best performance. A publicly released synthetic dataset and transformation scripts enable reproducibility, and the results highlight the value of compact patient-centric schemas and literal-aware embeddings while showing that time encoding has a nuanced effect. The work points to practical potential for KG-based predictive tools in healthcare, while noting limitations such as class imbalance and the need for clinical validation.

Abstract

Background: With the increasing availability of healthcare data, predictive modeling finds many applications in the biomedical domain, such as evaluating the level of risk for various conditions, which in turn can guide clinical decision making. However, it is unclear how knowledge graph data representations and their embeddings, which are competitive in some settings, could be of interest in biomedical predictive modeling. Method: We simulated synthetic but realistic data of patients with intracranial aneurysm and experimented on the task of predicting their clinical outcome. We compared the performance of various classification approaches on tabular data versus a graph-based representation of the same data. Next, we investigated how the schema adopted for representing, first, individual data and, second, temporal data impacts predictive performance. Results: Our study illustrates that, in our case, a graph representation and Graph Convolutional Network (GCN) embeddings reach the best performance for a predictive task from observational data. We emphasize the importance of the adopted schema and of the consideration of literal values in the representation of individual data. Our study also moderates the relative impact of various time encodings on GCN performance.
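To make the tabular-versus-graph contrast concrete, here is a minimal Python sketch of how one tabular patient record might be turned into (subject, predicate, object) triples. It is not the authors' transformation script: the `ex:` prefix, the column names, and the rule that strings become shared nodes while numbers, booleans and dates stay typed literals are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical namespace prefix; this is NOT the SPHN or CARE-SM vocabulary.
EX = "ex:"


def row_to_triples(patient_id, row):
    """Turn one tabular patient record into (subject, predicate, object) triples.

    Assumed convention: categorical (string) values become shared graph nodes,
    so patients with the same value are linked; numbers, booleans and dates
    stay literals, stored as (lexical form, datatype) pairs.
    """
    subject = f"{EX}patient/{patient_id}"
    triples = []
    for column, value in row.items():
        if value is None:
            continue  # a missing cell simply produces no edge
        predicate = f"{EX}{column}"
        if isinstance(value, str):
            obj = f"{EX}{column}/{value}"              # categorical -> shared node
        elif isinstance(value, date):
            obj = (value.isoformat(), "xsd:date")      # typed date literal
        elif isinstance(value, bool):                  # check bool before int
            obj = (str(value).lower(), "xsd:boolean")
        elif isinstance(value, int):
            obj = (str(value), "xsd:integer")
        else:
            obj = (str(value), "xsd:decimal")
        triples.append((subject, predicate, obj))
    return triples


if __name__ == "__main__":
    # Made-up record, not taken from the paper's synthetic dataset.
    record = {"sex": "F", "age": 54, "admission": date(2021, 3, 2),
              "smoker": True, "weight_kg": None}
    for triple in row_to_triples("p1", record):
        print(triple)
```

Values such as `age` or `admission` survive only as literals; embeddings that model entity-to-entity edges alone cannot use them directly, which is the gap a literal-aware model is meant to fill.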

Paper Structure

This paper contains 19 sections, 7 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Example of individual clinical data represented in a SPHN knowledge graph. Temporal information is specified with RDF literals associated with events through sphn:hasStartDateTime properties.
  • Figure 2: Example of individual clinical data represented in a CARE-SM KG. Temporal information is directly linked to the Context#1 named graph with start and end dates. Multiple events can be associated with a given timeline through the sio:is-part-of property.
  • Figure 3: A visual representation of care pathways, where thicker connections between care events correspond to higher transition probabilities. morphine: morphine use, paracetamol: paracetamol use, corotrop: milrinone use, ATL: percutaneous transluminal angioplasty, NAD: norepinephrine use, nimodipine: nimodipine use, IOT: orotracheal intubation, and DVE: external ventricular drainage.
  • Figure 4: Examples of temporal information: (a) two events, each associated with a timestamp; (b) a sequence of events linked by time:before relations. Plain lines show relations between directly subsequent events. Applying a transitivity rule once adds 2 relations, shown as dashed dotted lines (saturation 1); applying it twice adds the last relation, shown as a dash-dotted line (saturation 2). iot: orotracheal intubation, dve: external ventricular drainage, nad: norepinephrine use.
  • Figure 5: An illustration of the model denoted RGCN+lit for patient outcome prediction.
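Figure 3's caption does not state how the transition probabilities were estimated. A common choice, assumed here, is a first-order (Markov) estimate: count how often event B directly follows event A across all pathways, then normalize the counts for each source event so they sum to 1. The toy pathways below are made up for illustration and are not taken from the paper's synthetic dataset.

```python
from collections import Counter, defaultdict


def transition_probabilities(pathways):
    """Estimate P(next event | current event) from ordered care pathways.

    Each pathway is a list of event labels in chronological order.
    Consecutive pairs are counted, then each source event's counts are
    normalized so its outgoing probabilities sum to 1.
    """
    counts = defaultdict(Counter)
    for pathway in pathways:
        for current, following in zip(pathway, pathway[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(successors.values()) for nxt, n in successors.items()}
        for current, successors in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical toy pathways using the abbreviations from Figure 3.
    toy_pathways = [
        ["IOT", "DVE", "nimodipine"],
        ["IOT", "DVE", "NAD"],
        ["IOT", "nimodipine"],
        ["DVE", "NAD"],
    ]
    for source, targets in transition_probabilities(toy_pathways).items():
        print(source, targets)
```

In a drawing like Figure 3, each probability would map to the thickness of the edge from `source` to its target; events that only ever end a pathway have no outgoing edges and therefore no entry.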
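The saturation levels in Figure 4b can be reproduced with a short Python sketch. It treats each time:before relation as a directed pair and applies the transitivity rule (a before b and b before c implies a before c) one pass at a time. The event list is hypothetical: the caption only names iot, dve and nad, so the fourth label `x` is a placeholder.

```python
def before_chain(events):
    """time:before edges between directly subsequent events (the plain lines)."""
    return {(a, b) for a, b in zip(events, events[1:])}


def saturate_once(edges):
    """Apply the transitivity rule once over `edges` and return only the new edges."""
    new = set()
    for a, b in edges:
        for c, d in edges:
            if b == c and (a, d) not in edges:
                new.add((a, d))
    return new


def saturate(edges, rounds):
    """Run `rounds` transitivity passes; return all edges and those added per pass."""
    current = set(edges)
    added_per_round = []
    for _ in range(rounds):
        new = saturate_once(current)
        added_per_round.append(new)
        current |= new
    return current, added_per_round


if __name__ == "__main__":
    chain = before_chain(["iot", "dve", "nad", "x"])  # "x": placeholder event
    closure, added = saturate(chain, 2)
    for level, new_edges in enumerate(added, start=1):
        print(f"saturation {level}: {sorted(new_edges)}")
```

With four events in a chain, the first pass adds 2 relations and the second pass adds the last one, matching the counts in the caption; further passes add nothing because all 6 ordered pairs are then present.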