Early Risk Prediction with Temporally and Contextually Grounded Clinical Language Processing

Rochana Chaturvedi, Yue Zhou, Andrew Boyd, Brian T. Layden, Mudassir Rashid, Lu Cheng, Ali Cinar, Barbara Di Eugenio

TL;DR

The paper tackles the challenge of predicting near-term Type 2 Diabetes risk from longitudinal clinical notes by introducing two complementary approaches: HiT-Gnn, a Hierarchical Temporal Graph Neural Network that models fine-grained intra-note and inter-visit temporal relations augmented with UMLS knowledge, and ReVeAL, a verifier-aided, test-time framework that distills reasoning from a large language model (LLM) into a smaller verifier for efficient, interpretable predictions. The methods are validated on both private (PH) and public (MIMIC-IV) corpora with rigorous pre-diagnosis data curation and fairness analyses, showing that HiT-Gnn achieves superior predictive performance, especially at near-term horizons, while ReVeAL improves sensitivity and retains explanatory reasoning. Ablation studies confirm the value of temporal structure and knowledge integration, and the fairness analysis highlights biases that require careful mitigation for equitable deployment. The work demonstrates a practical, privacy-preserving path for longitudinal note-based risk prediction and lays groundwork for extending temporally grounded, interpretable NLP to other chronic diseases and cross-institutional settings.

Abstract

Clinical notes in Electronic Health Records (EHRs) capture rich temporal information on events, clinician reasoning, and lifestyle factors often missing from structured data. Leveraging them for predictive modeling can be impactful for timely identification of chronic diseases. However, they present core natural language processing (NLP) challenges: long text, irregular event distribution, complex temporal dependencies, privacy constraints, and resource limitations. We present two complementary methods for temporally and contextually grounded risk prediction from longitudinal notes. First, we introduce HiTGNN, a hierarchical temporal graph neural network that integrates intra-note temporal event structures, inter-visit dynamics, and medical knowledge to model patient trajectories with fine-grained temporal granularity. Second, we propose ReVeAL, a lightweight, test-time framework that distills the reasoning of large language models into smaller verifier models. Applied to opportunistic screening for Type 2 Diabetes (T2D) using temporally realistic cohorts curated from private and public hospital corpora, HiTGNN achieves the highest predictive accuracy, especially for near-term risk, while preserving privacy and limiting reliance on large proprietary models. ReVeAL enhances sensitivity to true T2D cases and retains explanatory reasoning. Our ablations confirm the value of temporal structure and knowledge augmentation, and fairness analysis shows HiTGNN performs more equitably across subgroups.

Paper Structure

This paper contains 41 sections, 4 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: Demographic distribution in PH and MIMIC-IV test sets.
  • Figure 2: HiT-Gnn Architecture: Hierarchical Temporal GNN that models intra- and inter-document temporal dependencies between clinical entities and integrates UMLS knowledge for type 2 diabetes (T2D) risk prediction.
  • Figure 3: AUC as a function of the prediction horizon, evaluated over consecutive 3-month windows.
  • Figure 4: HiT-Gnn performance ablations.
  • Figure 5: Prompt for identifying mention of type 2 diabetes in the initial set of pre-diagnosis notes identified based on ICD codes.
  • ...and 6 more figures