In-Hospital Stroke Prediction from PPG-Derived Hemodynamic Features

Jiaming Liu, Cheng Ding, Daoqiang Zhang

TL;DR

This study addresses the lack of pre-stroke physiological data by focusing on patients who suffered a stroke in hospital while under continuous monitoring, enabling the first large-scale analysis of pre-onset PPG. It combines an LLM-assisted onset-anchoring pipeline with hemodynamic feature extraction from PPG and a ResNet-1D classifier to predict impending stroke up to 6 hours before onset. F1-scores range from 0.7956 to 0.9406 on MIMIC-III and, without re-tuning, from 0.9256 to 0.9888 on MC-MED across the 4-, 5-, and 6-hour horizons. The findings show that PPG carries predictive signatures of stroke, with interpretable hemodynamic cues (e.g., relative systolic timing) driving the model, and point to the potential of proactive, non-invasive surveillance to improve patient outcomes. Limitations include coarse stroke-subtype granularity and the need for prospective validation, but the work establishes a data-centric framework for translating passively collected physiological signals into actionable pre-stroke warnings.

Abstract

The absence of pre-hospital physiological data in standard clinical datasets fundamentally constrains the early prediction of stroke: patients typically present only after stroke has occurred, leaving the predictive value of continuous monitoring signals such as photoplethysmography (PPG) unvalidated. In this work, we overcome this limitation by focusing on a rare but clinically critical cohort, patients who suffered stroke during hospitalization while already under continuous monitoring, thereby enabling the first large-scale analysis of pre-stroke PPG waveforms aligned to verified onset times. Using MIMIC-III and MC-MED, we develop an LLM-assisted data mining pipeline to extract precise in-hospital stroke onset timestamps from unstructured clinical notes, followed by physician validation, identifying 176 patients (MIMIC-III) and 158 patients (MC-MED) with high-quality synchronized pre-onset PPG data. We then extract hemodynamic features from PPG and employ a ResNet-1D model to predict impending stroke across multiple early-warning horizons. The model achieves F1-scores of 0.7956, 0.8759, and 0.9406 at 4, 5, and 6 hours prior to onset on MIMIC-III, and, without re-tuning, reaches 0.9256, 0.9595, and 0.9888 on MC-MED for the same horizons. These results provide the first empirical evidence from real-world clinical data that PPG contains predictive signatures of stroke several hours before onset, demonstrating that passively acquired physiological signals can support reliable early warning. They support a shift from post-event stroke recognition to proactive, physiology-based surveillance that may materially improve patient outcomes in routine clinical care.
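
To make the idea of a PPG-derived timing feature concrete, here is a minimal sketch of how a systolic peak time and its relative counterpart could be computed per beat. It is an illustration under stated assumptions, not the authors' implementation: the function name `systolic_timing`, the use of pre-computed pulse-onset indices, and the definition of the relative value as peak time divided by beat duration are all assumptions made here, since this summary does not give the exact definitions.

```python
import numpy as np


def systolic_timing(ppg, onsets, fs):
    """Per-beat systolic peak timing (hypothetical helper).

    ppg    : 1-D array of PPG samples
    onsets : sample indices of pulse onsets (assumed to come from any beat detector)
    fs     : sampling rate in Hz

    Returns (t_sp, t_sp_rel):
      t_sp     - time from pulse onset to the beat maximum, in seconds
      t_sp_rel - the same time as a fraction of the beat duration
                 (assumed definition of a "relative" systolic peak time)
    """
    t_sp, t_sp_rel = [], []
    for start, end in zip(onsets[:-1], onsets[1:]):
        beat = ppg[start:end]
        peak = int(np.argmax(beat))          # systolic peak taken as the beat maximum
        t_sp.append(peak / fs)               # absolute timing in seconds
        t_sp_rel.append(peak / (end - start))  # timing normalized by beat length
    return np.array(t_sp), np.array(t_sp_rel)


# Synthetic example: 1-second triangular beats sampled at 100 Hz,
# each rising for 25 samples and falling for 75 samples.
fs = 100
beat = np.concatenate([
    np.linspace(0.0, 1.0, 25, endpoint=False),
    np.linspace(1.0, 0.0, 75, endpoint=False),
])
ppg = np.tile(beat, 3)
onsets = [0, 100, 200, 300]

t_sp, t_sp_rel = systolic_timing(ppg, onsets, fs)
```

On real recordings the onsets would come from a beat-detection step and the peak search would usually be restricted to the early part of each beat; both are left out to keep the sketch short.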

Paper Structure (18 sections, 3 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Overview of the proposed framework. The pipeline comprises three phases: (1) Temporal Anchoring: LLM-driven extraction of precise stroke onset timestamps from unstructured clinical notes; (2) Feature Engineering: Derivation of hemodynamic biomarkers from PPG waveforms and their derivatives; and (3) Predictive Modeling: A ResNet-1D network for early stroke warning, validated on internal (MIMIC-III) and external (MC-MED) cohorts.
  • Figure 2: Temporal Labeling Strategy. Timeline aligned to stroke onset ($t=0$) with exclusion buffers to mitigate label noise and prevent leakage.
  • Figure 3: ROC Analysis across internal and external cohorts. The internal evaluation (solid lines) demonstrates a distinct temporal gradient ($6h > 5h > 4h$).
  • Figure 4: Cross-Dataset SHAP Analysis. Relative Systolic Peak Time ($T_{sp, Rel}$) dominates prediction across cohorts, followed by $CV_{T,pi}$ and $A_{sp, Rel}$. (SHAP values scaled by $10^3$).
  • Figure 5: Trajectories of Leading Contributing Factors. Visual validation of the leading contributing factors identified by SHAP. As stroke onset approaches, the Relative Systolic Peak Time ($T_{sp, Rel}$) exhibits an upward drift, while the absolute Systolic Peak Time ($T_{sp}$) and Relative Systolic Amplitude ($A_{sp, Rel}$) show a synchronized downward trend in the final phase. This inverse relationship confirms that the model relies on multidimensional physiological deterioration to predict stroke onset.
  • ...and 1 more figure
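
The temporal labeling idea in Figure 2 (windows aligned to onset at $t=0$, with exclusion buffers) can be sketched as follows. This is one possible reading of the caption, not the paper's actual rule: the function name `label_window`, the 1-hour default buffer, and the choice to treat everything inside the horizon as positive are all hypothetical, because the summary does not state the buffer sizes or the exact class boundaries.

```python
def label_window(t_end_h, horizon_h, buffer_h=1.0):
    """Assign a label to one PPG window (hypothetical scheme).

    t_end_h   : window end time in hours relative to stroke onset
                (negative = before onset, 0 = onset)
    horizon_h : early-warning horizon in hours (e.g. 4, 5, or 6)
    buffer_h  : width of the exclusion buffer in hours (assumed value)

    Returns 1 (pre-stroke), 0 (baseline), or None (excluded).
    """
    if t_end_h > 0:
        return None                      # post-onset data is never used
    if t_end_h >= -horizon_h:
        return 1                         # inside the warning horizon
    if t_end_h >= -(horizon_h + buffer_h):
        return None                      # ambiguous zone, dropped to limit label noise
    return 0                             # far enough from onset to count as baseline


# Example: label windows ending 2 h, 4.5 h, and 6 h before onset for a 4-hour horizon.
labels = [label_window(t, horizon_h=4) for t in (-2.0, -4.5, -6.0)]
```

Dropping the buffer zone rather than forcing a label keeps near-boundary windows out of both classes, which is the usual motivation for exclusion buffers in onset-aligned labeling.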