
PRISM: A Framework Harnessing Unsupervised Visual Representations and Textual Prompts for Explainable MACE Survival Prediction from Cardiac Cine MRI

Haoyang Su, Jin-Yi Xiang, Shaohao Rui, Yifan Gao, Xingyu Chen, Tingxuan Yin, Shaoting Zhang, Xiaosong Wang, Lian-Ming Wu

TL;DR

PRISM addresses MACE prediction by fusing unsupervised visual representations from non-contrast cine MRI with structured EHR data in a three-stage survival modeling framework: motion-aware multi-view distillation (Stage I), prompt-guided cross-modal EHR alignment (Stage II), and Cox proportional hazards (CoxPH) fusion for survival prediction (Stage III), with alignment losses enforcing semantic and structural coherence between modalities. The Stage III survival model is $h(t \mid \mathbf{x}_{\mathrm{fused}}) = h_0(t) \exp\left(\sum_k \beta_k x_{\mathrm{fused},k}\right)$, where $h_0(t)$ is the baseline hazard and $\beta_k$ is the coefficient of the $k$-th fused feature. The approach reveals three spatiotemporal imaging signatures (lateral wall dyssynchrony, inferior wall hypersensitivity, and anterior elevated focus during diastole) whose patterns map to coronary territories, and BiPromptSurv attribution identifies hypertension, diabetes, and smoking as key EHR drivers. Across four independent cohorts under internal-external cross-validation (IECV), PRISM outperforms classical and SOTA baselines, demonstrating robust generalization and offering interpretable, annotation-free risk stratification for cardiovascular prognosis.
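To make the Stage III formula concrete, here is a minimal, dependency-free sketch of a CoxPH model: the linear predictor $\sum_k \beta_k x_k$, the hazard $h_0(t)\exp(\cdot)$, and the Breslow-style negative log partial likelihood that is commonly used to fit such models. The function names and the choice of loss are illustrative assumptions, not PRISM's code; the baseline hazard is passed in as a number rather than estimated.

```python
import math
from typing import Sequence


def risk_score(beta: Sequence[float], x: Sequence[float]) -> float:
    """Linear predictor sum_k beta_k * x_k (the log relative hazard)."""
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    return sum(b * xi for b, xi in zip(beta, x))


def hazard(baseline_hazard: float, beta: Sequence[float], x: Sequence[float]) -> float:
    """CoxPH hazard h(t | x) = h_0(t) * exp(sum_k beta_k x_k), with h_0(t) given."""
    return baseline_hazard * math.exp(risk_score(beta, x))


def neg_log_partial_likelihood(
    risks: Sequence[float], times: Sequence[float], events: Sequence[int]
) -> float:
    """Mean negative Cox log partial likelihood (Breslow handling of ties).

    risks  : linear predictors eta_i, one per patient
    times  : observed follow-up times (event or censoring)
    events : 1 if the event (e.g. MACE) was observed, 0 if censored
    """
    total = 0.0
    n_events = 0
    for eta_i, t_i, e_i in zip(risks, times, events):
        if not e_i:
            continue  # censored patients only contribute through risk sets
        # Risk set: everyone still under observation at time t_i (includes i)
        at_risk = [eta_j for eta_j, t_j in zip(risks, times) if t_j >= t_i]
        m = max(at_risk)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(e - m) for e in at_risk))
        total += eta_i - log_denom
        n_events += 1
    return -total / max(n_events, 1)
```

Assigning a higher risk score to the patient who has the earlier event lowers this loss, which is the ranking behavior a CoxPH head is trained to learn.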

Abstract

Accurate prediction of major adverse cardiac events (MACE) remains a central challenge in cardiovascular prognosis. We present PRISM (Prompt-guided Representation Integration for Survival Modeling), a self-supervised framework that integrates visual representations from non-contrast cardiac cine magnetic resonance imaging with structured electronic health records (EHRs) for survival analysis. PRISM extracts temporally synchronized imaging features through motion-aware multi-view distillation and modulates them using medically informed textual prompts to enable fine-grained risk prediction. Across four independent clinical cohorts, PRISM consistently surpasses classical survival prediction models and state-of-the-art (SOTA) deep learning baselines under internal and external validation. Further clinical findings demonstrate that the combined imaging and EHR representations derived from PRISM provide valuable insights into cardiac risk across diverse cohorts. Three distinct imaging signatures associated with elevated MACE risk are uncovered, including lateral wall dyssynchrony, inferior wall hypersensitivity, and anterior elevated focus during diastole. Prompt-guided attribution further identifies hypertension, diabetes, and smoking as dominant contributors among clinical and physiological EHR factors.
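The TL;DR mentions alignment losses between imaging and EHR/prompt representations without giving their form. One common way to align two modalities is a symmetric InfoNCE (CLIP-style) contrastive loss, sketched below in plain Python. This is a swapped-in, generic technique assumed for illustration only: the function names and the temperature of 0.07 are hypothetical, and PRISM's actual alignment objectives may differ.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_matrix(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities between rows of a and rows of b."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def _cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct 'class' for row i is column i."""
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        lse = m + math.log(sum(math.exp(z - m) for z in row))
        loss += lse - row[i]
    return loss / len(logits)


def symmetric_info_nce(img, ehr, temperature: float = 0.07) -> float:
    """Hypothetical alignment loss: row i of img and row i of ehr are a positive pair.

    Averages image-to-EHR and EHR-to-image cross-entropy over the batch.
    """
    sim = cosine_matrix(img, ehr)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for the EHR->image direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))
```

Matched image/EHR pairs drive the loss toward zero, while swapped pairs are penalized, which is the sense in which such a loss "aligns" two embedding spaces.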

Paper Structure

This paper contains 17 sections, 14 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Overview of the cardiac survival analysis dataset featuring non-contrast cine MRI in short and long axis views and patient-level EHRs subcategorized into pharmaceutical, biochemical, physiological, and clinical domains.
  • Figure 2: Design of the proposed framework and pipeline. (a) PRISM's three-stage survival analysis framework. Multi-view cine MRI images are passed through a teacher-student distillation network to obtain representation tokens enriched with spatiotemporal features, which are aligned hierarchically with EHR features under the guidance of medically informed prompt aggregation. In Stage III, the refined image features are aggregated with EHR features to produce the survival analysis results. (b) Medical insight discovery pipeline based on PRISM: the association between spatiotemporal patterns of the ventricular myocardium and MACE risk is derived from the learned heatmap distributions, and EHR features that aid MACE assessment are explored with the BiPromptSurv strategy.
  • Figure 3: Details of the model architecture and sub-modules. The Spatial Aggregation Module aggregates latent representations obtained from layer-wise convolutions over cardiac cine MRI sequences. The Semantic Encoder encodes medically guided prompts $\mathcal{T}$ using pretrained embeddings. The Motion-Aware Encoder incorporates the CBlock and SABlock modules from the UniFormer backbone (li2022uniformer).
  • Figure 4: Survival analysis performance under the IECV setting. Regression analysis under internal-external cross-validation, with model-predicted inverse risk (normalized) on the horizontal axis versus ground-truth survival time (normalized) on the vertical axis. Top panels show internal validation and bottom panels show external validation on the corresponding cohorts. Cases with MACE = 0 are shown in black and cases with MACE = 1 in blue.
  • Figure 5: Kaplan–Meier survival plots across four cohorts (GLCCM, AZCCM, RJCCM, TJCCM) and three modeling strategies (PRISM, CoxPH, PCRL). The p-value is reported to assess the statistical significance of the separation between survival curves.
  • ...and 4 more figures
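Figure 5 reports Kaplan–Meier curves. As a minimal sketch of what such a curve computes, the product-limit estimator is $\hat S(t) = \prod_{t_i \le t} \left(1 - d_i / n_i\right)$, where $d_i$ is the number of events at time $t_i$ and $n_i$ is the number of patients still at risk just before $t_i$. The function name below is hypothetical, and this is the textbook estimator rather than the authors' plotting code; for risk stratification one would typically split patients into groups by predicted risk (for example above vs. below the median, an assumption here) and compute one curve per group.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (t, S(t)) at each distinct time where at least one event occurs.

    times  : follow-up time per patient
    events : 1 if the event was observed at that time, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0  # events observed at time t
        c = 0  # all patients leaving the risk set at t (events + censored)
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            c += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk  # product-limit update
            curve.append((t, surv))
        n_at_risk -= c  # censored patients drop out after time t
    return curve
```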