
Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology

Pei Liu, Luping Ji, Jiaxiang Gou, Bo Fu, Mao Ye

TL;DR

This work introduces Vision-Language Survival Analysis (VLSA), a VL-based framework for prognostic analysis on gigapixel histopathology WSIs. By integrating language-encoded prognostic priors, ordinal survival prompts, and an ordinal incidence function, VLSA guides weakly-supervised MIL and yields interpretable predictions via Shapley-value attributions. Across five TCGA datasets, VLSA achieves state-of-the-art concordance indices and improved distribution calibration while demonstrating strong data efficiency, particularly in few-shot settings. The approach leverages pathology VL foundation models to provide data-efficient learning and descriptive language-based interpretation, potentially transforming CPATH survival analysis and MIL-based prognosis tasks.

Abstract

Histopathology Whole-Slide Images (WSIs) provide an important tool for assessing cancer prognosis in computational pathology (CPATH). While existing survival analysis (SA) approaches have made exciting progress, they generally rely on highly expressive network architectures and only coarse-grained patient-level labels to learn visual prognostic representations from gigapixel WSIs. Such a learning paradigm suffers from critical performance bottlenecks given the scarce training data currently available and the standard multi-instance learning (MIL) framework in CPATH. To overcome this, this paper proposes, for the first time, a new Vision-Language-based SA (VLSA) paradigm. Concretely, (1) VLSA is driven by pathology VL foundation models. It no longer relies on high-capacity networks and shows the advantage of data efficiency. (2) On the vision end, VLSA encodes textual prognostic priors and then employs them as auxiliary signals to guide the aggregation of visual prognostic features at the instance level, thereby compensating for the weak supervision in MIL. Moreover, given the characteristics of SA, we propose i) ordinal survival prompt learning to transform continuous survival labels into textual prompts; and ii) an ordinal incidence function as the prediction target to make SA compatible with VL-based prediction. Notably, VLSA's predictions can be interpreted intuitively by our Shapley values-based method. Extensive experiments on five datasets confirm the effectiveness of our scheme. VLSA could pave a new way for SA in CPATH by offering weakly-supervised MIL an effective means to learn valuable prognostic clues from gigapixel WSIs. Our source code is available at https://github.com/liupei101/VLSA.

Paper Structure

This paper contains 37 sections, 23 equations, 7 figures, and 11 tables.

Figures (7)

  • Figure 1: Overview of Vision-Language Survival Analysis ($\textsc{VLSA}$). (a) WSI representation learning with language-encoded prognostic priors (Section \ref{subsec31}). (b) Ordinal survival prompt learning (Section \ref{subsec32}). (c) Prediction of the ordinal incidence function (Section \ref{subsec33}). The survival prediction of $\textsc{VLSA}$ can be interpreted by quantifying each prognostic prior's contribution to risk (Section \ref{subsec35}).
  • Figure 2: Interpreting the survival prediction of $\textsc{VLSA}$ via language-encoded prognostic priors. The top row shows language descriptions (simplified for readability) of prognostic visual features in WSIs. Detailed texts are provided in Appendix \ref{apx:sec44}. The middle row gives the most representative patches corresponding to each prognostic text. The last row presents each prognostic prior's contribution to risk. We mainly examine the top three language priors in terms of contribution. The three examples are from the first three datasets. More results are shown in Appendix \ref{apx:sec52}.
  • Figure 3: Ordinality visualization. (a) Heatmap of the similarity between any two learned survival prompts. Its horizontal axis places the first prompt to the last prompt from left to right; its vertical axis does so from top to bottom. Acc is the accuracy of prompt ranking. The results are from the first three datasets. More results are shown in Appendix \ref{apx:sec52}. (b) Predictive probability (Proba.) w/o and w/ ordinality. Patients from the test set are used for prediction. $\Delta$EMD is equal to $\text{EMD}({\bm{y}},\hat{{\bm{y}}}_\text{w/o}) - \text{EMD}({\bm{y}},\hat{{\bm{y}}}_\text{w/})$. The vertical dashed line indicates the individual time-to-event.
  • Figure 4: An example illustrating the case where the incidence function is not ordinal even though the prompts are. $L_{c,c+1}$ is the bisector of the angle between ${\bm{f}}_\text{text}^{c}$ and ${\bm{f}}_\text{text}^{c+1}$. When ${\bm{f}}_\text{image}$ falls into the gray area, $\text{cos}({\bm{f}}_\text{image},{\bm{f}}_\text{text}^{c+1})> \text{cos}({\bm{f}}_\text{image},{\bm{f}}_\text{text}^{c+2})$ does not hold.
  • Figure 6: Risk Grouping and Kaplan-Meier Analysis. All patients within each dataset are grouped into two risk groups: low-risk (blue) and high-risk (orange). Patients' risk predictions are derived from $\textsc{VLSA}$. The median risk of the entire cohort is adopted as the cutoff for risk grouping.
  • ...and 2 more figures
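
Two quantities named in the captions above can be illustrated with a minimal sketch: the $\Delta$EMD from Figure 3(b) and the median-cutoff risk grouping from Figure 6. This is not the authors' implementation. The function names (`emd_1d`, `delta_emd`, `split_by_median_risk`) are hypothetical, the EMD is assumed to be the one-dimensional earth mover's distance over ordered, unit-spaced time bins (the captions do not state the exact definition), and the tie rule at the median is also an assumption.

```python
from itertools import accumulate
from statistics import median


def emd_1d(p, q):
    """Earth mover's distance between two discrete distributions.

    Assumption: p and q are defined on the same ordered, unit-spaced bins,
    so the EMD reduces to the sum of absolute differences between CDFs.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def delta_emd(y, y_hat_without, y_hat_with):
    """EMD(y, y_hat_w/o) - EMD(y, y_hat_w/), as in the Figure 3 caption.

    A positive value means the prediction with ordinality is closer to y.
    """
    return emd_1d(y, y_hat_without) - emd_1d(y, y_hat_with)


def split_by_median_risk(risks):
    """Label each patient 'high' or 'low' using the cohort median as cutoff.

    Assumption: risks exactly equal to the median are labeled 'low'.
    """
    cutoff = median(risks)
    return ["high" if r > cutoff else "low" for r in risks]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y = [0.0, 0.0, 1.0, 0.0]              # event falls in the third bin
    y_hat_without = [0.4, 0.1, 0.1, 0.4]  # bimodal, not ordinal
    y_hat_with = [0.1, 0.2, 0.5, 0.2]     # unimodal around the event bin
    print(delta_emd(y, y_hat_without, y_hat_with))  # about 0.7
    print(split_by_median_risk([0.1, 0.9, 0.4, 0.7]))
```

In this toy case the prediction with ordinality concentrates its mass near the true event bin, so its EMD to the label is smaller and $\Delta$EMD is positive.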