Bridging the gap between Performance and Interpretability: An Explainable Disentangled Multimodal Framework for Cancer Survival Prediction

Aniek Eijpe, Soufyan Lakbir, Melis Erdal Cesur, Sara P. Oliveira, Angelos Chatzimparmpas, Sanne Abeln, Wilson Silva

TL;DR

DIMAFx is an explainable multimodal framework for cancer survival prediction that produces disentangled, interpretable modality-specific and modality-shared representations from histopathology whole-slide images and transcriptomics data. It shows that multimodal models can overcome the traditional trade-off between performance and explainability.
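
One way to picture how four disentangled representations could arise from two modalities is sketched below in NumPy. This is a minimal illustration under stated assumptions, not the DIMAFx implementation. It assumes that modality-specific representations come from self-attention within each modality and modality-shared ones from cross-attention between modalities, in line with the self- and cross-attention layers mentioned in the Figure 1 caption. It also omits learned query/key/value projections and aggregates tokens by mean pooling. The helper names (`attention`, `disentangled_fusion`), the token counts, and the embedding size are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    # Scaled dot-product attention WITHOUT learned projections (a simplification).
    # Returns the attended outputs and the (n_queries x n_keys) attention matrix.
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights


def disentangled_fusion(wsi_tokens, rna_tokens):
    # Hypothetical layout: self-attention -> modality-specific tokens,
    # cross-attention -> modality-shared tokens, mean pooling -> one vector each.
    wsi_specific, wsi_self = attention(wsi_tokens, wsi_tokens, wsi_tokens)
    rna_specific, rna_self = attention(rna_tokens, rna_tokens, rna_tokens)
    wsi_shared, wsi_to_rna = attention(wsi_tokens, rna_tokens, rna_tokens)
    rna_shared, rna_to_wsi = attention(rna_tokens, wsi_tokens, wsi_tokens)
    representations = {
        "wsi_specific": wsi_specific.mean(axis=0),
        "wsi_shared": wsi_shared.mean(axis=0),
        "rna_specific": rna_specific.mean(axis=0),
        "rna_shared": rna_shared.mean(axis=0),
    }
    attention_maps = {
        "wsi_self": wsi_self,
        "rna_self": rna_self,
        "wsi_to_rna": wsi_to_rna,
        "rna_to_wsi": rna_to_wsi,
    }
    return representations, attention_maps


# Toy example with random tokens (illustrative sizes, not the paper's).
rng = np.random.default_rng(0)
wsi = rng.normal(size=(16, 32))  # 16 hypothetical WSI prototype tokens
rna = rng.normal(size=(50, 32))  # 50 hypothetical pathway tokens
reps, maps = disentangled_fusion(wsi, rna)
fused = np.concatenate([reps[k] for k in sorted(reps)])  # input to a survival head
# Pathways most attended by WSI prototype 8 (a "most highly attended" ranking).
top_pathways = np.argsort(maps["wsi_to_rna"][8])[::-1][:5]
```

Here `maps["wsi_to_rna"]` plays the role of the inter-modal attention that can be inspected to list the pathways a WSI prototype attends to most. Because it is computed on random data, the resulting ranking only illustrates the mechanics.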

Abstract

While multimodal survival prediction models are increasingly accurate, their complexity often reduces interpretability, limiting insight into how different data sources influence predictions. To address this, we introduce DIMAFx, an explainable multimodal framework for cancer survival prediction that produces disentangled, interpretable modality-specific and modality-shared representations from histopathology whole-slide images and transcriptomics data. Across multiple cancer cohorts, DIMAFx achieves state-of-the-art performance and improved representation disentanglement. Leveraging its interpretable design and SHapley Additive exPlanations (SHAP), DIMAFx systematically reveals key multimodal interactions and the biological information encoded in the disentangled representations. In breast cancer survival prediction, the most predictive features contain modality-shared information. One such feature captures solid tumor morphology contextualized primarily by the late estrogen response pathway: higher-grade morphology aligned with pathway upregulation and increased risk, consistent with known breast cancer biology. Key modality-specific features capture microenvironmental signals from interacting adipose and stromal morphologies. These results show that multimodal models can overcome the traditional trade-off between performance and explainability, supporting their application in precision medicine.
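
The notion of "most predictive" features can be made concrete with global SHAP importance. The sketch below ranks features by mean absolute SHAP value, the ranking used in Figure 3, and compares each modality-shared feature with its modality-specific counterpart after normalizing importances to sum to one. It assumes a precomputed SHAP matrix with one row per test sample and one column per feature. The normalization, the "Shared X"/"Specific X" naming scheme, the helper names, and the toy numbers are hypothetical and not taken from the paper.

```python
import numpy as np


def rank_by_mean_abs_shap(shap_values, feature_names, top_k=5):
    # Global importance of each feature = mean |SHAP| over samples (rows).
    importance = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    order = np.argsort(importance)[::-1][:top_k]
    return [(feature_names[i], float(importance[i])) for i in order]


def shared_vs_specific(shap_values, feature_names):
    # Normalize importances to sum to 1 (an assumed normalization), then pair
    # each "Shared X" feature with its "Specific X" counterpart.
    importance = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    normalized = dict(zip(feature_names, importance / importance.sum()))
    pairs = {}
    for name, value in normalized.items():
        if name.startswith("Shared "):
            base = name[len("Shared "):]
            counterpart = normalized.get("Specific " + base)
            if counterpart is not None:
                pairs[base] = (float(value), float(counterpart))
    return pairs


# Made-up SHAP matrix: 2 test samples x 4 features (illustrative only).
names = ["Shared W8", "Specific W8", "Shared R13", "Specific R13"]
toy_shap = np.array([[0.4, -0.1, 0.2, 0.05],
                     [-0.6, 0.1, -0.2, -0.05]])
top = rank_by_mean_abs_shap(toy_shap, names, top_k=2)
pairs = shared_vs_specific(toy_shap, names)
share_of_shared = sum(shared for shared, _ in pairs.values())
```

Each `(shared, specific)` pair corresponds to one point relative to an equal-importance diagonal, as in Figure 4. A `share_of_shared` above 0.5 would mirror the observation that shared features generally contribute more; with this toy matrix, that holds only by construction.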

Paper Structure

This paper contains 33 sections, 4 equations, 14 figures, and 8 tables.

Figures (14)

  • Figure 1: Overview of DIMAFx. Our proposed framework encodes each modality with interpretable features, then applies disentangled attention fusion and aggregation to produce four disentangled representations. The representations are used for the downstream tasks, including survival prediction. Both unimodal and multimodal feature importance is assessed using SHAP values, and the attention matrices from the self- and cross-attention layers provide insights into the intra- and inter-modal interactions.
  • Figure 2: Kaplan–Meier survival curves for DIMAFx, showing high- and low-risk test patient groups stratified by the median predicted risk score, with corresponding hazard ratios and p-values.
  • Figure 3: Unimodal interpretability analysis of DIMAFx. (A) SHAP values for the top unimodal features, ranked by mean absolute SHAP value and colored by predicted log2 risk. (B) Visualization of the five most important WSI prototype features. For each feature, the cardinality (c) of the prototype is displayed, indicating its average frequency across all test samples, and two patches representing the prototype are shown, one associated with a high-risk prediction and one with a low-risk prediction. W9: High-risk patch shows solid sheets of tumor cells with dark, hyperchromatic nuclei infiltrating adipocytes (arrows); low-risk patch shows occasional tubule formation (arrowheads) and lower nuclear grade. W0: High-risk patch exhibits solid growth, marked pleomorphism, vesicular chromatin, and tumor necrosis (arrows); low-risk patch shows occasional gland formation (arrowheads), more uniform cells, and lower nuclear atypia. W10: High-risk patch shows solid tumor nests with marked atypia (arrowheads) in stroma containing few immune cells; low-risk patch shows tubular structures (arrows) with low nuclear atypia and sparse immune cells. W3: High-risk patch shows solid tumor nests with basophilic vacuolated cytoplasm, pleomorphic nuclei (arrowheads), and apoptotic cells; low-risk patch shows gland formation (arrow) with lower nuclear atypia. W8: High-risk patch shows solid nests with abundant cytoplasm, vesicular nuclei (arrowheads), and marked atypia; low-risk patch shows solid growth with lower nuclear atypia (arrowheads), suggesting lower histologic grade. (C) Visualization of the five most important transcriptomic pathway features, illustrating how the model-assigned feature risk (SHAP values) varies with the mean log-transformed, RSEM-normalized pathway expression.
  • Figure 4: Normalized SHAP values for modality-shared versus modality-specific features, showing that shared features generally contribute more to survival prediction. WSI-derived features are colored red, and transcriptomic pathway-derived features are colored blue. The dotted line marks where the modality-specific and modality-shared features have equal SHAP values.
  • Figure 5: Multimodal interpretability analysis of DIMAFx for the Shared W8 feature, representing a solid-pattern tumor WSI feature contextualized by transcriptomic pathway information. (A) The most highly attended pathway features. (B) Interaction between the W8 and R13 features across all test samples. Each point represents a test case, plotted by its SHAP values for R13 (x-axis) and W8 (y-axis), which reflect the model-assigned feature risk, and colored by the final predicted risk. (C) Visualization of the W8 and R13 features for four representative cases. The ridge plots show the frequency distributions of log-transformed, RSEM-normalized gene expression values for the "R13: Estrogen response late" pathway in the highlighted case (green) and averaged over all training samples (gray). For the "W8: Tumor, solid pattern, abundant cytoplasm" feature, we show a representative patch from each case and the cardinality (c) of this morphological prototype. Case 168: the W8 prototype shows a tumor with a solid growth pattern, focal gland formation (arrow), large vacuolated cytoplasm (arrowheads), and high-grade nuclear atypia with conspicuous nucleoli. Case 76: tumor cells with scant cytoplasm forming solid sheets with vague tubule formation (arrowheads) in a collagenous stroma. Case 52: tumor cells forming solid sheets with vesicular chromatin and conspicuous nucleoli (arrowheads). Case 132: glandular growth pattern (arrows) with more uniform cells and lower nuclear atypia.
  • ...and 9 more figures
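
The median-split stratification described for Figure 2 can be sketched without dependencies. Patients with predicted risk above the median form the high-risk group, each group gets a Kaplan–Meier curve, and a two-group log-rank test yields the p-value. This is a minimal sketch with hypothetical helper names and made-up toy data. It assumes that ties at the median go to the low-risk group, and it does not compute the hazard ratio, which would normally come from a Cox proportional hazards model. In practice, a library such as lifelines (`KaplanMeierFitter`, `lifelines.statistics.logrank_test`, `CoxPHFitter`) would typically be used.

```python
import math

import numpy as np


def stratify_by_median(risk_scores):
    # True = high-risk (predicted risk strictly above the median).
    risk = np.asarray(risk_scores, dtype=float)
    return risk > np.median(risk)


def kaplan_meier(times, events):
    # Kaplan-Meier estimate: survival just after each distinct event time.
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    survival, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)
        deaths = np.sum((times == t) & events)
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)


def logrank_test(times, events, group):
    # Two-group log-rank test; returns (chi-square statistic with 1 df, p-value).
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e, variance = 0.0, 0.0
    for t in np.unique(times[events]):
        at_risk = times >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        d = ((times == t) & events).sum()
        d1 = ((times == t) & events & group).sum()
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in `group`
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)
    return float(chi2), float(p_value)


# Made-up toy cohort: follow-up times, event indicators (1 = death), predicted risks.
times = np.array([5, 8, 12, 20, 25, 30, 34, 40])
events = np.array([1, 1, 0, 1, 1, 0, 1, 0])
risk = np.array([2.1, 1.8, 1.5, 0.9, 0.4, 0.3, 0.2, 0.1])

high = stratify_by_median(risk)
t_high, s_high = kaplan_meier(times[high], events[high])
t_low, s_low = kaplan_meier(times[~high], events[~high])
chi2, p = logrank_test(times, events, high)
```

The p-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal, so its upper tail equals `erfc(sqrt(chi2 / 2))`.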