$M^{2}$Fusion: Bayesian-based Multimodal Multi-level Fusion on Colorectal Cancer Microsatellite Instability Prediction

Quan Liu, Jiawen Yao, Lisha Yao, Xin Chen, Jingren Zhou, Le Lu, Ling Zhang, Zaiyi Liu, Yuankai Huo

TL;DR

This work tackles colorectal cancer (CRC) microsatellite instability (MSI) prediction by integrating pathology whole slide images (WSIs) and radiology CT data through a Bayesian-based multimodal multi-level fusion framework, $M^{2}$Fusion. The framework combines decision-level fusion with radiology-guided feature-level fusion. Pathology embeddings come from CLAM with a ResNet-18, radiology embeddings come from a ResNet-18 backbone on a 2.5D CT input, and the embeddings are fused with either a ViT-S or an MLP backbone. On 5-fold cross-validation with 352 cases, the approach achieves an overall AUC of $0.8177$, outperforming uni-modal models and prior fusion strategies. The main contributions are (1) the first multi-level fusion of WSI and CT for CRC MSI prediction, (2) the integration of CT into CRC MSI multimodal fusion, and (3) an evaluation of feature-level fusion with Transformer and CNN backbones, showing improved robustness and accuracy for practical MSI screening and treatment planning.

Abstract

Colorectal cancer (CRC) microsatellite instability (MSI) prediction on histopathology images is a challenging weakly supervised learning task that involves multi-instance learning on gigapixel images. Radiology images have also been shown to carry CRC MSI information and are acquired with efficient patient imaging techniques. Integrating different data modalities offers the opportunity to increase the accuracy and robustness of MSI prediction. Despite progress in representation learning from whole slide images (WSI) and in exploring the potential of radiology data, fusing information from multiple data modalities (e.g., pathology WSI and radiology CT images) for CRC MSI prediction remains a challenge. In this paper, we propose $M^{2}$Fusion: a Bayesian-based multimodal multi-level fusion pipeline for CRC MSI. The proposed fusion model $M^{2}$Fusion is capable of discovering patterns within and across modalities that are more beneficial for predicting MSI than those found by a single modality alone or by other fusion methods. The contribution of the paper is three-fold: (1) $M^{2}$Fusion is the first pipeline for multi-level fusion of pathology WSI and 3D radiology CT images for MSI prediction; (2) CT images are integrated into multimodal fusion for CRC MSI prediction for the first time; (3) the feature-level fusion strategy is evaluated on both Transformer-based and CNN-based methods. Our approach is validated by cross-validation on 352 cases and outperforms both the feature-level (0.8177 vs. 0.7908) and the decision-level (0.8177 vs. 0.7289) fusion strategies in AUC score.

Paper Structure (12 sections, 5 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Our proposed $M^{2}$Fusion model. The multimodal data, WSI and CT images, are preprocessed into pathology image patches and CT tumor ROIs, respectively. Embeddings are extracted by the encoders $E_{p}$ and $E_{r}$. $*$ means the model is already trained and frozen during pipeline training. $\mathcal{P}_{P}$ is the pathology uni-modal performance $\mathcal{P}(P_{ath})$. $\mathcal{P}_{R}$ is the radiology uni-modal performance $\mathcal{P}(R_{ad})$. $\mathcal{P}_{F}$ is the feature-level fusion model probability distribution under pathology and radiology guidance, $\mathcal{P}(F_{ea}|P_{ath}R_{ad})$. The final fusion model built from $\mathcal{P}_{P}$, $\mathcal{P}_{R}$ and $\mathcal{P}_{F}$ is $\mathcal{P}(F_{ea}P_{ath}R_{ad})$ in Eq. \ref{eq:UVW_p}.
  • Figure 2: Baseline experiments on multimodal fusion. A. Decision-level multimodal fusion, $\mathcal{P}(P_{ath}R_{ad})$ in Eq. \ref{eq:bayes1}. B. Radiology-guided feature-level fusion, whose probability distribution follows $\mathcal{P}(F_{ea}|R_{ad})$. $*$ means the model is already trained and frozen during pipeline training.
  • Figure 3: Data visualization of the dataset. The first row shows images of the two modalities from MSS subjects. The second row shows data from an MSI subject.
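
The captions of Figures 1 and 2 describe the fusion in terms of probabilities: a decision-level term $\mathcal{P}(P_{ath}R_{ad})$ and a final term $\mathcal{P}(F_{ea}P_{ath}R_{ad})$ built from $\mathcal{P}_{P}$, $\mathcal{P}_{R}$ and $\mathcal{P}_{F}$. The equations themselves are not reproduced in this excerpt, so the Python sketch below is only one plausible reading, not the authors' implementation. It assumes the chain rule $\mathcal{P}(F_{ea}P_{ath}R_{ad}) = \mathcal{P}(F_{ea}|P_{ath}R_{ad})\,\mathcal{P}(P_{ath}R_{ad})$, treats the two uni-modal outputs as independent, and renormalizes over the classes. The function name `product_fusion` and all probability values are hypothetical.

```python
from typing import List, Sequence


def product_fusion(*model_probs: Sequence[float]) -> List[float]:
    """Multiply per-class probabilities from several models, then renormalize.

    Assumption (not taken from the paper): the model outputs are combined
    as a plain class-wise product, as in a naive-Bayes-style product rule.
    """
    if not model_probs:
        raise ValueError("at least one probability vector is required")
    n_classes = len(model_probs[0])
    fused = [1.0] * n_classes
    for probs in model_probs:
        if len(probs) != n_classes:
            raise ValueError("all probability vectors must have the same length")
        for k, p in enumerate(probs):
            fused[k] *= p
    total = sum(fused)
    if total == 0.0:
        raise ValueError("fused probabilities sum to zero")
    return [f / total for f in fused]


# Hypothetical per-class outputs in the order [MSS, MSI] for a single case.
p_path = [0.6, 0.4]  # pathology uni-modal model, playing the role of P(P_ath)
p_rad = [0.7, 0.3]   # radiology uni-modal model, playing the role of P(R_ad)
p_feat = [0.2, 0.8]  # feature-level fusion model, P(F_ea | P_ath R_ad)

# Decision-level fusion only, in the spirit of Figure 2A.
decision = product_fusion(p_path, p_rad)

# Multi-level fusion: feature-level term times the decision-level term.
final = product_fusion(p_feat, p_path, p_rad)

print("decision-level:", decision)
print("multi-level:   ", final)
```

In this toy case the two uni-modal models both lean towards MSS, while the feature-level model leans towards MSI strongly enough to shift the fused prediction. That is the kind of interaction a multi-level scheme is meant to capture, under the assumptions stated above.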