
Fusion of Multi-scale Heterogeneous Pathology Foundation Models for Whole Slide Image Analysis

Zhidong Yang, Xiuhui Shi, Wei Ba, Zhigang Song, Haijing Luan, Taiyuan Hu, Senlin Lin, Jiguang Wang, Shaohua Kevin Zhou, Rui Yan

TL;DR

This paper addresses the heterogeneity of pathology foundation models (FMs) in whole slide image (WSI) analysis by proposing FuseCPath, a framework that fuses patch-level and slide-level FMs across multiple scales. The method uses multi-view spectral clustering to select representative patches, a cluster-level re-embedding transformer to fuse patch-level features, attention-based multiple instance learning (AB-MIL) to aggregate them into a slide representation, and slide-level collaborative distillation to align and transfer knowledge from multiple slide-level FMs. Experiments on TCGA datasets show state-of-the-art performance in biomarker prediction, gene expression prediction, and survival analysis, and ablations confirm the contribution of patch selection, multi-view fusion, and distillation. By exploiting diverse FMs jointly, the approach makes WSI analysis more practical, with potential extensions to multi-omics and spatial transcriptomics for finer molecular-level pathology insight.
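As a rough illustration of the aggregation step, the sketch below implements the standard non-gated attention-based MIL pooling of Ilse et al. (2018) in NumPy: each patch feature receives a softmax attention weight, and the slide embedding is the weighted sum of patch features. The parameter names `V` and `w`, the dimensions, and the random inputs are assumptions for illustration only; FuseCPath applies AB-MIL to its re-embedded features, and its exact configuration (gating, hidden size) is not given here.

```python
import numpy as np

def ab_mil_pool(H, V, w):
    """Non-gated attention-based MIL pooling (Ilse et al., 2018).

    H: (K, D) patch features of one slide (hypothetical input).
    V: (L, D) hidden projection; w: (L,) attention vector (assumed learned elsewhere).
    Returns the (D,) slide embedding and the (K,) attention weights.
    """
    scores = np.tanh(H @ V.T) @ w      # (K,) unnormalized attention scores
    scores = scores - scores.max()     # shift for a numerically stable softmax
    a = np.exp(scores)
    a = a / a.sum()                    # softmax over the K patches
    z = a @ H                          # attention-weighted sum of patch features
    return z, a

# Usage with random placeholder data (not real FM features)
rng = np.random.default_rng(0)
K, D, L = 50, 16, 8
H = rng.normal(size=(K, D))
V = rng.normal(size=(L, D)) * 0.1
w = rng.normal(size=L)
z, a = ab_mil_pool(H, V, w)
print(z.shape, a.shape)  # (16,) (50,)
```

The attention weights `a` also indicate which patches drive the slide-level prediction, which is one reason AB-MIL is a common aggregation choice in WSI pipelines.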

Abstract

Whole slide image (WSI) analysis has become an increasingly essential technique in computational pathology. Recent pathology foundation models (FMs) have shown clear advantages in deriving meaningful patch-level or slide-level multi-scale features from WSIs. However, current pathology FMs exhibit substantial heterogeneity due to their diverse private training datasets and different network architectures, and this heterogeneity causes downstream performance to vary depending on which FM's features are used. To exploit the complementary strengths of multiple FMs, we propose FuseCPath, a novel framework for fusing multi-scale heterogeneous pathology FMs that yields a model with superior ensemble performance. Our main contributions are as follows: (i) to ensure that the training patches are representative, we propose a multi-view clustering-based method that selects discriminative patches using the embeddings of multiple FMs; (ii) to fuse the patch-level FMs effectively, we devise a cluster-level re-embedding strategy that captures patch-level local features online; and (iii) to fuse the slide-level FMs effectively, we devise a collaborative distillation strategy that exploits the connections between slide-level FMs. Extensive experiments demonstrate that FuseCPath achieves state-of-the-art performance across multiple tasks on diverse datasets.
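The patch-selection idea in contribution (i) can be sketched with a simple form of multi-view spectral clustering: build one affinity matrix per FM's patch embeddings (one view each), average the affinities, cluster the spectral embedding, and keep the patches closest to each cluster centre. This is a minimal sketch under stated assumptions, not the paper's MVSC: affinity averaging, the RBF kernel with a median-distance bandwidth, farthest-first k-means initialization, and nearest-to-centroid selection are illustrative choices, and the helper names are hypothetical.

```python
import numpy as np

def rbf_affinity(X):
    """RBF affinity for one view; bandwidth = median squared distance (assumption)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    sigma2 = np.median(d2[d2 > 0])
    W = np.exp(-d2 / sigma2)
    np.fill_diagonal(W, 0.0)
    return W

def kmeans(U, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([((U - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(U[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((U[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        new = np.array([U[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = ((U[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
    return labels, centers

def multiview_select(views, k, per_cluster):
    """views: list of (N, D_v) patch embeddings, one per patch-level FM (hypothetical helper)."""
    W = np.mean([rbf_affinity(X) for X in views], axis=0)   # fuse views by averaging affinities
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]        # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)                               # eigenvalues in ascending order
    U = vecs[:, -k:]                                          # top-k eigenvectors of M
    U = U / np.linalg.norm(U, axis=1, keepdims=True)          # row-normalize (Ng-Jordan-Weiss)
    labels, centers = kmeans(U, k)
    selected = []
    for j in range(k):
        idx = np.where(labels == j)[0]
        dist = ((U[idx] - centers[j]) ** 2).sum(1)
        selected.extend(idx[np.argsort(dist)[:per_cluster]].tolist())  # closest to the centre
    return labels, sorted(selected)

# Toy data: 60 "patches" from 3 groups, seen through two synthetic FM embedding spaces
rng = np.random.default_rng(1)
true = np.repeat([0, 1, 2], 20)
view1 = rng.normal(scale=5.0, size=(3, 8))[true] + rng.normal(size=(60, 8))
view2 = rng.normal(scale=5.0, size=(3, 12))[true] + rng.normal(size=(60, 12))
labels, selected = multiview_select([view1, view2], k=3, per_cluster=4)
print(len(selected))  # 12
```

Averaging affinities is the simplest way to let every FM "vote" on which patches are similar; more elaborate MVSC variants (for example, co-regularized spectral clustering) learn a shared embedding instead.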

Paper Structure

This paper contains 19 sections, 17 equations, 9 figures, 10 tables, and 1 algorithm.

Figures (9)

  • Figure 1: (a) Conventional foundation model-based WSI analysis paradigm. To achieve optimal performance on a downstream task, the most straightforward strategy is to select the patch-level or slide-level foundation model that performs best on the target task. (b) Proposed WSI analysis paradigm based on fusing heterogeneous foundation models. Following the idea of ensemble learning, a framework that fuses multi-scale (patch-level and slide-level) heterogeneous pathology FMs yields a model with superior performance. (c) Key ideas and contributions of this article.
  • Figure 2: Overall architecture and main components of the proposed FuseCPath framework. (a) Overall architecture. FuseCPath consists of two essential branches: patch-level feature re-embedding and slide-level feature collaborative distillation. (b) Patch-level feature re-embedding: representative features are summarized via multi-view clustering and online re-embedding. (c) Aggregation of the re-embedded features, implemented with AB-MIL.
  • Figure 3: Details of multi-view spectral clustering (MVSC). The patch embeddings from each patch-level FM can be regarded as one view of the original WSI dataset.
  • Figure 4: Examples of WSIs in the TCGA-COAD dataset with mutant and wild-type status. Several patches are selected for visualization.
  • Figure 5: Gene expression predictions compared with ground-truth values measured by bulk RNA sequencing. Predictions closer to the ground truth indicate better performance.
  • ...and 4 more figures
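The slide-level collaborative distillation branch in Figure 2(a) is not specified in this summary. As a hypothetical sketch only, a generic multi-teacher feature-distillation loss could project the fused model's slide embedding into each slide-level FM's embedding space with a per-teacher linear head and penalize one minus the cosine similarity to that teacher's frozen embedding, averaged over teachers. The function names, the linear projection heads, and the cosine objective are all assumptions, not the paper's formulation.

```python
import numpy as np

def cosine_rows(A, B, eps=1e-8):
    """Row-wise cosine similarity between two (N, D) arrays."""
    num = (A * B).sum(1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1) + eps
    return num / den

def multi_teacher_distill_loss(student, teachers, heads):
    """Hypothetical multi-teacher feature-distillation loss.

    student:  (N, Ds) slide embeddings from the fused model.
    teachers: list of (N, Dm) embeddings, one per frozen slide-level FM.
    heads:    list of (Ds, Dm) linear projection heads, one per teacher (assumed).
    Returns the mean over teachers and slides of 1 - cos(student @ head, teacher), in [0, 2].
    """
    per_teacher = [np.mean(1.0 - cosine_rows(student @ P, T)) for P, T in zip(heads, teachers)]
    return float(np.mean(per_teacher))

# Usage with random placeholders: 4 slides, two teachers with different embedding sizes
rng = np.random.default_rng(0)
S = rng.normal(size=(4, 32))
heads = [rng.normal(size=(32, 64)), rng.normal(size=(32, 48))]
teachers = [rng.normal(size=(4, 64)), rng.normal(size=(4, 48))]
print(multi_teacher_distill_loss(S, teachers, heads))
```

Per-teacher heads let teachers with different embedding sizes be aligned to one student; in practice the student encoder and heads would be trained with an autodiff framework, and NumPy is used here only to show the objective.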