MECFormer: Multi-task Whole Slide Image Classification with Expert Consultation Network

Doanh C. Bui, Jin Tae Kwak

TL;DR

This study proposes MECFormer, a generative Transformer-based model that handles multiple whole slide image classification tasks within one model. It introduces an Expert Consultation Network, a projection layer placed at the beginning of the Transformer-based model, and uses autoregressive decoding with a language decoder to enable flexible classification.

Abstract

Whole slide image (WSI) classification is a crucial problem for cancer diagnostics in clinics and hospitals. A WSI, acquired at gigapixel size, is commonly tiled into patches and processed by multiple-instance learning (MIL) models. Previous MIL-based models designed for this problem have only been evaluated on individual tasks for specific organs, and the ability to handle multiple tasks within a single model has not been investigated. In this study, we propose MECFormer, a generative Transformer-based model designed to handle multiple tasks within one model. To leverage the power of learning multiple tasks simultaneously and to enhance the model's effectiveness in focusing on each individual task, we introduce an Expert Consultation Network, a projection layer placed at the beginning of the Transformer-based model. Additionally, to enable flexible classification, autoregressive decoding is incorporated by a language decoder for WSI classification. Through extensive experiments on five datasets involving four different organs, one cancer classification task, and four cancer subtyping tasks, MECFormer demonstrates superior performance compared to individual state-of-the-art multiple-instance learning models.
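
To make the idea of a task-aware projection layer concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The abstract only states that the Expert Consultation Network is a projection layer at the start of the model, so the mixing rule used here (a fixed larger weight for the target-task expert, with the rest shared equally), the names `make_expert`, `expert_weights`, `ecn_forward`, and all dimensions are assumptions chosen for illustration.

```python
import random

def make_expert(dim_in, dim_out, rng):
    """One hypothetical expert: a linear projection matrix (dim_out x dim_in)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim_in)] for _ in range(dim_out)]

def project(expert, feature):
    """Apply one expert's linear projection to a single patch feature vector."""
    return [sum(w * x for w, x in zip(row, feature)) for row in expert]

def expert_weights(num_tasks, target_task, target_weight=0.6):
    """Assumed mixing rule: the target-task expert gets `target_weight`,
    and the remaining experts share the rest equally."""
    if num_tasks == 1:
        return [1.0]
    other = (1.0 - target_weight) / (num_tasks - 1)
    return [target_weight if t == target_task else other for t in range(num_tasks)]

def ecn_forward(experts, bag, target_task, target_weight=0.6):
    """Project every patch feature in the bag with all experts, then mix
    the expert outputs so the target-task expert contributes the most."""
    weights = expert_weights(len(experts), target_task, target_weight)
    projected_bag = []
    for feature in bag:
        outputs = [project(expert, feature) for expert in experts]
        dim_out = len(outputs[0])
        mixed = [sum(w * out[i] for w, out in zip(weights, outputs))
                 for i in range(dim_out)]
        projected_bag.append(mixed)
    return projected_bag

rng = random.Random(0)
# Five experts, one per task (the abstract reports five tasks in total).
experts = [make_expert(dim_in=8, dim_out=4, rng=rng) for _ in range(5)]
# A toy "bag" of three patch features standing in for a tiled WSI.
bag = [[rng.random() for _ in range(8)] for _ in range(3)]
projected = ecn_forward(experts, bag, target_task=2)
```

In a full model, the projected bag would be passed to the Transformer encoder, and the decoder would generate the diagnostic term token by token. A learned or softer weighting could replace the fixed `target_weight`; the fixed value only keeps the sketch short.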

Paper Structure (29 sections, 6 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Illustration of the multi-task process of MECFormer. All experts observe and process an input WSI, while the expert corresponding to the target task makes the most significant contribution. A generative Transformer-based model receives information from all experts and produces the predicted diagnostic term.
  • Figure 2: Overview of MECFormer. MECFormer is designed as a Transformer-based generative model with an encoder-decoder structure. The ECN layer is placed at the beginning of the encoder and is aware of the target task along with the input bag of patch features.
  • Figure 3: Distribution of five datasets: CAMELYON16, TCGA-BRCA, TCGA-ESCA, TCGA-RCC and TCGA-NSCLC.
  • Figure 4: t-SNE visualizations for four scenarios: 1) raw features extracted by an off-the-shelf feature extractor $\mathcal{G}(\cdot)$, 2) $\mathcal{P}_{1}$, 3) $\mathcal{P}_{T}$, and 4) $\mathcal{P}_{\texttt{ECN}}$. $S$ denotes the Silhouette score.
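
Figure 4 reports a Silhouette score $S$ for each feature space. As a reference for how such a score is usually computed, here is a small sketch of the standard definition in plain Python. The choice of Euclidean distance, the function name `silhouette_score`, and the toy points are assumptions for illustration; the caption does not say which distance or which features the authors used.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance.

    For each point: a = mean distance to the other points in its cluster,
    b = smallest mean distance to the points of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters score 0.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton cluster contributes s = 0
        # The point's distance to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Toy 2-D embedding: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
tight = silhouette_score(points, [0, 0, 1, 1])   # labels match the groups
mixed = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
```

A score close to 1 indicates compact, well-separated clusters, while values near or below 0 indicate overlapping or mislabeled clusters, which is why a higher $S$ is read as better class separation in the embedding.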