Tissue Concepts: supervised foundation models in computational pathology

Till Nicke, Jan Raphael Schaefer, Henning Hoefener, Friedrich Feuerhake, Dorit Merhof, Fabian Kiessling, Johannes Lotz

TL;DR

This work presents Tissue Concepts, a supervised, foundation-model-style encoder trained via multi-task learning on 16 pathology-related tasks using approximately 912,000 patches, yielding a robust tissue-concept representation. The encoder is evaluated with a multiple-instance learning (MIL) pipeline for whole slide image (WSI) classification across four cancers (breast, colon, lung, prostate) and multiple centers, where it performs comparably to self-supervised baselines while using only a fraction of the data and resources. Across these datasets, Tissue Concepts generalizes well across centers and outperforms ImageNet baselines, underscoring the value of domain-specific supervised pre-training for computational pathology. The study highlights the data- and energy-efficiency benefits of the approach, while identifying ongoing challenges in cross-center transfer and the potential of organ-specific fine-tuning to further improve performance and generalizability.

Abstract

Due to the increasing workload of pathologists, the need for automation to support diagnostic tasks and quantitative biomarker evaluation is becoming more and more apparent. Foundation models have the potential to improve generalizability within and across centers and to serve as starting points for the data-efficient development of specialized yet robust AI models. However, training foundation models themselves is usually very expensive in terms of data, computation, and time. This paper proposes a supervised training method that drastically reduces these expenses. The proposed method uses multi-task learning to train a joint encoder by combining 16 different classification, segmentation, and detection tasks on a total of 912,000 patches. Since the encoder captures the properties of the samples, we term it the Tissue Concepts encoder. To evaluate the performance and generalizability of the Tissue Concepts encoder across centers, classification of whole slide images from four of the most prevalent solid cancers (breast, colon, lung, and prostate) was used. The experiments show that the Tissue Concepts model achieves performance comparable to models trained with self-supervision, while requiring only 6% as many training patches. Furthermore, the Tissue Concepts encoder outperforms an ImageNet pre-trained encoder on both in-domain and out-of-domain data.
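To make the multi-task setup concrete, here is a minimal NumPy sketch of one shared encoder feeding several task-specific heads. It is an illustration under loud assumptions, not the authors' method: the encoder is a toy linear layer, only three hypothetical tasks stand in for the 16 (one classification, one segmentation, and one detection task approximated as a per-pixel heatmap), and the combined objective is assumed to be the unweighted sum of per-task losses. Only the forward pass and loss computation are shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): 8x8 RGB patches, flattened, mapped to a 16-dim shared embedding.
IN_DIM, EMB_DIM, MASK = 8 * 8 * 3, 16, 8 * 8

# Shared encoder: a single linear layer + ReLU standing in for the real backbone.
W_enc = rng.normal(scale=0.1, size=(IN_DIM, EMB_DIM))

def encode(x):
    """Map flattened patches (N, IN_DIM) to shared features (N, EMB_DIM)."""
    return np.maximum(x @ W_enc, 0.0)

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for integer class labels, via a numerically stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy on logits: log(1 + e^z) - t * z, computed stably."""
    return (np.logaddexp(0.0, logits) - targets * logits).mean()

# One head per task (hypothetical names); every head reads the same shared features.
heads = {
    "tissue_classification": rng.normal(scale=0.1, size=(EMB_DIM, 4)),  # 4-class logits
    "gland_segmentation": rng.normal(scale=0.1, size=(EMB_DIM, MASK)),  # 8x8 mask logits
    "nucleus_detection": rng.normal(scale=0.1, size=(EMB_DIM, MASK)),   # 8x8 heatmap logits
}

def task_loss(name, x, y):
    logits = encode(x) @ heads[name]
    if name == "tissue_classification":
        return softmax_cross_entropy(logits, y)
    # Segmentation and heatmap-style detection: per-pixel binary targets.
    return bce_with_logits(logits, y.reshape(len(y), -1))

# Each task brings its own labelled batch of 5 toy patches.
batches = {
    "tissue_classification": (rng.random((5, IN_DIM)), rng.integers(0, 4, size=5)),
    "gland_segmentation": (rng.random((5, IN_DIM)), rng.integers(0, 2, size=(5, 8, 8))),
    "nucleus_detection": (rng.random((5, IN_DIM)), rng.integers(0, 2, size=(5, 8, 8))),
}

# All task losses pass through the same encoder, so its weights would receive gradient
# signal from every task once the summed loss is backpropagated (not done here).
losses = {name: task_loss(name, x, y) for name, (x, y) in batches.items()}
total_loss = sum(losses.values())
print({k: round(float(v), 3) for k, v in losses.items()}, round(float(total_loss), 3))
```

In practice an autodiff framework would backpropagate `total_loss` (or a weighted variant) through the heads and the shared encoder; how Tissue Concepts actually balances and schedules its 16 tasks is not described in this summary.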

Paper Structure

This paper contains 24 sections, 1 equation, 3 figures, and 9 tables.

Figures (3)

  • Figure 1: Overview of the study. a) Pre-training of Tissue Concepts via multi-task learning on 16 different tasks. b) The shared encoder is evaluated using multiple-instance learning on WSI classification. From each WSI, patches of size 224 × 224 are extracted in a sliding-window fashion, and each patch's latent representation is placed at the same spatial location as the patch. A simple CNN is trained on the resulting latent WSIs to predict the slide-level label.
  • Figure 2: Sample efficiency of encoders with different degrees of domain specificity on the BACH patch classification dataset. Each boxplot represents 10 repetitions. The TC-Swin encoder is compared to a tiny Swin Transformer with ImageNet weights and a small Swin Transformer from UMedPT (schafer_overcoming_2023). F1 scores are plotted against an increasing number of images per class to examine sample efficiency.
  • Figure 3: Organ-specific model performance on downstream data. Each model consists of an organ-specific head on top of a frozen encoder (ImageNet, CTP, TC-Swin, or TC-Conv). Breast, colorectal, and prostate tissue was included in the pre-training of TC, whereas no lung tissue was used. "Breast" shows the results on three different classification tasks based on the BRACS dataset. "Colon" shows the cross-center performance between sites of the SemiCol challenge when trained on one center and evaluated on the others. "Lung" shows a 5-fold cross-validation and a cross-center evaluation. "Prostate" shows the performance on two cross-center evaluations as well as a 5-fold cross-validation on the PANDA challenge dataset. Boxes cover the interquartile range; the median is marked by a horizontal line.
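The evaluation pipeline of Figure 1 b) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the window stride (non-overlapping tiles), the placeholder `toy_encoder` (per-channel color statistics instead of a frozen Tissue Concepts backbone), and the layer sizes of the small CNN are all hypothetical, and only a forward pass with untrained weights is shown.

```python
import numpy as np

PATCH = 224  # patch edge length from the Figure 1 caption

def extract_patch_grid(wsi, patch=PATCH, stride=PATCH):
    """Tile an (H, W, 3) image with a sliding window; keep each patch's grid position.

    Assumes non-overlapping tiles (stride == patch), an image at least one patch
    in each dimension, and drops incomplete border tiles.
    """
    h, w = wsi.shape[:2]
    rows, cols = (h - patch) // stride + 1, (w - patch) // stride + 1
    patches, positions = [], []
    for i in range(rows):
        for j in range(cols):
            y, x = i * stride, j * stride
            patches.append(wsi[y:y + patch, x:x + patch])
            positions.append((i, j))
    return np.stack(patches), positions, (rows, cols)

def toy_encoder(patches):
    """Hypothetical frozen encoder: per-channel mean and std, giving a 6-dim embedding."""
    flat = patches.reshape(len(patches), -1, patches.shape[-1]).astype(np.float64)
    return np.concatenate([flat.mean(axis=1), flat.std(axis=1)], axis=1)

def build_latent_wsi(wsi, encoder):
    """Write each patch embedding at its patch's grid cell -> latent WSI (rows, cols, D)."""
    patches, positions, (rows, cols) = extract_patch_grid(wsi)
    feats = encoder(patches)
    latent = np.zeros((rows, cols, feats.shape[1]))
    for (i, j), f in zip(positions, feats):
        latent[i, j] = f
    return latent

def conv3x3_same(x, kernels):
    """Zero-padded 3x3 convolution over a (rows, cols, C_in) grid; kernels: (3, 3, C_in, C_out)."""
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros(x.shape[:2] + (kernels.shape[-1],))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Sum over the 3x3 window and all input channels -> one C_out vector.
            out[i, j] = np.tensordot(padded[i:i + 3, j:j + 3], kernels, axes=3)
    return out

def slide_logits(latent, params):
    """Tiny CNN head: conv -> ReLU -> global average pool -> linear slide-level logits."""
    hidden = np.maximum(conv3x3_same(latent, params["conv"]), 0.0)
    return hidden.mean(axis=(0, 1)) @ params["fc"]

# Toy usage with a random 3x4-patch "slide" and untrained (random) CNN weights.
rng = np.random.default_rng(0)
wsi = rng.integers(0, 256, size=(3 * PATCH, 4 * PATCH, 3), dtype=np.uint8)
latent = build_latent_wsi(wsi, toy_encoder)
params = {"conv": rng.normal(scale=0.1, size=(3, 3, 6, 8)), "fc": rng.normal(size=(8, 2))}
print(latent.shape, slide_logits(latent, params).shape)  # (3, 4, 6) (2,)
```

Because the encoder is frozen, the latent WSI can be computed once per slide and cached, so only the small CNN needs to be trained for each downstream task.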
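The sample-efficiency protocol behind Figure 2 (repeatedly drawing a fixed number of images per class, 10 repetitions per setting, scoring with F1) might look like the sketch below. Assumptions: the classifier on top of the frozen features is a nearest-centroid rule standing in for whatever head the study trained, F1 is macro-averaged over classes, the embeddings are random toy data rather than BACH features, and all function names are hypothetical.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

def nearest_centroid_predict(train_x, train_y, test_x, n_classes):
    """Stand-in probe on frozen features: predict the class with the closest mean embedding."""
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in range(n_classes)])
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def sample_efficiency(train_x, train_y, test_x, test_y, shots, repetitions=10, seed=0):
    """For each n in `shots`, draw n training images per class `repetitions` times; collect macro F1."""
    rng = np.random.default_rng(seed)
    n_classes = int(train_y.max()) + 1
    results = {}
    for n in shots:
        scores = []
        for _ in range(repetitions):
            idx = np.concatenate([
                rng.choice(np.flatnonzero(train_y == c), size=n, replace=False)
                for c in range(n_classes)
            ])
            pred = nearest_centroid_predict(train_x[idx], train_y[idx], test_x, n_classes)
            scores.append(macro_f1(test_y, pred, n_classes))
        results[n] = scores  # one box in the plot per n
    return results

# Toy usage: random 32-dim "embeddings" for 4 well-separated classes.
rng = np.random.default_rng(1)
class_means = rng.normal(scale=3.0, size=(4, 32))

def toy_split(per_class):
    y = np.repeat(np.arange(4), per_class)
    return class_means[y] + rng.normal(size=(len(y), 32)), y

train_x, train_y = toy_split(50)
test_x, test_y = toy_split(20)
res = sample_efficiency(train_x, train_y, test_x, test_y, shots=[1, 5, 20])
for n, s in res.items():
    print(n, round(float(np.median(s)), 3))
```

Holding the test set fixed and only resampling the training subset keeps the spread of each box attributable to which few images were drawn, which is what a sample-efficiency comparison between encoders is meant to expose.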