Whole Slide Concepts: A Supervised Foundation Model For Pathological Images

Till Nicke, Daniela Schacherer, Jan Raphael Schäfer, Natalia Artysh, Antje Prasse, André Homeyer, Andrea Schenk, Henning Höfener, Johannes Lotz

TL;DR

The paper addresses the resource-intensive nature of training foundation models for computational pathology by introducing Whole Slide Concepts (WSC), a supervised, end-to-end multitask framework trained on slide-level labels. WSC learns a joint representation across cancer subtyping, survival risk estimation, and genetic mutation prediction, using open TCGA/CPTAC/PLCO data and attention-based pooling for explainability. It demonstrates state-of-the-art or competitive performance on in-domain tasks with substantially lower compute and energy needs, while also transferring well to out-of-domain tasks and enabling tumor detection on unseen slides. The approach enhances reproducibility and reduces environmental impact, offering a practical pathway toward scalable, high-performing pathology models with accessible data and code.

Abstract

Foundation models (FMs) are transforming computational pathology by offering new ways to analyze histopathology images. However, FMs typically require weeks of training on large databases, making their creation a resource-intensive process. In this paper, we present a training approach for foundation models from whole slide images that uses supervised, end-to-end, multitask learning on slide-level labels. Notably, it is the first model to incorporate cancer subtyping, risk estimation, and genetic mutation prediction into one model. The presented model outperforms self-supervised models on seven benchmark tasks while its training required only 5% of the computational resources. The results not only show that supervised training can outperform self-supervision with less data, but also offer a solution to annotation problems, as patient-based labels are widely available through routine clinical processes. Furthermore, an attention module provides a layer of explainability across different tasks and serves as a tumor detector for unseen cancer types. To address the issue of closed-source datasets, the model was trained entirely on openly available data. The code and model weights are available at https://github.com/FraunhoferMEVIS/MedicalMultitaskModeling.

Paper Structure

This paper contains 20 sections, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Average AUC over four runs on four cancer subtyping tasks divided into in- (left) and out-of-domain (right) tasks depending on their similarity to the training distribution.
  • Figure 2: Mean c-index of four distinct runs on three benchmark evaluation tasks to estimate OS on Brain, Lung, and Prostate tissue (left to right).
  • Figure 3: Side-by-side comparison of the tiles with the highest attention values (right) and the expert tumor annotations (left) without fine-tuning of WSC on an unseen slide from the TUPAC16 cohort.
  • Figure 4: Schematic overview of the pipeline. A collection of WSIs with corresponding labels (left) is used to train a tile encoder and pooling operation using multi-task learning (middle). The training is done in an end-to-end manner, iterating over individual tasks (lower right) and accumulating the gradient. Learnable weights rate each instance of the bag and then compress it into a latent WSI vector (top right).
  • Figure 5: Attention weight visualization of a model fine-tuned for lung fibrosis estimation.
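The pooling step described in the Figure 4 caption, where learnable weights rate each tile in a bag and compress the bag into one slide-level vector, can be illustrated with a minimal sketch. This is not the authors' implementation: the single linear scoring vector `w`, the function name `attention_pool`, and the toy two-dimensional embeddings are all assumptions made for illustration, and the actual attention module in WSC may be structured differently.

```python
import math

def attention_pool(tile_embeddings, w):
    """Pool a bag of tile embeddings into one slide-level vector.

    tile_embeddings: list of equal-length float lists, one per tile.
    w: hypothetical scoring vector (stands in for the learnable weights).
    Returns (slide_vector, attention_weights).
    """
    # One scalar score per tile: dot product with the scoring vector.
    scores = [sum(x * wi for x, wi in zip(tile, w)) for tile in tile_embeddings]

    # Numerically stable softmax over the bag, so the weights sum to 1.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the tile embeddings.
    dim = len(tile_embeddings[0])
    slide_vector = [
        sum(a * tile[d] for a, tile in zip(weights, tile_embeddings))
        for d in range(dim)
    ]
    return slide_vector, weights

# Toy bag of three tiles with made-up 2-D embeddings.
bag = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
w = [1.0, 0.0]  # assumed scoring vector, not a trained parameter

slide_vector, attention = attention_pool(bag, w)

# Index of the tile with the highest attention weight.
top_tile = max(range(len(bag)), key=lambda i: attention[i])
```

Ranking tiles by their attention weight, as done for `top_tile`, is the same idea behind the highest-attention tile visualizations in Figures 3 and 5: tiles that contribute most to the slide vector are the ones shown.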