RudolfV: A Foundation Model by Pathologists for Pathologists

Jonas Dippel; Barbara Feulner; Tobias Winterhoff; Timo Milbich; Stephan Tietz; Simon Schallenberg; Gabriel Dernbach; Andreas Kunft; Simon Heinke; Marie-Lisa Eich; Julika Ribbat-Idel; Rosemarie Krupar; Philipp Anders; Niklas Prenißl; Philipp Jurmeister; David Horst; Lukas Ruff; Klaus-Robert Müller; Frederick Klauschen; Maximilian Alber

TL;DR

RudolfV introduces a pathologist-guided, self-supervised foundation model trained on a large, diverse, multi-institutional histopathology dataset, incorporating 58 tissue types and 129 staining modalities. By integrating domain knowledge into data curation, slide grouping, and stain-aware augmentation within a DINOv2 framework, RudolfV achieves state-of-the-art performance across tumor microenvironment characterization, IHC biomarker scoring, and rare-disease reference case search, while exhibiting robustness to stain and scanner variability. The study demonstrates that domain-specific data diversity and expert-guided design can emulate the benefits of orders of magnitude more data, outlining a scalable path toward broad clinical adoption in digital pathology. Limitations include the absence of cytopathology and hematopathology cases, with future work aimed at expanding multimodal integration and larger-scale architectures.
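
The balancing idea behind slide grouping can be illustrated with a minimal Python sketch. It assumes a hypothetical patch index of `(patch_id, group, cluster)` tuples and a simple two-level uniform scheme: pick a slide group, then a tissue cluster within that group, then a patch. The helper names `build_index` and `sample_balanced` are made up for illustration. This summary does not specify the actual sampling distribution RudolfV derives from its 31 slide groups and 9 tissue clusters.

```python
import random
from collections import defaultdict


def build_index(patches):
    """Index patch IDs by (slide group, tissue cluster).

    `patches` is an iterable of (patch_id, group, cluster) tuples.
    ASSUMPTION: this layout is a hypothetical stand-in for real metadata.
    """
    index = defaultdict(lambda: defaultdict(list))
    for patch_id, group, cluster in patches:
        index[group][cluster].append(patch_id)
    return index


def sample_balanced(index, n, rng=None):
    """Draw n patch IDs (with replacement) using two-level balancing.

    Step 1: choose a slide group uniformly at random.
    Step 2: choose a tissue cluster uniformly among those in that group.
    Step 3: choose a patch uniformly within that (group, cluster) cell.
    ASSUMPTION: uniform weights are illustrative only; the real model may
    weight groups and clusters differently.
    """
    rng = rng or random.Random()
    groups = sorted(index)
    samples = []
    for _ in range(n):
        group = rng.choice(groups)
        cluster = rng.choice(sorted(index[group]))
        samples.append(rng.choice(index[group][cluster]))
    return samples
```

With 990 patches in one group and 10 in another, unbalanced random sampling would draw the rare group about 1% of the time, whereas this scheme draws it about half the time. The trade-off is that rare patches are repeated often, so softer weights than strict uniformity may be preferable in practice.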

Abstract

Artificial intelligence has started to transform histopathology, impacting clinical diagnostics and biomedical research. However, while many computational pathology approaches have been proposed, most current AI models are limited with respect to generalization, application variety, and handling of rare diseases. Recent efforts introduced self-supervised foundation models to address these challenges, yet existing approaches do not leverage pathologist knowledge by design. In this study, we present a novel approach to designing foundation models for computational pathology, incorporating pathologist expertise, semi-automated data curation, and a diverse dataset from over 15 laboratories that includes 58 tissue types and encompasses 129 different histochemical and immunohistochemical staining modalities. We demonstrate that our model "RudolfV" surpasses existing state-of-the-art foundation models across different benchmarks focused on tumor microenvironment profiling, biomarker evaluation, and reference case search, while exhibiting favorable robustness properties. Our study shows how domain-specific knowledge can increase the efficiency and performance of pathology foundation models and enable novel application areas.

Paper Structure (33 sections, 6 figures)

Introduction
Results
Pathologist-guided and diversity-focused foundation model design
Pan-indication tumor microenvironment characterization
Pan-indication immunohistochemistry biomarker scoring
Reference case search
Foundation model characteristics and robustness
Histological and molecular prediction benchmarks
Discussion
Methods
Pathologist-guided and diversity-focused foundation model design
Data curation
Data sampling
Data augmentation
Pretraining
...and 18 more sections

Figures (6)

Figure 1: Overview of the approach. (A) Curated data: A dataset of 134k slides comprising 34k cases was assembled with the aim of maximizing diversity while keeping its size tractable. (B) Combining computational and pathologist expertise: Pathologists and computational scientists collaborated to group similar slides and to cluster morphologically similar tissue in order to guide the data balancing in step (C). Based on a sample's lab of origin, tissue type, disease, and staining modality, all slides were assigned to one of 31 groups, following the principle of maximizing homogeneity within groups and heterogeneity across groups. In addition, 9 distinct, human-interpretable tissue clusters were formed by aggregating 100 precomputed image clusters. For group and cluster details, see also Figure 2. (C) AI training: Our foundation model RudolfV was trained by adapting the DINOv2 framework to sample training data from a specific distribution derived from slide groups and tissue clusters, in order to balance frequent and infrequent diseases and biologies. Additionally, the augmentations were extended with stain variations. (D) Applications: The resulting foundation model can be used for various applications in digital pathology.
Figure 2: Pathologist-guided and diversity-focused curation of slide groups and tissue clusters. Pathologists and computational scientists collaborated to group slides and tissue patches. (A) Slides were grouped based on similarity of tissue and disease type, laboratory, and staining modality, following the principle of maximizing homogeneity within groups and heterogeneity across groups. We show 9 of the 31 groups as examples; the full list is given in the supplement. (B) 1.2 billion image patches were extracted from 134k slides. The patches were clustered into 100 clusters, which pathologists subsequently merged into 9 morphologically meaningful clusters. The first image column shows a schematic view of each morphology, and the other columns show example morphologies. (C) The slide groups and tissue clusters were used to balance the data sampling process during training. (D) Example of random data sampling without balancing, for comparison.
Figure 3: TME characterization and immunohistochemistry biomarker scoring. (A) Model prediction examples of H&E TME cell classification and tissue segmentation, as well as IHC biomarker evaluation. Our proposed model performed best on all benchmarks and all datasets. (B, D, F) Results for pan-indication H&E TME applications. (B) Cell classification with 8 cell types on 5 indications. (D) Cell classification with 8 cell types, trained on NSCLC only and evaluated on 4 different indications. (F) Same as (B), but with results aggregated per cell type. (C, E, G) Results for IHC biomarker scoring. Our proposed approach yielded the best results on all benchmarks and all datasets. (C) Cell classification with 5 cell types on 5 indications and 3 markers. (E) End-to-end membrane marker scoring for carcinoma cells, immune cells, and other cells on 4 indications and 2 markers. (G) Same as (C), but with results aggregated per cell type.
Figure 4: Reference case search. (A) Workflow: The pathologist annotates a region of interest (ROI), which is queried against a database of slides. The most similar slides are returned and shown to the pathologist, allowing them to consult the corresponding diagnoses. (B) Evaluation: Results on a benchmark of 178 rare-disease slides, measuring whether the retrieved results contain a slide from the database with the same diagnosis as the query slide. The database contains over 6,400 slides, and the rare diseases have a median occurrence of 3 slides, or 0.04% of the database. A slide with the same diagnosis was returned in 41% of the queries when retrieving only the single most similar slide, and in 67% of the queries when retrieving the 10 most similar slides. For reference, not using a foundation model (ResNet-50 trained on ImageNet) yields 0% and 1.7% correctly retrieved diagnoses, respectively. (C) Visual aid: Visualizing the regions with the highest similarity to the region of interest can aid pathologists and shows that the foundation model highlights relevant morphologies. The examples show colon adenocarcinoma (top) and a neuroendocrine tumor of the stomach (bottom).
Figure 5: Foundation model characteristics and robustness properties. Foundation models learn concepts from data without human supervision. Learned concepts can be examined via principal component analysis (PCA) [Shlens2009PCA, hotelling1933analysis, dinov2, virchow] of the embedding space, which we qualitatively analyzed for commonalities and robustness in different settings. (A) Pan-Staining: The same tissue was stained with H&E (A2) and IHC (A5). The principal component visualizations (A3 and A4) highlight the carcinoma component, which is consistent across stains and approximately overlaps with the ground-truth output (A1) of a supervised carcinoma detection model. (B) Pan-Scanner: The same tissue was scanned by 4 different scanners (B2-B5). Despite the very different visual appearance of the scans, the carcinoma component (B6-B9) is consistent across scanners and approximately overlaps with the ground truth (B1), indicating a high degree of scanner invariance in the learned representation. (C) Detailed view: A close-up of the carcinoma component, showing that most carcinoma cells are covered by the (self-supervised) learned representation. (D) Additional components: The foundation model also identifies additional components, such as fibroblasts or crypts.
...and 1 more figure
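
The top-k evaluation described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes that each slide is represented by a list of patch embeddings, that a slide's score is its best cosine similarity to the query embedding, and that a query counts as a hit if any of the k highest-ranked slides (excluding the query slide itself) shares its diagnosis. The function names and data layout are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_slides(query, patch_db):
    """Rank slides by their best-matching patch.

    patch_db: list of (slide_id, embedding) pairs, several per slide allowed.
    ASSUMPTION: max-pooling over patches is one simple slide-level score.
    """
    best = defaultdict(lambda: -math.inf)
    for slide_id, emb in patch_db:
        best[slide_id] = max(best[slide_id], cosine(query, emb))
    return sorted(best, key=best.get, reverse=True)


def hit_rate_at_k(queries, patch_db, diagnosis, k):
    """Fraction of queries whose top-k slides include a matching diagnosis.

    queries: list of (query_slide_id, embedding) pairs.
    diagnosis: dict mapping slide IDs (queries and database) to labels.
    """
    hits = 0
    for query_slide, emb in queries:
        # Exclude the query's own slide so it cannot trivially match itself.
        ranked = [s for s in rank_slides(emb, patch_db) if s != query_slide]
        if any(diagnosis[s] == diagnosis[query_slide] for s in ranked[:k]):
            hits += 1
    return hits / len(queries)
```

Calling `hit_rate_at_k` with `k=1` and `k=10` corresponds to the two quantities reported in Figure 4 (B). Max-pooling over patches is only one choice of slide-level aggregation; alternatives such as averaging the top few patch scores are equally plausible.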
