RudolfV: A Foundation Model by Pathologists for Pathologists
Jonas Dippel, Barbara Feulner, Tobias Winterhoff, Timo Milbich, Stephan Tietz, Simon Schallenberg, Gabriel Dernbach, Andreas Kunft, Simon Heinke, Marie-Lisa Eich, Julika Ribbat-Idel, Rosemarie Krupar, Philipp Anders, Niklas Prenißl, Philipp Jurmeister, David Horst, Lukas Ruff, Klaus-Robert Müller, Frederick Klauschen, Maximilian Alber
TL;DR
RudolfV introduces a pathologist-guided, self-supervised foundation model trained on a large, diverse, multi-institutional histopathology dataset, incorporating 58 tissue types and 129 staining modalities. By integrating domain knowledge into data curation, slide grouping, and stain-aware augmentation within a DINOv2 framework, RudolfV achieves state-of-the-art performance across tumor microenvironment characterization, IHC biomarker scoring, and rare-disease reference case search, while exhibiting robustness to stain and scanner variability. The study demonstrates that domain-specific data diversity and expert-guided design can emulate the benefits of orders of magnitude more data, outlining a scalable path toward broad clinical adoption in digital pathology. Limitations include the absence of cytopathology and hematopathology cases, with future work aimed at expanding multimodal integration and larger-scale architectures.
Abstract
Artificial intelligence has started to transform histopathology, impacting clinical diagnostics and biomedical research. However, while many computational pathology approaches have been proposed, most current AI models are limited with respect to generalization, application variety, and handling of rare diseases. Recent efforts introduced self-supervised foundation models to address these challenges, yet existing approaches do not leverage pathologist knowledge by design. In this study, we present a novel approach to designing foundation models for computational pathology, incorporating pathologist expertise, semi-automated data curation, and a diverse dataset from over 15 laboratories that includes 58 tissue types and 129 different histochemical and immunohistochemical staining modalities. We demonstrate that our model "RudolfV" surpasses existing state-of-the-art foundation models across different benchmarks focused on tumor microenvironment profiling, biomarker evaluation, and reference case search while exhibiting favorable robustness properties. Our study shows how domain-specific knowledge can increase the efficiency and performance of pathology foundation models and enable novel application areas.
