Table of Contents
Fetching ...

Enabling clinical use of foundation models in histopathology

Audun L. Henriksen, Ole-Johan Skrede, Lisa van der Schee, Enric Domingo, Sepp De Raedt, Ilyá Kostolomov, Jennifer Hay, Karolina Cyll, Wanja Kildal, Joakim Kalsnes, Robert W. Williams, Manohar Pradhan, John Arne Nesheim, Hanne A. Askautrud, Maria X. Isaksen, Karmele Saez de Gordoa, Miriam Cuatrecasas, Joanne Edwards, TransSCOT group, Arild Nesbakken, Neil A. Shepherd, Ian Tomlinson, Daniel-Christoph Wagner, Rachel S. Kerr, Tarjei Sveinsgjerd Hveem, Knut Liestøl, Yoshiaki Nakamura, Marco Novelli, Masaaki Miyo, Sebastian Foersch, David N. Church, Miangela M. Lacle, David J. Kerr, Andreas Kleppe

TL;DR

This approach successfully mitigates robustness issues of foundation models for computational pathology without retraining the foundation models themselves, enabling development of robust computational pathology models applicable to real-world data in routine clinical practice.

Abstract

Foundation models in histopathology are expected to facilitate the development of high-performing and generalisable deep learning systems. However, current models capture not only biologically relevant features, but also pre-analytic and scanner-specific variation that bias the predictions of task-specific models trained from the foundation model features. Here we show that introducing novel robustness losses during training of downstream task-specific models reduces sensitivity to technical variability. A purpose-designed comprehensive experimentation setup with 27,042 WSIs from 6155 patients is used to train thousands of models from the features of eight popular foundation models for computational pathology. In addition to a substantial improvement in robustness, we observe that prediction accuracy improves by focusing on biologically relevant features. Our approach successfully mitigates robustness issues of foundation models for computational pathology without retraining the foundation models themselves, enabling development of robust computational pathology models applicable to real-world data in routine clinical practice.

Enabling clinical use of foundation models in histopathology

TL;DR

This approach successfully mitigates robustness issues of foundation models for computational pathology without retraining the foundation models themselves, enabling development of robust computational pathology models applicable to real-world data in routine clinical practice.

Abstract

Foundation models in histopathology are expected to facilitate the development of high-performing and generalisable deep learning systems. However, current models capture not only biologically relevant features, but also pre-analytic and scanner-specific variation that bias the predictions of task-specific models trained from the foundation model features. Here we show that introducing novel robustness losses during training of downstream task-specific models reduces sensitivity to technical variability. A purpose-designed comprehensive experimentation setup with 27,042 WSIs from 6155 patients is used to train thousands of models from the features of eight popular foundation models for computational pathology. In addition to a substantial improvement in robustness, we observe that prediction accuracy improves by focusing on biologically relevant features. Our approach successfully mitigates robustness issues of foundation models for computational pathology without retraining the foundation models themselves, enabling development of robust computational pathology models applicable to real-world data in routine clinical practice.
Paper Structure (49 sections, 10 equations, 11 figures, 4 tables)

This paper contains 49 sections, 10 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Method overview
  • Figure 1: WSI prediction score comparison between tissue sections Like scatterplots in \ref{['fig:scan-comparison']} for the remaining seven foundation models.
  • Figure 2: Foundation model robustness issue
  • Figure 2: Scanner prediction with linear probing A linear classifier was trained on top of raw features from eight different foundation models for computational pathology. The classifier was trained on WSIs from QUASAR 2 and applied on WSIs from TransSCOT. The task was to predict whether a scan had been imaged with one of the following five scanners: Aperio AT2, Aperio GT 450 DX, NanoZoomer XR, KF-PRO-400, P1000.
  • Figure 3: Survival prediction
  • ...and 6 more figures