A Survey of Pathology Foundation Model: Progress and Future Directions

Conghao Xiong, Hao Chen, Joseph J. Y. Sung

TL;DR

Pathology Foundation Models (PFMs) pretrained on large-scale histopathology data operate within multiple instance learning (MIL). A whole slide image (WSI) is represented as a bag $X=\{x_i\}_{i=1}^N$ with bag label $Y$, where $Y=1$ if $\exists i: y_i=1$ and $Y=0$ otherwise. The extractor produces patch features $z_i=\mathcal{M}_e(x_i)$, and the aggregator produces the bag representation $h=\mathcal{M}_g(Z)$ from the set of patch features $Z$. The paper proposes a hierarchical taxonomy based on Model Scope, Model Pretraining, and Model Design, intended to apply to foundation model analysis in pathology and in other domains, and provides a benchmarking framework spanning slide-level, patch-level, multimodal, and biological tasks. It surveys self-supervised learning (SSL) and supervised pretraining, detailing vision-only and inter-modal methods (e.g., CLIP/CoCa) and pathology-tailored adaptations, while documenting a shift toward aggregator pretraining and larger extractor scales. The authors further discuss practical challenges (end-to-end pretraining, data-model scalability, federated learning, robustness) and outline future directions such as pathology vision-language models (VLMs) enhanced with retrieval-augmented generation (RAG) and continual learning for model maintenance, aiming to bridge methodological gaps and enable robust clinical deployment.
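The MIL formulation above can be illustrated with a minimal, dependency-free Python sketch. It is not the survey's method or any PFM's actual API: `toy_extractor` and `mean_pool` are hypothetical stand-ins for a pretrained extractor $\mathcal{M}_e$ and an aggregator $\mathcal{M}_g$, and the patches are toy lists of pixel values rather than real image tiles.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def bag_label(instance_labels: Sequence[int]) -> int:
    """Standard MIL assumption: Y = 1 if any y_i = 1, otherwise Y = 0."""
    return int(any(y == 1 for y in instance_labels))


def extract_features(
    bag: Sequence[Vector], extractor: Callable[[Vector], Vector]
) -> List[Vector]:
    """Apply the extractor patch by patch: z_i = M_e(x_i)."""
    return [extractor(x) for x in bag]


def mean_pool(features: Sequence[Vector]) -> Vector:
    """Placeholder aggregator M_g (an assumption): element-wise mean of Z."""
    if not features:
        raise ValueError("cannot aggregate an empty bag")
    n = len(features)
    dim = len(features[0])
    return [sum(z[d] for z in features) / n for d in range(dim)]


def toy_extractor(patch: Vector) -> Vector:
    """Hypothetical extractor: (mean, max) of the patch's pixel values."""
    return [sum(patch) / len(patch), max(patch)]


# A toy "slide" with N = 3 patches; only the second patch is positive.
bag = [[0.0, 0.2, 0.4], [0.6, 0.9, 0.9], [0.0, 0.0, 0.3]]
instance_labels = [0, 1, 0]

Z = extract_features(bag, toy_extractor)  # patch features z_i
h = mean_pool(Z)                          # bag representation h = M_g(Z)
Y = bag_label(instance_labels)            # bag label under the MIL assumption
```

In practice the instance labels $y_i$ are usually unobserved and only $Y$ is available for a slide, which is why the quality of the extractor and aggregator matters so much; mean pooling is used here only because it is the simplest possible aggregator.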

Abstract

Computational pathology, which involves analyzing whole slide images for automated cancer diagnosis, relies on multiple instance learning, where performance depends heavily on the feature extractor and aggregator. Recent Pathology Foundation Models (PFMs), pretrained on large-scale histopathology data, have significantly enhanced both the extractor and aggregator, but they lack a systematic analysis framework. In this survey, we present a hierarchical taxonomy organizing PFMs through a top-down philosophy applicable to foundation model analysis in any domain: model scope, model pretraining, and model design. Additionally, we systematically categorize PFM evaluation tasks into slide-level, patch-level, multimodal, and biological tasks, providing comprehensive benchmarking criteria. Our analysis identifies critical challenges in both PFM development (pathology-specific methodology, end-to-end pretraining, data-model scalability) and utilization (effective adaptation, model maintenance), paving the way for future directions in this promising field. Resources referenced in this survey are available at https://github.com/BearCleverProud/AwesomeWSI.

Paper Structure

This paper contains 13 sections, 1 equation, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Schematic representation of our hierarchical taxonomy integrated within the MIL framework for PFMs.