A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models

Dibaloke Chanda, Milan Aryal, Nasim Yahya Soltani, Masoud Ganji

TL;DR

A holistic and systematic overview of recent innovations in FMs and VLMs in CPath is presented and the tools, datasets and training schemes for these models are summarized in addition to categorizing them into distinct groups.

Abstract

Recent advances in deep learning have transformed the domain of computational pathology (CPath). More specifically, they have altered the diagnostic workflow of pathologists by integrating foundation models (FMs) and vision-language models (VLMs) into their assessment and decision-making process. FMs can overcome the limitations of existing deep learning approaches in CPath by learning a representation space that can be adapted to a wide variety of downstream tasks without explicit supervision. Deploying VLMs allows pathology reports written in natural language to be used as rich sources of semantic information, both to improve existing models and to generate predictions in natural language form. In this survey, a holistic and systematic overview of recent innovations in FMs and VLMs in CPath is presented. Furthermore, the tools, datasets, and training schemes for these models are summarized, and the models are categorized into distinct groups. This survey highlights current trends in CPath and how the field may be reshaped by FMs and VLMs in the future.

Paper Structure (25 sections, 18 figures, 9 tables)

Figures (18)

  • Figure 1: Number of publications on FMs and VLMs in pathology (from Google Scholar). The search keywords are "vision-language" + "pathology" for the VLM statistics and "foundation models" + "pathology" for the FM statistics.
  • Figure 2: Outline of major challenges in CPath. Several causes and consequences for each challenge are outlined in addition to how FMs and VLMs address these challenges.
  • Figure 3: Timeline of recently published work in CPath utilizing FMs and VLMs, as well as multi-modal datasets. To maintain transparency, we clearly annotate which research articles have been peer-reviewed and which are available only as pre-prints. Furthermore, high-impact pioneering research works published in prominent journals are highlighted. For pre-prints with multiple versions, the latest version and its corresponding date are used.
  • Figure 4: Different components of multi-modal datasets in computational pathology: types of datasets, sources of data, annotation, and pre-processing.
  • Figure 5: Comparison of the sizes of different multi-modal datasets. The size of each bubble indicates the size of the dataset. For visual clarity, the scale used for bubble size is the same within a specific group but differs between groups.
  • ...and 13 more figures