Foundational Models for Pathology and Endoscopy Images: Application for Gastric Inflammation

Hamideh Kerdegari, Kyle Higgins, Dennis Veselkov, Ivan Laponogov, Inese Polaka, Miguel Coimbra, Junior Andrea Pescino, Marcis Leja, Mario Dinis-Ribeiro, Tania Fleitas Kanonnikoff, Kirill Veselkov

TL;DR

The paper surveys foundation models (FM) as a unifying framework for pathology and endoscopy imaging in gastric inflammation and cancer, outlining architectures, training objectives, and data strategies. It categorizes FM into visually prompted pathology models (segmentation/classification), textually prompted pathology models (classification), and endoscopy-focused visually prompted models, with notable examples such as SAM variants, HIPT, CTransPath, Endo-FM, and Surgical-DINO. The authors discuss four challenges (hallucinations, biases, privacy, and computational cost) and advocate the FUTURE-AI guidelines and model-card-style documentation to foster trustworthy clinical deployment. Overall, the work provides a roadmap for integrating FM into multimodal GI diagnostics to improve early detection, risk stratification, and real-time decision support, while highlighting the need for domain-specific benchmarks and governance frameworks.

Abstract

The integration of artificial intelligence (AI) in medical diagnostics represents a significant advancement in managing upper gastrointestinal (GI) cancer, a major cause of global cancer mortality. Specifically for gastric cancer (GC), chronic inflammation causes changes in the mucosa such as atrophy, intestinal metaplasia (IM), dysplasia, and ultimately cancer. Early detection through regular endoscopic surveillance is essential for better outcomes. Foundation models (FM), which are machine or deep learning models trained on diverse data and applicable to broad use cases, offer a promising way to enhance the accuracy of endoscopy and the subsequent pathology image analysis. This review explores the recent advancements, applications, and challenges associated with FM in endoscopy and pathology imaging. We start by elucidating the core principles and architectures underlying these models, including their training methodologies and the pivotal role of large-scale data in developing their predictive capabilities. Moreover, this work discusses emerging trends and future research directions, emphasizing the integration of multimodal data, the development of more robust and equitable models, and the potential for real-time diagnostic support. This review aims to provide a roadmap for researchers and practitioners navigating the complexities of incorporating FM into clinical practice for the prevention and management of GC, thereby improving patient outcomes.

Paper Structure

This paper contains 20 sections, 2 equations, 3 figures, 1 table.

Figures (3)

  • Figure S1: A. Correa’s cascade of intestinal-type gastric carcinogenesis: a sequence of gastric changes from chronic gastritis to atrophic gastritis, then to intestinal metaplasia and dysplasia, culminating in gastric cancer, highlighting a progressive, stepwise development toward malignancy. B. Surveillance guidelines overview [pimentel2019management]: 1) Detection and Diagnosis: Endoscopy provides a direct view of the stomach lining, enabling the identification of areas that may exhibit precancerous alterations. During this examination, targeted biopsies are collected from visually abnormal or suspicious regions. 2) Pathological Analysis: These biopsies are meticulously analyzed by pathologists to categorize the cellular composition of the tissue. This examination distinguishes between normal cells, atrophic gastritis, intestinal metaplasia, dysplasia, and the early stages of gastric cancer. The results are used to confirm the diagnosis and assess the condition’s severity. 3) Guiding Management: Insights derived from the endoscopic findings and pathological reports are integral to formulating a management strategy. Decisions regarding the frequency of surveillance, the need for further medical interventions, and evaluations of the risk of progression to gastric cancer are based on these combined observations and on individual risk factors such as genetic predispositions.
  • Figure S2: A) An overview of our taxonomy for pathology and endoscopy FM. They are categorized by prompt type (i.e., visually or textually prompted models) and by their utilization. B) Overview of four common architecture styles used in vision-language models: 1) Dual-Encoder designs use parallel image and text encoders with aligned representations, 2) Fusion designs jointly process both image and text representations via a decoder, 3) Encoder-Decoder designs apply joint feature encoding and decoding sequentially, 4) Adapted large language model (LLM) designs feed visual and text prompts to an LLM to leverage its superior generalization ability. C) Overview of the segment anything model (SAM) for pathology image segmentation. D) The process of training textually prompted models on a paired image–text dataset via contrastive learning.
  • Figure S3: A. The FUTURE-AI guidelines. B. Proposed Model Cards Framework.
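
The caption of Figure S2 (panels B.1 and D) mentions dual-encoder models trained on paired image–text data via contrastive learning. As a rough illustration of that idea, the sketch below computes a symmetric contrastive loss over a small batch of already-computed embeddings. It is a minimal sketch under stated assumptions, not the method of any specific model covered in the paper: the function names (`l2_normalize`, `cross_entropy_diag`, `contrastive_loss`) and the temperature value of 0.07 are illustrative choices, and the encoders that would produce the embeddings are omitted.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric image-to-text and text-to-image contrastive loss.

    image_emb[i] and text_emb[i] are assumed to come from the same
    image-text pair; every other combination is treated as a negative.
    The temperature is an illustrative assumption.
    """
    images = [l2_normalize(v) for v in image_emb]
    texts = [l2_normalize(v) for v in text_emb]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_transposed = [list(column) for column in zip(*logits)]
    image_to_text = cross_entropy_diag(logits)
    text_to_image = cross_entropy_diag(logits_transposed)
    return 0.5 * (image_to_text + text_to_image)


# Toy batch: two image embeddings and their matching / swapped text embeddings.
image_batch = [[1.0, 0.0], [0.0, 1.0]]
matched_text = [[1.0, 0.0], [0.0, 1.0]]
swapped_text = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = contrastive_loss(image_batch, matched_text)
loss_swapped = contrastive_loss(image_batch, swapped_text)
```

In this toy batch the loss is close to zero when each image embedding points in the same direction as its paired text embedding, and large when the pairs are swapped. That gap is the signal that pushes the two encoders toward aligned representations during training.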