Foundational Models for Pathology and Endoscopy Images: Application for Gastric Inflammation
Hamideh Kerdegari, Kyle Higgins, Dennis Veselkov, Ivan Laponogov, Inese Polaka, Miguel Coimbra, Junior Andrea Pescino, Marcis Leja, Mario Dinis-Ribeiro, Tania Fleitas Kanonnikoff, Kirill Veselkov
TL;DR
The paper surveys foundation models (FM) as a unifying framework for pathology and endoscopy imaging in gastric inflammation and cancer, outlining architectures, training objectives, and data strategies. It categorizes FM into visually prompted pathology models (segmentation/classification), textually prompted pathology models (classification), and endoscopy-focused visually prompted models, with notable examples such as SAM variants, HIPT, CTransPath, Endo-FM, and Surgical-DINO. The authors discuss challenges (hallucinations, biases, privacy, and computational cost) and advocate the FUTURE-AI guidelines and model-card-style documentation to foster trustworthy clinical deployment. Overall, the work provides a roadmap for integrating FM into multimodal GI diagnostics to improve early detection, risk stratification, and real-time decision support, while highlighting the need for domain-specific benchmarks and governance frameworks.
Abstract
The integration of artificial intelligence (AI) in medical diagnostics represents a significant advancement in managing upper gastrointestinal (GI) cancer, a major cause of global cancer mortality. Specifically for gastric cancer (GC), chronic inflammation causes changes in the mucosa such as atrophy, intestinal metaplasia (IM), dysplasia, and ultimately cancer. Early detection through regular endoscopic surveillance is essential for better outcomes. Foundation models (FM), which are machine or deep learning models trained on diverse data and applicable to broad use cases, offer a promising solution to enhance the accuracy of endoscopy and its subsequent pathology image analysis. This review explores the recent advancements, applications, and challenges associated with FM in endoscopy and pathology imaging. We start by elucidating the core principles and architectures underlying these models, including their training methodologies and the pivotal role of large-scale data in developing their predictive capabilities. Moreover, this work discusses emerging trends and future research directions, emphasizing the integration of multimodal data, the development of more robust and equitable models, and the potential for real-time diagnostic support. This review aims to provide a roadmap for researchers and practitioners in navigating the complexities of incorporating FM into clinical practice for the prevention and management of GC, thereby improving patient outcomes.
