PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology

Fatemeh Ghezloo, Mehmet Saygin Seyfioglu, Rustin Soraki, Wisdom O. Ikezogwo, Beibin Li, Tejoram Vivekanandan, Joann G. Elmore, Ranjay Krishna, Linda Shapiro

TL;DR

PathFinder tackles the challenge of diagnosing diseases from gigapixel histopathology WSIs by emulating a pathologist's iterative patch-based reasoning through four coordinating agents (Triage, Navigation, Description, Diagnosis). It integrates a vision-based triage module, a text-conditioned navigation strategy, a concise patch-description generator, and a language-driven diagnostic classifier, achieving 74% accuracy on melanoma diagnosis in the M-Path dataset and surpassing average pathologist performance. The framework provides interpretable evidence via patch descriptions and region importance maps, supporting auditability and clinical trust. This multi-modal, multi-agent approach demonstrates significant advances in accuracy, efficiency, and interpretability for AI-assisted pathology, with potential to accelerate diagnostic workflows in real-world settings.

Abstract

Diagnosing diseases through histopathology whole slide images (WSIs) is fundamental in modern pathology but is challenged by the gigapixel scale and complexity of WSIs. Trained histopathologists overcome this challenge by navigating the WSI, looking for relevant patches, taking notes, and compiling them to produce a final holistic diagnosis. Traditional AI approaches, such as multiple instance learning and transformer-based models, fall short of such a holistic, iterative, multi-scale diagnostic procedure, limiting their adoption in the real world. We introduce PathFinder, a multi-modal, multi-agent framework that emulates the decision-making process of expert pathologists. PathFinder integrates four AI agents (the Triage Agent, Navigation Agent, Description Agent, and Diagnosis Agent) that collaboratively navigate WSIs, gather evidence, and provide comprehensive diagnoses with natural language explanations. The Triage Agent classifies the WSI as benign or risky; if risky, the Navigation and Description Agents iteratively focus on significant regions, generating importance maps and descriptive insights of sampled patches. Finally, the Diagnosis Agent synthesizes the findings to determine the patient's diagnostic classification. Our experiments show that PathFinder outperforms state-of-the-art methods in skin melanoma diagnosis by 8% while offering inherent explainability through natural language descriptions of diagnostically relevant patches. Qualitative analysis by pathologists shows that the Description Agent's outputs are of high quality and comparable to GPT-4o. PathFinder is also the first AI-based system to surpass the average performance of pathologists in this challenging melanoma classification task by 9%, setting a new record for efficient, accurate, and interpretable AI-assisted diagnostics in pathology. Data, code, and models are available at https://pathfinder-dx.github.io/
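The control flow described in the abstract (triage first, then an iterative navigate-and-describe loop, then a final diagnosis) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `run_pipeline` and `Report`, the callable signatures, the `(x, y)` patch representation, and the fixed `max_steps` budget are all assumptions made here for illustration, and the real agents are learned models rather than plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical patch representation: an (x, y) coordinate on the slide.
Patch = Tuple[int, int]


@dataclass
class Report:
    """Final label plus the evidence gathered along the way."""
    label: str
    descriptions: List[str] = field(default_factory=list)
    visited: List[Patch] = field(default_factory=list)


def run_pipeline(
    wsi: object,
    triage: Callable[[object], bool],                # True means "risky"
    navigate: Callable[[object, List[str]], Patch],  # next patch, given notes so far
    describe: Callable[[object, Patch], str],        # text description of one patch
    diagnose: Callable[[List[str]], str],            # label from all descriptions
    max_steps: int = 10,                             # assumed step budget
) -> Report:
    # Triage: a slide judged benign skips the rest of the pipeline.
    if not triage(wsi):
        return Report(label="benign")

    descriptions: List[str] = []
    visited: List[Patch] = []
    for _ in range(max_steps):
        # Navigation is conditioned on the descriptions collected so far.
        patch = navigate(wsi, descriptions)
        visited.append(patch)
        descriptions.append(describe(wsi, patch))

    # Diagnosis: synthesize the collected descriptions into one label.
    return Report(diagnose(descriptions), descriptions, visited)


if __name__ == "__main__":
    # Toy stand-ins for the four agents, purely for demonstration.
    report = run_pipeline(
        {"risky": True},
        triage=lambda slide: slide["risky"],
        navigate=lambda slide, notes: (len(notes), 0),
        describe=lambda slide, patch: f"patch at {patch}",
        diagnose=lambda notes: "needs review",
        max_steps=3,
    )
    print(report.label, report.visited)
```

Passing the agents in as callables keeps the orchestration separate from the models, and the returned `Report` keeps the visited patches and their descriptions alongside the label, mirroring the evidence trail the abstract describes.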

Paper Structure

This paper contains 21 sections, 3 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: The left panel illustrates the Navigation Agent, as outlined in Section \ref{nav-agent}. The right panel presents the iterative trajectory generation process, which employs both the Navigation Agent and Description Agent, as described in Section \ref{trajectories}.
  • Figure 1: Overview of the Triage Agent architecture. Definitions of $M$ and $H$ can be found in Section \ref{sec:methods}.
  • Figure 2: Ablation results. We ran 10 experiments, and plotted both the mean and standard deviation.
  • Figure 2: GPT-4 prompt to generate instruction-tuning dataset for the Description Agent.
  • Figure 3: Expert human pathologist preferences for each model in assessing description quality, evaluated in a double-blind survey for unbiased comparison.
  • ...and 1 more figures