Adaptive Diagnostic Reasoning Framework for Pathology with Multimodal Large Language Models

Yunqi Hong, Johnson Kao, Liam Edwards, Nein-Tzu Liu, Chung-Yen Huang, Alex Oliveira-Kowaleski, Cho-Jui Hsieh, Neil Y. C. Lin

TL;DR

RECAP-PATH is an interpretable framework that establishes a self-learning paradigm, shifting off-the-shelf multimodal large language models from passive pattern recognition to evidence-linked diagnostic reasoning. It provides clinically trustworthy AI and demonstrates a generalizable path toward evidence-linked interpretation.

Abstract

AI tools in pathology have improved screening throughput, standardized quantification, and revealed prognostic patterns that inform treatment. However, adoption remains limited because most systems still lack the human-readable reasoning needed to audit decisions and prevent errors. We present RECAP-PATH, an interpretable framework that establishes a self-learning paradigm, shifting off-the-shelf multimodal large language models from passive pattern recognition to evidence-linked diagnostic reasoning. At its core is a two-phase learning process that autonomously derives diagnostic criteria: diversification expands pathology-style explanations, while optimization refines them for accuracy. This self-learning approach requires only small labeled sets and no white-box access or weight updates to generate cancer diagnoses. Evaluated on breast and prostate datasets, RECAP-PATH produced rationales aligned with expert assessment and delivered substantial gains in diagnostic accuracy over baselines. By uniting visual understanding with reasoning, RECAP-PATH provides clinically trustworthy AI and demonstrates a generalizable path toward evidence-linked interpretation.
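
The two-phase process described in the abstract (diversification, then optimization, with only black-box model access) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `refine`, `predict`, and `rewrite` are hypothetical, the model calls are replaced by deterministic toy stubs, and the acceptance rule (keep a revision only if accuracy on the labeled set improves) is an assumption.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (image, label); images are plain strings in this toy


def accuracy(prompt: str, data: List[Example],
             predict: Callable[[str, str], str]) -> float:
    """Fraction of labeled examples the prompt classifies correctly."""
    return sum(predict(prompt, img) == label for img, label in data) / len(data)


def refine(seed_prompt: str, data: List[Example],
           predict: Callable[[str, str], str],
           rewrite: Callable[[str, List[Example], str], str],
           n_diversify: int = 2, n_optimize: int = 3) -> str:
    """Two-phase, black-box prompt search (hypothetical sketch)."""
    pool = [seed_prompt]
    # Phase 1 (diversification): grow the pool of candidate prompts
    # without filtering on accuracy.
    for _ in range(n_diversify):
        pool += [rewrite(p, [], "diversify") for p in list(pool)]
    # Phase 2 (optimization): start from the most accurate candidate and
    # revise it using its error cases, keeping only strict improvements.
    best = max(pool, key=lambda p: accuracy(p, data, predict))
    for _ in range(n_optimize):
        errors = [(img, y) for img, y in data if predict(best, img) != y]
        if not errors:
            break
        candidate = rewrite(best, errors, "optimize")
        if accuracy(candidate, data, predict) > accuracy(best, data, predict):
            best = candidate
    return best


# Toy stand-ins for the multimodal model (assumptions, for illustration only).
def toy_predict(prompt: str, image: str) -> str:
    # Cues are the ';'-separated phrases that follow the instruction.
    cues = [c.strip() for c in prompt.split(";")[1:]]
    return "invasive" if any(c and c in image for c in cues) else "normal"


def toy_rewrite(prompt: str, errors: List[Example], mode: str) -> str:
    if mode == "diversify":
        return prompt + "; describe architecture"
    # "Reflect" on the first error by adding its leading feature as a cue.
    return prompt + "; " + errors[0][0].split(",")[0].strip()


toy_data = [
    ("infiltrative growth, pleomorphism", "invasive"),
    ("stromal invasion", "invasive"),
    ("uniform glands", "normal"),
]
best = refine("Classify the image", toy_data, toy_predict, toy_rewrite)
print(best, accuracy(best, toy_data, toy_predict))
```

In a real setting, `predict` and `rewrite` would wrap calls to a multimodal model; no weights are touched, which matches the abstract's statement that the approach needs no white-box access or weight updates.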

Paper Structure

This paper contains 22 sections, 3 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: Framework of RECAP-PATH. (A) Overview of the RECAP-PATH learning framework and deployment pipeline. Using a small set of labeled pathology images (left), RECAP-PATH conducts a two-phase diagnostic learning process that yields an optimized prompt encapsulating the diagnosis criteria. During inference on unseen pathology images (right), the model generates detailed image descriptions guided by the optimized criteria and then produces classification predictions informed by both visual features and textual descriptions. (B) Schematic of the automatic prompt refinement workflow. In each iteration, RECAP-PATH identifies error cases, prompts the model to reflect on failure modes, and generates revised prompts aimed at enhancing diagnostic accuracy and improving human-readable diagnostic rationales. Through this iterative, error-driven refinement, the framework produces prompts that are both clinically meaningful and performance-optimized. Representative examples of such prompts are shown in Fig. S1.
  • Figure 2: Learning dynamics and prompt evolution in RECAP-PATH. (A) Prediction accuracy over learning iterations. Accuracy decreases slightly during Phase 1 (diversification) as the model explores a broader range of diagnostic reasoning strategies. In Phase 2 (optimization), accuracy increases rapidly, with substantial improvements achieved within a few iterations and convergence around six rounds. (B) Prompt diversity over learning iterations. Diversity steadily increases during Phase 1 because this phase aims to elicit more diverse reasoning strategies. During Phase 2, diversity decreases slightly as prompts are refined for accuracy but remains substantially higher than the starting point. (C) Impact of the diversification phase. Incorporating Phase 1 leads to significantly greater lexical and conceptual diversity in the final prompt compared to training without diversification. (D) Evolution of the test confusion matrix. The model progresses from an initial zero-shot bias toward one category to well-balanced, optimized diagnostic performance. (E) Example of prompt evolution. The initial seed prompt is generic and simple, while the final optimized prompt reflects a structured, clinically meaningful diagnostic framework. (F) UMAP visualization of the description embeddings. Initially, descriptions for the two classes overlap with poor separability. After optimization, descriptions form two well-separated clusters, demonstrating semantic disentanglement aligned with diagnostic categories.
  • Figure 3: Pathologist knowledge augmentation for RECAP-PATH optimization. (A) Integration of expert feedback into the RECAP-PATH framework. Three board-certified pathologists provided blinded evaluations of LLM-generated image descriptions across normal and invasive carcinoma cases, rating their precision and histopathological accuracy. These assessments were incorporated into the refinement process. The assessment summary is shown. (B) Pathologist ratings demonstrated that incorporating expert feedback improved the clinical coherence and histopathological correctness of generated descriptions by nearly 20%. (C) Case-level analysis of optimized outputs. Representative examples of image-specific diagnostic narratives illustrate how the optimized diagnosis criteria guide the MLLM to identify key histopathological features. For benign samples, the model describes hallmark features such as organized tubular architecture, uniform gland size, absence of nucleoli, and abundant cytoplasm. For invasive carcinoma, it highlights malignant patterns including infiltrative growth, nuclear pleomorphism, prominent nucleoli, and stromal invasion.
  • Figure 4: Subtype-specific classification performance and semantic interpretability of RECAP-PATH. (A) Confusion matrices for binary classification of ductal carcinoma in situ (DCIS) versus invasive carcinoma (IC), showing improved performance after prompt optimization (true positive rates: 0.85 for DCIS, 0.90 for IC). (B) Example of an optimized prompt illustrating how the model autonomously generated subtype-specific diagnostic criteria, emphasizing features such as stromal invasion, ductal confinement, and differences between intraductal and infiltrative growth. (C) Representative image and corresponding generated description for DCIS, highlighting hallmark features aligned with established pathological criteria. (D) UMAP visualization of description embeddings, demonstrating clear semantic separation between DCIS and IC, consistent with phenotype-specific explanations. (E) Confusion matrices for multiclass classification (Normal, DCIS, IC), showing progressive performance gains after optimization. (F) UMAP visualization of description embeddings in the multiclass setting, demonstrating improved clustering and subtype differentiation, with most errors involving normal cases misclassified as DCIS due to overlapping ductal features.
  • Figure 5: Generalization of RECAP-PATH across pathology datasets. (A) Confusion matrices for normal versus invasive carcinoma in the BACH dataset, showing improved performance after prompt optimization despite lower resolution and smaller dataset size compared to BRACS. (B) Accuracy trajectory in BACH, demonstrating a non-monotonic two-phase learning dynamic similar to that observed in BRACS. (C) Confusion matrices for benign versus malignant classification in the prostate cancer SICAPv2 dataset, showing balanced performance gains after optimization. (D) Reproduction of the two-phase learning dynamics in SICAPv2. (E) Example optimized diagnostic criteria in prostate histology, illustrating key features such as glandular architecture, arrangement, and cytological atypia. (F) Case-level description of a malignant prostate sample (Gleason 4+4), where the optimized prompt guided the model to identify hallmark features including cribriform architecture, infiltrative growth, and nuclear pleomorphism, consistent with expert diagnostic criteria.
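
The captions for Figure 2(B) and 2(C) refer to prompt diversity without defining the measure in this summary. One simple lexical proxy, offered here as an assumption and not necessarily the measure used in the paper, is the mean pairwise Jaccard distance between the word sets of the prompts in a pool. The function names below are hypothetical.

```python
from itertools import combinations
from typing import List


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over lowercase whitespace-split word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    if not union:  # two empty prompts are treated as identical
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def mean_pairwise_diversity(prompts: List[str]) -> float:
    """Average Jaccard distance over all unordered prompt pairs."""
    pairs = list(combinations(prompts, 2))
    if not pairs:  # fewer than two prompts: no diversity to measure
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)
```

A pool of identical prompts scores 0.0 and a pool with no shared words scores 1.0, so a rise during diversification and a small drop during optimization would show up directly in this number. Conceptual diversity, also mentioned in the caption, would need an embedding-based measure instead.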