TeamPath: Building MultiModal Pathology Experts with Reasoning AI Copilots

Tianyu Liu, Weihao Xuan, Hao Wu, Peter Humphrey, Marcello DiStasio, Heli Qi, Rui Yang, Simeng Han, Tinglin Huang, Fang Wu, Nan Liu, Irene Li, Hua Xu, Hongyu Zhao

TL;DR

TeamPath addresses the need for pathology AI that reasons rigorously by coupling a pathology-focused visual-language model with an LLM-driven router and reinforcement learning. The method optimizes the objective $J_{GRPO}(\theta)$ using the group-relative advantage $\hat{A}_{i,t}$ within a multi-task framework that covers pathology VQA, ROI captioning, and cross-modality transcriptomic generation, and it is validated against expert pathologists. It demonstrates state-of-the-art performance on PathMMU, high-quality reasoning paths, and robust correction of expert outputs, while also enabling cross-modal molecular generation and ROI summarization. This work provides a practical AI copilot for clinicians, supporting reliable, interpretable, and integrative analyses in histopathology, with open-source code and clearly defined evaluation pipelines.
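To make the group-relative advantage $\hat{A}_{i,t}$ more concrete, the following is a minimal sketch, assuming the commonly used outcome-reward form of GRPO in which each sampled response's reward is standardized against the other responses drawn for the same prompt. The function name, the `eps` stabilizer, and the use of the population standard deviation are illustrative assumptions, not details confirmed by this summary.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses.

    Assumption: one scalar reward per response, all responses sampled
    for the same prompt. Under outcome-level rewards, every token t of
    response i would share the same advantage value A_i.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an unbiased std is another option
    # eps avoids division by zero when all rewards in the group are equal
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical responses to one question: two correct, two wrong.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Responses that score above the group mean receive a positive advantage and those below it a negative one, so no separate value network is needed to form the baseline.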

Abstract

Advances in AI have introduced several strong models in computational pathology, ushering the field into an era of multi-modal diagnosis, analysis, and interpretation. However, current pathology-specific visual language models still lack the capacity to make diagnoses with rigorous reasoning paths and to handle diverse tasks, so building AI Copilots for real-world scenarios remains a challenge. Here we introduce TeamPath, an AI system powered by reinforcement learning and router-enhanced solutions and built on large-scale histopathology multimodal datasets. It works as a virtual assistant for expert-level disease diagnosis, patch-level information summarization, and cross-modality generation that integrates transcriptomic information for clinical use. We also collaborate with pathologists from Yale School of Medicine to demonstrate that TeamPath can help them work more efficiently by identifying and correcting expert conclusions and reasoning paths. Overall, TeamPath can flexibly choose the best settings for a given need, and it serves as an innovative and reliable system for communicating information across different modalities and experts.

Paper Structure

This paper contains 8 sections, 6 equations, 6 figures, 1 algorithm.

Figures (6)

  • Figure 1: Landscape of TeamPath. (a) Steps of dataset curation. We extract image-text pairs from a processed TCGA dataset (PathGen-1.6M). (b) Word cloud visualization of ROI captions (upper) and questions (bottom). (c) The core visual language model architecture of TeamPath. (d) TeamPath as a system with an LLM-enhanced router (with over 80% accuracy in choosing the correct approach) and the corresponding capacities in various downstream applications. The fire icon marks models whose parameters are updated, and the snowflake icon marks models whose parameters are kept frozen. (e) Overall ranking list of different methods across tasks and metrics. A lower rank (larger bubble) means a better method.
  • Figure 2: Benchmarking results with PathMMU for the pathology VQA task. Because we did not have information about the testing setting of PathGene-LLaVA-13B, we used the results reported by the model creators in sunpathgen. (a) Accuracy across different categories of all selected methods with all samples. (b) Accuracy across different categories of all selected methods with samples from a tiny set. (c) Accuracy across different categories of all selected methods with samples from a large set. (d) Joint visualization with accuracy and ranking information for all selected methods. The darker the bubble color, the higher the model score; the larger the bubble shape, the lower the model ranking.
  • Figure 3:
  • Figure 4: Results of using TeamPath as the answer corrector/reason corrector. TeamPath can work together with pathologists to improve diagnostic accuracy and provide explainable reasons that support the decision. (a) Illustration of the self-verification/correction steps for both answers and reasoning paths. (b) Accuracy before and after correction based on selected samples from PathMMU. We report the average scores and standard deviation across three experts. The test is a one-sided Wilcoxon rank-sum test. (c) A case study to demonstrate the power of TeamPath as an AI assistant.
  • Figure 5: Benchmarking results of the caption summary task. (a) Performance of different methods for summarizing the caption based on ROI-level information across all metrics. We report the average scores and scaled standard deviation (0.1*sd) with all samples in the testing set. (b) Joint visualization with metric scores and ranking information for all selected methods. The darker the bubble color, the higher the model score; the larger the bubble shape, the lower the model ranking. (c) ROUGE-L and BERT scores based on samples from the selected disease across all methods. (d) ROUGE-L and BERT scores based on samples from the selected tissue across all methods. (e) A case study of caption summary generation based on TeamPath.
  • ...and 1 more figure
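
Figure 5 reports ROUGE-L among the caption-summary metrics. As a rough illustration of what that metric measures, here is a minimal sketch of an F-score based on the longest common subsequence (LCS), assuming lowercase whitespace tokenization and equal weighting of precision and recall. The function names are hypothetical, and the paper's actual evaluation pipeline may use a different tokenizer or a library implementation.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between a generated caption and a reference caption.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical captions, not taken from the paper's dataset.
    print(rouge_l_f1("tumor cells with necrosis", "tumor cells show necrosis"))
```

Because the LCS respects word order but allows gaps, this score rewards summaries that keep the reference's key terms in sequence even when intermediate wording differs.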