PathAsst: A Generative Foundation AI Assistant Towards Artificial General Intelligence of Pathology

Yuxuan Sun; Chenglu Zhu; Sunyi Zheng; Kai Zhang; Lin Sun; Zhongyi Shui; Yunlong Zhang; Honglin Li; Lin Yang

TL;DR

PathAsst tackles the scarcity of high-quality pathology data for multimodal foundation models by constructing two specialized datasets, PathCap and PathInstruct. It couples a pathology-tuned CLIP model (PathCLIP) with Vicuna-13b and uses a two-phase instruction-tuning regime to create PathAsst, a multimodal assistant that can invoke eight pathology-specific sub-models and query a paper-retrieval system for up-to-date context. Empirically, PathCLIP outperforms general-purpose CLIP and existing pathology baselines in image-text retrieval and zero-shot classification, and PathAsst outperforms prior MLLMs on pathology VQA tasks, particularly when augmented with PathCLIP and model invocations. The integration of tool-enabled inference and literature retrieval demonstrates practical potential for improving diagnostic accuracy and supporting pathology workflows in real-world settings.
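As a rough illustration of the zero-shot classification setting mentioned above, the sketch below scores one image embedding against per-class text-prompt embeddings using cosine similarity followed by a temperature-scaled softmax, which is the usual CLIP recipe. It assumes the embeddings have already been produced by image and text encoders; the function names, the prompt wording, the temperature, and the toy vectors are hypothetical and are not taken from PathCLIP.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def softmax(scores, temperature=0.01):
    """Temperature-scaled softmax; a small temperature sharpens the distribution."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_emb, prompt_embs):
    """Return (best_label, probabilities) for one image against text prompts.

    prompt_embs maps a class label to the embedding of a prompt such as
    "an H&E image of <label>" (hypothetical template) from the text encoder.
    """
    labels = list(prompt_embs)
    sims = [cosine(image_emb, prompt_embs[lbl]) for lbl in labels]
    probs = softmax(sims)
    best = labels[max(range(len(labels)), key=lambda i: probs[i])]
    return best, dict(zip(labels, probs))

# Toy 3-d embeddings standing in for real encoder outputs (hypothetical values).
prompts = {
    "adenocarcinoma": [0.9, 0.1, 0.0],
    "normal mucosa": [0.1, 0.9, 0.1],
    "lymphoma": [0.0, 0.2, 0.95],
}
label, probs = zero_shot_classify([0.8, 0.2, 0.05], prompts)
print(label)  # adenocarcinoma
```

No classifier head is trained here: the class set is defined entirely by the text prompts, which is what makes the setting "zero-shot".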

Abstract

As large language models (LLMs) and multimodal techniques continue to mature, the development of general-purpose multimodal large language models (MLLMs) has surged, with significant applications in interpreting natural images. However, pathology remains largely untapped, particularly in gathering high-quality data and designing comprehensive model frameworks. To bridge the gap in pathology MLLMs, we present PathAsst, a multimodal generative foundation AI assistant designed to revolutionize diagnostic and predictive analytics in pathology. The development of PathAsst involves three pivotal steps: data acquisition, CLIP model adaptation, and training of PathAsst's multimodal generative capabilities. First, we collect over 207K high-quality pathology image-text pairs from authoritative sources. Using ChatGPT, we then generate over 180K instruction-following samples. Furthermore, we devise additional instruction-following data specifically tailored for invoking eight pathology-specific sub-models that we prepared, enabling PathAsst to collaborate effectively with these models and enhancing its diagnostic ability. Second, using the collected data, we construct PathCLIP, a pathology-dedicated CLIP model, to improve PathAsst's ability to interpret pathology images. Finally, we integrate PathCLIP with Vicuna-13b and use pathology-specific instruction-tuning data to enhance PathAsst's multimodal generation capacity and strengthen its synergistic interactions with the sub-models. The experimental results show the potential of harnessing AI-powered generative foundation models to improve pathology diagnosis and treatment processes.
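The summary does not spell out PathCLIP's training objective. Assuming it follows the standard CLIP recipe, training on the collected image-caption pairs would minimize a symmetric contrastive (InfoNCE) loss, sketched below in plain Python. The fixed temperature of 0.07 and the helper names are illustrative assumptions; the original CLIP learns its temperature as a parameter.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax of one list of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]

def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] form the i-th positive pair; every other
    pairing in the batch is treated as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = scaled cosine similarity between image i and caption j
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # image-to-text: each row should concentrate its mass on the diagonal entry
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # text-to-image: the same criterion applied to the columns
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

aligned = clip_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
swapped = clip_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(aligned < swapped)  # True
```

Correctly matched pairs yield a loss near zero, while swapped captions are heavily penalized; larger batches supply more in-batch negatives per positive pair.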

Paper Structure

This paper contains 20 sections, 1 equation, 7 figures, 3 tables.

Introduction
Related Work
Large Language Model (LLM).
Multimodal Large Language Model (MLLM).
Multimodal Model for Pathology.
Multimodal Datasets.
Pathology Dataset Construction
PathAsst Framework Construction
Model Design and Training
Training of PathCLIP.
Training of PathAsst.
Tool Augmented MLLM Inference
Pathology-specific CV Model Zoo.
Enhancing Responses through Paper Retrieval.
Experiments
...and 5 more sections

Figures (7)

Figure 1: Illustration of data processing: pathology image selection, sub-figure & caption separation, and refinement.
Figure 2: Examples of pathology-specific model-invoking instruction-following samples.
Figure 3: An illustration of the overall framework of PathAsst. The MLLM training stage encompasses training both PathCLIP and PathAsst, as well as constructing a paper embedding database. The tool-augmented MLLM inference stage details how PathAsst uses various tools to improve the quality of its generated outputs.
Figure 4: Comparative assessment of image retrieval performance between CLIP models across collected datasets.
Figure 5: Example of PathAsst calling a generation model.
...and 2 more figures
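Figures 2 and 3 describe tool-augmented inference: PathAsst emits requests to pathology-specific sub-models and draws on a paper embedding database. The sketch below illustrates that control flow under explicit assumptions: the `<call>name</call>` syntax, the two stub sub-models, and the toy paper embeddings are all hypothetical, and retrieval is shown as plain cosine-similarity top-k search rather than PathAsst's actual retrieval system.

```python
import math
import re

# Hypothetical stand-ins for the pathology sub-models; the real model zoo,
# its model names, and its call format are not specified in this summary.
def segment_nuclei(image_id):
    return f"nuclei mask for {image_id}"

def classify_tumor(image_id):
    return f"tumor grade for {image_id}"

MODEL_ZOO = {"segment_nuclei": segment_nuclei, "classify_tumor": classify_tumor}

# Hypothetical call syntax the LLM is assumed to emit, e.g. <call>classify_tumor</call>
CALL_PATTERN = re.compile(r"<call>(\w+)</call>")

def run_tool_calls(llm_output, image_id):
    """Execute every sub-model call found in the LLM output, in order."""
    results = []
    for name in CALL_PATTERN.findall(llm_output):
        tool = MODEL_ZOO.get(name)
        if tool is not None:  # skip names that are not in the zoo
            results.append((name, tool(image_id)))
    return results

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve_papers(query_emb, paper_db, k=2):
    """Return the titles of the k papers whose embeddings are closest to the query."""
    ranked = sorted(paper_db, key=lambda item: cosine(query_emb, item[1]), reverse=True)
    return [title for title, _ in ranked[:k]]

calls = run_tool_calls("Let me check. <call>classify_tumor</call>", "slide_001")
print(calls)  # [('classify_tumor', 'tumor grade for slide_001')]

papers = [("paper A", [1.0, 0.0]), ("paper B", [0.0, 1.0]), ("paper C", [0.7, 0.7])]
print(retrieve_papers([0.9, 0.1], papers))  # ['paper A', 'paper C']
```

In a full pipeline, the sub-model outputs and retrieved paper passages would be fed back into the LLM's context before it writes its final answer; this sketch stops at collecting them.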
