PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology

Xiaomin Wu, Rui Xu, Pengchen Wei, Wenkang Qin, Peixiang Huang, Ziheng Li, Lin Luo

TL;DR

Histopathology AI faces data scarcity and high costs for training multimodal models. The authors build PathEnhanceDS (~45,000 cases across 6 tasks) and apply PathoSync Tuning to LLaVA, Qwen-VL, and InternLM using full-parameter or LoRA updates to align models with pathology tasks. Fine-tuning yields notable improvements in classification accuracy (e.g., PCAM up to 0.96) and captioning/QA quality, with qualitative pathologist feedback supporting honesty and usefulness. By releasing the dataset and tuned models, the work aims to accelerate practical, educational, and research-oriented AI-assisted diagnosis in histopathology.

Abstract

Pathological diagnosis remains the definitive standard for identifying tumors. The rise of multimodal large models has simplified the process of integrating image analysis with textual descriptions. Despite this advancement, the substantial costs of training and deploying these complex multimodal models, together with a scarcity of high-quality training datasets, create a significant divide between cutting-edge technology and its application in the clinical setting. We have meticulously compiled a dataset of approximately 45,000 cases covering over 6 different tasks, including classifying organ tissues, generating pathology report descriptions, and answering pathology-related questions. We have fine-tuned multimodal large models, specifically LLaVA, Qwen-VL, and InternLM, on this dataset to enhance instruction-based performance. We conducted a qualitative assessment of the base and fine-tuned models on image captioning and classification tasks on the specific dataset. The evaluation results demonstrate that the fine-tuned model is proficient at addressing typical pathology questions. We hope that making both our models and datasets publicly available will prove valuable to the medical and research communities.

Paper Structure

This paper contains 8 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Instruction tuning of multimodal dataset and models for intelligent assisted diagnosis in histopathology.
  • Figure 2: The composition of PathEnhanceDS.