A Medical Multimodal Large Language Model for Pediatric Pneumonia

Weiwei Tian, Xinyu Huang, Tianhao Cheng, Wen He, Jinwu Fang, Rui Feng, Daoying Geng, Xiaobo Zhang

TL;DR

The paper introduces P2Med-MLLM, a unified medical multimodal large language model designed for pediatric pneumonia, trained on the large-scale P2Med-MD dataset and evaluated with the P2Med-MBench benchmark. The approach integrates a Chinese-LLaMA-2 LLM, a CLIP-based vision encoder, and a perceiver module to process plain text and interleaved image-report data (2D X-rays and 3D CT). Through a three-stage training regimen and efficient LoRA-based fine-tuning, P2Med-MLLM demonstrates superior performance across six clinically relevant tasks, including radiology report generation and inpatient/outpatient record creation, compared with existing baselines. The work highlights substantial potential for clinical support in resource-limited settings, while acknowledging limitations in open-ended task accuracy, evaluation robustness, and single-center data, and outlining directions for broader disease coverage and prospective, multi-center studies.
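The summary above mentions LoRA-based fine-tuning without describing it. As a rough illustration of the general technique only (not the authors' implementation), the sketch below keeps a pretrained weight matrix frozen and adds a trainable low-rank update scaled by `alpha / rank`. The class name `LoRALinear`, the toy dimensions, and the initialization constants are assumptions made for this example; the paper's actual ranks, target layers, and hyperparameters are not stated here.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical toy layer: frozen W (out x in) plus update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight              # frozen pretrained weight, never updated
        self.scale = alpha / rank         # standard LoRA scaling factor
        # A gets small random values and B starts at zero, so the adapter
        # contributes nothing until training changes B.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def effective_weight(self):
        """Return W + scale * (B @ A), the weight actually applied to inputs."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted weight to an input vector (list of floats)."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        """Count only adapter parameters; the frozen weight is excluded."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy 4x4 "pretrained" weight; the values are arbitrary.
W = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 1.0, 0.0, 1.0],
     [2.0, 0.0, 0.0, 0.0],
     [1.0, 1.0, 1.0, 1.0]]
layer = LoRALinear(W, rank=1, alpha=2.0)
full_params = len(W) * len(W[0])
```

In this toy case the adapter trains 8 parameters instead of the 16 in the full matrix. The saving grows with layer size, because the adapter has `rank * (in_dim + out_dim)` parameters while the full matrix has `in_dim * out_dim`.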

Abstract

Pediatric pneumonia is the leading cause of death among children under five years of age worldwide, imposing a substantial burden on affected families. Three significant hurdles currently impede its diagnosis and treatment. First, pediatric pneumonia shares symptoms with other respiratory diseases, making rapid and accurate differential diagnosis challenging. Second, primary hospitals often lack sufficient medical resources and experienced doctors. Third, providing personalized diagnostic reports and treatment recommendations is labor-intensive and time-consuming. To tackle these challenges, we propose a Medical Multimodal Large Language Model for Pediatric Pneumonia (P2Med-MLLM), which handles diverse clinical tasks, such as generating free-text radiology reports and medical records, within a unified framework. Specifically, P2Med-MLLM processes both pure text and image-text data and is trained on a large-scale dataset (P2Med-MD) containing real clinical information from 163,999 outpatient and 8,684 inpatient cases. This dataset comprises 2D chest X-ray images, 3D chest CT images, the corresponding radiology reports, and outpatient and inpatient records. We design a three-stage training strategy to enable P2Med-MLLM to comprehend medical knowledge and follow instructions for various clinical tasks. To rigorously evaluate P2Med-MLLM's performance, we develop P2Med-MBench, a benchmark of 642 samples meticulously verified by pediatric pulmonology specialists, covering six clinical decision-support tasks and a balanced variety of diseases. In automated scoring, P2Med-MLLM outperformed the compared baselines. This work aims to assist primary care doctors with prompt disease diagnosis and treatment planning, reduce mortality from severe cases, and optimize the allocation of medical resources.

Paper Structure (29 sections, 6 equations, 10 figures, 7 tables)

Figures (10)

  • Figure 1: Overview. (a) The flowchart of this study. (b) The data distribution in different training stages. Note: P2Med-MLLM: Medical Multimodal Large Language Model for Pediatric Pneumonia. CT: Computed Tomography.
  • Figure 2: Illustration of the evaluation process (English version). We evaluated model-generated answers using 13B Chinese-LLaMA-2.
  • Figure 3: Qualitative examples of six different evaluation tasks (English version). We present input prompts along with the answers generated by P2Med-MLLM and the target ground truth. Green highlights correct predictions, red indicates errors, and blue denotes neglected parts. Note: P2Med-MLLM: Medical Multimodal Large Language Model for Pediatric Pneumonia. CT: Computed Tomography.
  • Figure 4: An ablation study of P2Med-MLLM, removing a single stage or modality at a time. We compared six different tasks (a-f) using the accuracy score, with the most crucial evaluation components highlighted in bold. Stages 1 to 3 represent medical knowledge infusion pre-training, task type-based balanced instruction-tuning, and disease category-based balanced instruction-tuning, respectively. Note: P2Med-MLLM: Medical Multimodal Large Language Model for Pediatric Pneumonia. CT: Computed Tomography.
  • Figure 5: Performance comparison between multiple single-task dedicated networks and a unified network trained jointly on multiple tasks (P2Med-MLLM). Accuracy (a) and Comprehensiveness (b) scores of impression or diagnosis results are reported, as these are the key evaluation metrics. Tasks 1 to 6 represent radiology report generation (X-ray), radiology report generation (CT), outpatient medical record generation, first disease course record generation, attending physician's first ward round record generation, and chief physician's first ward round record generation, respectively. Note: P2Med-MLLM: Medical Multimodal Large Language Model for Pediatric Pneumonia. CT: Computed Tomography.
  • ...and 5 more figures