Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm

Hongcheng Liu, Yusheng Liao, Siqv Ou, Yuhao Wang, Heyang Liu, Yanfeng Wang, Yu Wang

TL;DR

Med-PMC introduces a proactive, interactive framework for evaluating medical multi-modal LLMs in realistic clinical settings. It couples a multi-turn dialogue between a doctor model and a personalized patient actor with a separate technician who handles tests. Experiments with the framework show that current MLLMs struggle to gather multimodal information effectively and are susceptible to bias when interacting with personalized patient actors. Across 12 doctor models, the study finds variable gains from prompting strategies and highlights the importance of robust multimodal interpretation for accurate diagnosis and treatment planning. The work offers guidance for building more reliable clinical MLLMs and releases code and data to support further development and benchmarking.

Abstract

The application of Multi-modal Large Language Models (MLLMs) in clinical scenarios remains underexplored. Previous benchmarks focus only on the capacity of MLLMs in medical visual question answering (VQA) or report generation and fail to assess their performance on complex clinical multi-modal tasks. In this paper, we propose a novel Medical Personalized Multi-modal Consultation (Med-PMC) paradigm to evaluate the clinical capacity of MLLMs. Med-PMC builds a simulated clinical environment in which an MLLM must interact with a patient simulator to complete multi-modal information-gathering and decision-making tasks. Specifically, the patient simulator is augmented with personalized actors to simulate the diversity of patients in real scenarios. We conduct extensive experiments to assess 12 types of MLLMs, providing a comprehensive view of their clinical performance. We find that current MLLMs fail to gather multimodal information and show potential bias in the decision-making task when consulting with the personalized patient simulators. Further analysis demonstrates the effectiveness of Med-PMC, showing its potential to guide the development of robust and reliable clinical MLLMs. Code and data are available at https://github.com/LiuHC0428/Med-PMC.
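
As an illustration of the interaction described above, the following is a minimal sketch of a multi-turn consultation loop in which a doctor model questions a patient simulator, routes test requests to a technician, and then commits to a decision. It is not taken from the Med-PMC codebase: the class name `Consultation`, the `EXAM:` and `DIAGNOSIS:` message prefixes, the turn budget, and the scripted stand-ins for the doctor, patient, and technician are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# All names here are hypothetical; none come from the Med-PMC repository.
Turn = Tuple[str, str]  # (speaker, utterance)


@dataclass
class Consultation:
    doctor: Callable[[List[Turn]], str]  # dialogue history -> next doctor message
    patient: Callable[[str], str]        # doctor question -> patient answer
    technician: Callable[[str], str]     # test request -> examination result
    max_turns: int = 5                   # assumed turn budget
    history: List[Turn] = field(default_factory=list)

    def run(self) -> str:
        """Run the dialogue until the doctor decides or the turn budget runs out."""
        for _ in range(self.max_turns):
            message = self.doctor(self.history)
            self.history.append(("doctor", message))
            if message.startswith("DIAGNOSIS:"):
                return message
            if message.startswith("EXAM:"):
                # Test requests go to the technician rather than the patient.
                reply = ("technician", self.technician(message))
            else:
                reply = ("patient", self.patient(message))
            self.history.append(reply)
        # Budget exhausted: ask the doctor once more for a final decision.
        prompt = ("system", "Please give your final diagnosis.")
        final = self.doctor(self.history + [prompt])
        self.history.append(("doctor", final))
        return final


def scripted_doctor(history: List[Turn]) -> str:
    """Stand-in for an MLLM: ask first, request an exam next, then decide."""
    script = ["What brings you in today?", "EXAM: chest X-ray", "DIAGNOSIS: placeholder"]
    n_doctor_turns = sum(1 for speaker, _ in history if speaker == "doctor")
    return script[min(n_doctor_turns, len(script) - 1)]


session = Consultation(
    doctor=scripted_doctor,
    patient=lambda question: "I have had a cough for a week.",
    technician=lambda req: "[image and report for: " + req[len("EXAM:"):].strip() + "]",
)
result = session.run()
speakers = [speaker for speaker, _ in session.history]
```

In a real evaluation the three callables would wrap model calls, and the technician would return actual images or reports; the scripted versions here exist only so the control flow can be run end to end.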

Paper Structure

This paper contains 51 sections, 11 equations, 5 figures, 25 tables.

Figures (5)

  • Figure 1: Overview of the Med-PMC evaluation framework. The framework comprises three parts: (a) multi-modal consultation, (b) patient simulator, and (c) evaluation.
  • Figure 2: Results of LLM-based evaluation of consultations with both the standard patient and the actor patient.
  • Figure 3: Ablation study of multi-modal information on (a) Diagnosis performance and (b) Treatment performance. We compare three situations to show the impact of the multi-modal information on the consultation results. Specifically, 'w/o MM-Info' represents the model without any multi-modal information, 'w/MM-Con' denotes the model with multi-modal information obtained through consultation, and 'w/MM-GT' signifies the model with complete multi-modal information.
  • Figure 4: Gender bias of MLLMs. Left: GPT-4o. Right: Gemini1.5-Pro. Both MLLMs exhibit varying degrees of gender bias in medical consultations. All scores are normalized.
  • Figure 5: Change in information-gathering performance over consultation turns. All scores are averaged over the 12 types of MLLMs.