BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

Sahal Shaji Mullappilly; Mohammed Irfan Kurpath; Sara Pieri; Saeed Yahya Alseiari; Shanavas Cholakkal; Khaled Aldahmani; Fahad Khan; Rao Anwer; Salman Khan; Timothy Baldwin; Hisham Cholakkal

TL;DR

BiMediX2 is a bilingual Arabic–English medical LMM that unifies text and image reasoning to support diverse clinical tasks. It is trained with a two-stage pipeline: vision-text alignment through a projector, followed by LoRA-based multimodal instruction tuning on BiMed-V, a large bilingual corpus. Evaluation includes BiMed-MBench, an Arabic–English medical LMM benchmark. The model reports state-of-the-art performance across 12 medical benchmarks, with strong results in VQA, report generation, and report summarization. On BiMed-MBench it outperforms existing methods by over 9% in English and more than 20% in Arabic, and it surpasses GPT-4 by approximately 9% in UPHILL factual accuracy evaluations. The work advances accessible, multilingual medical AI and provides datasets, benchmarks, and code to facilitate further research, while acknowledging safety, ethical, and deployment considerations for clinical use.
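To make the two-stage pipeline more concrete, the following is a minimal, dependency-free sketch of the general LoRA idea: a frozen weight matrix W is augmented with a trainable low-rank update, giving y = x W^T + (alpha / r) x A^T B^T. This is not the BiMediX2 implementation. The class name `LoRALinear`, the helper functions, and the `rank` and `alpha` values are hypothetical and chosen only for illustration. The model's actual projector and adapter configuration are not given in this summary.

```python
import random


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Illustrative linear layer with a frozen weight and a low-rank update.

    forward(x) = x W^T + (alpha / rank) * x A^T B^T
    Only A and B would be trained; W stays fixed.
    """

    def __init__(self, w, rank, alpha, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(w), len(w[0])
        self.w = w  # frozen base weight, shape (out_dim x in_dim)
        # A starts with small random values, shape (rank x in_dim).
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        # B starts at zero, shape (out_dim x rank), so the update is zero at first.
        self.b = [[0.0] * rank for _ in range(out_dim)]
        self.scaling = alpha / rank

    def forward(self, x):
        """Apply the layer to a batch x of shape (batch x in_dim)."""
        base = matmul(x, transpose(self.w))
        low_rank = matmul(matmul(x, transpose(self.a)), transpose(self.b))
        return [
            [p + self.scaling * q for p, q in zip(row_base, row_low)]
            for row_base, row_low in zip(base, low_rank)
        ]

    def trainable_parameters(self):
        """Count the entries of A and B, the only parameters that would be updated."""
        return sum(len(r) for r in self.a) + sum(len(r) for r in self.b)


layer = LoRALinear(w=[[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
initial_output = layer.forward([[1.0, 1.0]])
```

Because B is initialized to zero in this sketch, the layer initially reproduces the frozen layer's output exactly, and only the much smaller A and B matrices need to change during tuning.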

Abstract

We introduce BiMediX2, a bilingual (Arabic-English) Bio-Medical EXpert Large Multimodal Model that supports text-based and image-based medical interactions. It enables multi-turn conversation in Arabic and English and supports diverse medical imaging modalities, including radiology, CT, and histology. To train BiMediX2, we curate BiMed-V, an extensive Arabic-English bilingual healthcare dataset consisting of 1.6M samples of diverse medical interactions. This dataset supports a range of medical Large Language Model (LLM) and Large Multimodal Model (LMM) tasks, including multi-turn medical conversations, report generation, and visual question answering (VQA). We also introduce BiMed-MBench, the first Arabic-English medical LMM evaluation benchmark, verified by medical experts. BiMediX2 demonstrates excellent performance across multiple medical LLM and LMM benchmarks, achieving state-of-the-art results compared to other open-sourced models. On BiMed-MBench, BiMediX2 outperforms existing methods by over 9% in English and more than 20% in Arabic evaluations. Additionally, it surpasses GPT-4 by approximately 9% in UPHILL factual accuracy evaluations and excels in various medical VQA, report generation, and report summarization tasks. Our trained models, instruction set, and source code are available at https://github.com/mbzuai-oryx/BiMediX2
