BiMediX: Bilingual Medical Mixture of Experts LLM

Sara Pieri, Sahal Shaji Mullappilly, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman Khan, Timothy Baldwin, Hisham Cholakkal

TL;DR

BiMediX addresses the underrepresentation of Arabic in medical NLP by introducing the first bilingual medical mixture-of-experts LLM capable of English and Arabic interactions. The approach combines the BiMed1.3M bilingual instruction set, a semi-automated English-to-Arabic translation pipeline, and QLoRA-based fine-tuning of a Mixtral MoE model to deliver efficient bilingual medical dialogue, QA, and MCQA capabilities. It achieves state-of-the-art or competitive results on English medical benchmarks and outperforms Jais-30B by average absolute gains of 10% on the Arabic medical benchmark and 15% on bilingual evaluations, while offering eightfold faster inference. The work provides open resources, including a large Arabic-English medical benchmark and dataset, to advance bilingual medical AI research with attention to safety and ethical implications.

Abstract

In this paper, we introduce BiMediX, the first bilingual medical mixture-of-experts LLM designed for seamless interaction in both English and Arabic. Our model facilitates a wide range of medical interactions in English and Arabic, including multi-turn chats to inquire about additional details such as patient symptoms and medical history, multiple-choice question answering, and open-ended question answering. We propose a semi-automated English-to-Arabic translation pipeline with human refinement to ensure high-quality translations. We also introduce a comprehensive evaluation benchmark for Arabic medical LLMs. Furthermore, we introduce BiMed1.3M, an extensive Arabic-English bilingual instruction set covering 1.3 million diverse medical interactions, resulting in over 632 million healthcare-specialized tokens for instruction tuning. Our BiMed1.3M dataset includes 250k synthesized multi-turn doctor-patient chats and maintains a 1:2 Arabic-to-English ratio. Our model outperforms the state-of-the-art Med42 and Meditron by average absolute gains of 2.5% and 4.1%, respectively, computed across multiple medical evaluation benchmarks in English, while operating at 8 times faster inference. Moreover, BiMediX outperforms the generic Arabic-English bilingual LLM, Jais-30B, by average absolute gains of 10% on our Arabic medical benchmark and 15% on bilingual evaluations across multiple datasets. Our project page with source code and trained model is available at https://github.com/mbzuai-oryx/BiMediX .
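
As a rough illustration of what the stated 1:2 Arabic-to-English ratio implies, the sketch below splits the rounded total of 1.3 million interactions into its two language portions. The function name `split_by_ratio` is hypothetical, and the resulting counts are approximations derived only from the rounded figures in the abstract, not numbers reported by the authors.

```python
from fractions import Fraction


def split_by_ratio(total: int, arabic_part: int = 1, english_part: int = 2) -> tuple[int, int]:
    """Split `total` samples into (arabic, english) counts for a given ratio.

    Assumption: the ratio applies to sample counts; the source only says the
    dataset "maintains a 1:2 Arabic-to-English ratio".
    """
    arabic_share = Fraction(arabic_part, arabic_part + english_part)
    arabic = round(total * arabic_share)  # exact rational arithmetic, then round
    return arabic, total - arabic


# 1.3M is the rounded total from the abstract, so these are estimates only.
arabic, english = split_by_ratio(1_300_000)
print(f"Arabic ~{arabic:,}  English ~{english:,}")
print(f"bilingual / English-only size: {(arabic + english) / english:.2f}x")
```

Under these assumptions the bilingual set comes out about 1.5 times the size of its English portion, which matches translating roughly half of the English data into Arabic.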

Paper Structure (17 sections, 7 figures, 5 tables)

Figures (7)

  • Figure 1: BiMediX is adept at conducting precise medical conversations seamlessly in both English and Arabic. It demonstrates advanced interactive capabilities, such as soliciting further details from the patient and seamlessly switching languages based on patient input while maintaining the continuity of the conversation. The Arabic text translates as follows. User: The pain occurs mostly on both sides of the head, sometimes on only one side. It is a throbbing pain that can last for several hours. Light and noise seem to make it worse. BiMediX: Thank you for the detailed information. Based on your symptoms, it sounds like you may have a migraine. Migraine headaches are usually characterized by throbbing pain on one or both sides of the head and can be triggered or increased in intensity by light and noise. It is important to consider that there are other possible causes for your symptoms, so we may need to do further evaluation. Have you noticed any specific factors that trigger these headaches?
  • Figure 2: English and Arabic language capabilities of our BiMediX with multiple-choice questions and answers (MCQA) on the top row, and open-ended questions and responses (QA) on the bottom row.
  • Figure 3: Statistics of the BiMed1.3M dataset across QA, MCQA, and Chat (with more than one turn of exchanges), totaling 623M tokens and 1.3M samples. This bilingual dataset, derived from translating approximately 50% of the English dataset into Arabic, is 1.5 times larger than the English data alone, giving a 2:1 English-to-Arabic content ratio.
  • Figure 4: BiMed1.3M Dataset Overview: (a) Data type distribution and (b) Analysis of frequent words.
  • Figure 5: Overview of the proposed semi-automated, iterative translation pipeline featuring human alignment. The process involves iterative translation and score optimization utilizing LLMs (such as ChatGPT) and manual refinement by professionals for samples with low scores along with a random selection of high-scoring samples.
  • ...and 2 more figures
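
The translation pipeline described in the Figure 5 caption (iterative translation with score optimization, then manual refinement of low-scoring samples plus a random selection of high-scoring ones) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `translate` and `score` callables stand in for LLM calls, and the threshold, retry count, and spot-check rate are hypothetical values chosen for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Translation:
    source: str   # English text
    target: str   # candidate Arabic translation
    score: float  # quality score from the (hypothetical) scoring model


def translate_with_retries(
    source: str,
    translate: Callable[[str, int], str],  # hypothetical LLM translation call
    score: Callable[[str, str], float],    # hypothetical LLM quality scorer
    threshold: float = 0.8,                # assumed acceptance threshold
    max_iters: int = 3,                    # assumed retry budget
) -> Translation:
    """Retranslate until the score clears the threshold; keep the best attempt."""
    best: Optional[Translation] = None
    for attempt in range(max_iters):
        candidate = translate(source, attempt)
        candidate_score = score(source, candidate)
        if best is None or candidate_score > best.score:
            best = Translation(source, candidate, candidate_score)
        if best.score >= threshold:
            break
    assert best is not None  # max_iters must be at least 1
    return best


def select_for_human_review(
    items: List[Translation],
    threshold: float = 0.8,
    sample_rate: float = 0.1,              # assumed spot-check rate
    rng: Optional[random.Random] = None,
) -> Tuple[List[Translation], List[Translation]]:
    """Return (all low-scoring items, a random sample of high-scoring items)."""
    rng = rng or random.Random(0)
    low = [t for t in items if t.score < threshold]
    high = [t for t in items if t.score >= threshold]
    spot_check = rng.sample(high, round(len(high) * sample_rate))
    return low, spot_check
```

In this sketch every low-scoring translation is sent to human reviewers, while high-scoring ones are only sampled, which keeps manual effort bounded while still checking whether the automated scores can be trusted.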