MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts

Yusheng Liao, Shuyang Jiang, Yu Wang, Yanfeng Wang

TL;DR

The paper tackles medical multi-task learning in large language models by introducing MING-MOE, a Mixture-of-Experts model that performs token-level expert routing and uses Sparse MoLoRA to keep the base model frozen while training a small, parameter-efficient set of adapters. Trained on a bilingual medical corpus and evaluated on both medical NLP benchmarks and medical licensing exams, the approach achieves state-of-the-art results on over 20 tasks and demonstrates strong performance even against GPT-4 on certain Chinese medical assessments. Key contributions include the first MoE-based medical LLM with token-level routing and MoLoRA, a comprehensive 300k-sample, multi-stream fine-tuning dataset, and extensive benchmarking showing improved generalization and inference efficiency. The work highlights the practical value of combining MoE with low-rank adapters to enable scalable, task-agnostic medical AI with strong knowledge retention and broad applicability in real-world clinical contexts.

Abstract

Large language models like ChatGPT have shown substantial progress in natural language understanding and generation, proving valuable across various disciplines, including the medical field. Despite these advances, challenges persist due to the complexity and diversity inherent in medical tasks, which often require multi-task learning capabilities. Previous approaches, although beneficial, fall short in real-world applications because they require task-specific annotations at inference time, limiting broader generalization. This paper introduces MING-MOE, a novel Mixture-of-Experts (MoE)-based medical large language model designed to manage diverse and complex medical tasks without requiring task-specific annotations, thus enhancing its usability across extensive datasets. MING-MOE employs a Mixture of Low-Rank Adaptation (MoLoRA) technique, which uses parameters efficiently by keeping the base model parameters frozen while adapting through a minimal set of trainable parameters. We demonstrate that MING-MOE achieves state-of-the-art (SOTA) performance on over 20 medical tasks, a significant improvement over existing models. This approach not only extends the capabilities of medical language models but also improves inference efficiency.
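
To give a rough sense of what "a minimal set of trainable parameters" can mean for a single linear layer, the sketch below counts frozen versus trainable weights under a MoLoRA-style setup. The layer size (4096 x 4096), the rank of 16, the bias-free router, and the function name `molora_param_counts` are illustrative assumptions, not values taken from the paper; only the choice of 4 experts matches Figure 1.

```python
def molora_param_counts(d_in, d_out, n_experts, rank):
    """Count frozen and trainable weights for one MoLoRA-style linear layer.

    Assumes each expert is a LoRA pair A (rank x d_in) and B (d_out x rank),
    plus a bias-free linear router (n_experts x d_in). Hypothetical helper.
    """
    frozen = d_in * d_out                          # base weight, not updated
    adapters = n_experts * rank * (d_in + d_out)   # all A and B matrices
    router = n_experts * d_in                      # token-level gating weights
    return frozen, adapters + router


# Hypothetical dimensions chosen only for illustration.
frozen, trainable = molora_param_counts(d_in=4096, d_out=4096, n_experts=4, rank=16)
print(frozen, trainable, f"{trainable / frozen:.2%}")
# prints: 16777216 540672 3.22%
```

Under these assumed dimensions, the trainable adapters and router amount to about 3% of the frozen layer's weights.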

Paper Structure

This paper contains 19 sections, 4 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: The MoLoRA FFN and corresponding MoLoRA Linear architecture in MING-MOE. This diagram shows a setting of $N=4$ experts and $K=2$ activated experts during training.
  • Figure 2: Case study on a sample from the Pharmacy track of the Chinese National Pharmacist Licensure Examination. Blue text in the question marks the correct options; red text marks a wrong answer from a model, and green text marks a correct answer from a model.
  • Figure 3: Case study on a sample from the Traditional Chinese Medicine track of the Chinese National Pharmacist Licensure Examination. Blue text in the question marks the correct options; red text marks a wrong answer from a model, and green text marks a correct answer from a model.
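
The MoLoRA Linear layer described in the Figure 1 caption ($N=4$ experts, $K=2$ activated) can be illustrated with a minimal pure-Python sketch of token-level top-K routing over LoRA experts. This is not the authors' implementation. The class name `MoLoRALinear`, the rank, the `alpha / rank` scaling, the random initialization, and the choice to apply softmax over only the selected experts' logits are all assumptions made for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class MoLoRALinear:
    """Hypothetical sketch: frozen base weight plus top-K routed LoRA experts."""

    def __init__(self, d_in, d_out, n_experts=4, top_k=2, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.scale = alpha / rank                        # assumed LoRA scaling
        self.w0 = rand_matrix(d_out, d_in, rng)          # frozen base weight
        self.router = rand_matrix(n_experts, d_in, rng)  # trainable gate
        self.a = [rand_matrix(rank, d_in, rng) for _ in range(n_experts)]
        # Standard LoRA starts B at zero; random values are used here only so
        # that the adapters visibly change the output in this demo.
        self.b = [rand_matrix(d_out, rank, rng) for _ in range(n_experts)]

    def route(self, x):
        """Return (expert index, gate weight) pairs for one token vector."""
        logits = matvec(self.router, x)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = order[: self.top_k]
        gates = softmax([logits[i] for i in chosen])     # assumption: renormalize over top-K
        return list(zip(chosen, gates))

    def forward(self, x):
        """Base projection plus the gated sum of the selected experts' updates."""
        out = matvec(self.w0, x)
        for idx, gate in self.route(x):
            delta = matvec(self.b[idx], matvec(self.a[idx], x))
            out = [o + gate * self.scale * d for o, d in zip(out, delta)]
        return out


layer = MoLoRALinear(d_in=8, d_out=6)
token = [0.5] * 8
routing = layer.route(token)   # two (expert, gate) pairs; gates sum to 1
output = layer.forward(token)  # vector of length 6
```

Because routing is computed per token vector, no task label is needed at inference time; only the `top_k` selected experts contribute to each token's output.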