MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts
Yusheng Liao, Shuyang Jiang, Yu Wang, Yanfeng Wang
TL;DR
The paper tackles the challenge of medical multi-task learning in large language models by introducing MING-MOE, a Mixture-of-Experts model that performs token-level expert routing and uses Sparse MoLoRA to keep the base model frozen while training a small, parameter-efficient set of adapters. Trained on a bilingual medical corpus and evaluated on both medical NLP benchmarks and medical licensing exams, the approach achieves state-of-the-art results on over 20 tasks and performs strongly even against GPT-4 on certain Chinese medical assessments. Key contributions include the first MoE-based medical LLM with token-level routing and MoLoRA, a comprehensive 300k-sample, multi-stream fine-tuning dataset, and extensive benchmarking showing improved generalization and inference efficiency. The work highlights the practical impact of combining MoE with low-rank adapters to enable scalable, task-agnostic medical AI with strong knowledge retention and broad applicability in real-world clinical contexts.
Abstract
Large language models like ChatGPT have shown substantial progress in natural language understanding and generation, proving valuable across various disciplines, including the medical field. Despite these advances, challenges persist due to the complexity and diversity inherent in medical tasks, which often require multi-task learning capabilities. Previous approaches, although beneficial, fall short in real-world applications because they require task-specific annotations at inference time, which limits broader generalization. This paper introduces MING-MOE, a novel Mixture-of-Experts (MOE)-based medical large language model designed to handle diverse and complex medical tasks without requiring task-specific annotations, thus enhancing its usability across extensive datasets. MING-MOE employs a Mixture of Low-Rank Adaptation (MoLoRA) technique, which uses parameters efficiently by keeping the base model parameters frozen and adapting through a minimal set of trainable parameters. We demonstrate that MING-MOE achieves state-of-the-art (SOTA) performance on over 20 medical tasks, a significant improvement over existing models. This approach not only extends the capabilities of medical language models but also improves inference efficiency.
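To make the MoLoRA idea concrete, the following is a minimal pure-Python sketch of a single linear layer with a frozen base weight and a sparsely routed mixture of low-rank adapter experts, where each token picks its own experts. It is an illustration only, not the authors' implementation: the class name `SparseMoLoRALinear`, the top-k softmax router, the number of experts, the rank, and the `alpha / rank` scaling are all assumptions, since the abstract does not specify these details.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix, used here for illustrative weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class SparseMoLoRALinear:
    """Hypothetical layer: frozen base weight plus routed LoRA experts."""

    def __init__(self, d_in, d_out, n_experts=4, rank=2, top_k=2,
                 alpha=4.0, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.scale = alpha / rank  # assumed LoRA-style scaling
        self.base = rand_matrix(d_out, d_in, rng)        # frozen, never updated
        self.router = rand_matrix(n_experts, d_in, rng)  # trainable
        # Each expert is a low-rank pair (A: rank x d_in, B: d_out x rank).
        self.lora_a = [rand_matrix(rank, d_in, rng) for _ in range(n_experts)]
        # B starts at zero, so the layer initially matches the frozen base.
        self.lora_b = [[[0.0] * rank for _ in range(d_out)]
                       for _ in range(n_experts)]

    def route(self, x):
        """Token-level routing: softmax over the top-k router logits."""
        logits = matvec(self.router, x)
        order = sorted(range(len(logits)), key=lambda i: logits[i],
                       reverse=True)
        chosen = order[:self.top_k]
        peak = max(logits[i] for i in chosen)
        exps = {i: math.exp(logits[i] - peak) for i in chosen}
        total = sum(exps.values())
        return {i: e / total for i, e in exps.items()}

    def forward_token(self, x):
        """Base output plus the gated low-rank updates of selected experts."""
        out = matvec(self.base, x)
        for i, gate in self.route(x).items():
            delta = matvec(self.lora_b[i], matvec(self.lora_a[i], x))
            out = [o + gate * self.scale * d for o, d in zip(out, delta)]
        return out

    def forward(self, tokens):
        """Apply the layer to each token vector independently."""
        return [self.forward_token(x) for x in tokens]

    def trainable_parameter_count(self):
        """Count router and adapter weights; the base weight is excluded."""
        count = sum(len(row) for row in self.router)
        for a, b in zip(self.lora_a, self.lora_b):
            count += sum(len(row) for row in a) + sum(len(row) for row in b)
        return count
```

Calling `SparseMoLoRALinear(d_in=8, d_out=6).forward(tokens)` on a list of 8-dimensional token vectors returns one 6-dimensional output per token. Because routing depends only on the token representation, no task label is needed at inference time, which is the property the abstract emphasizes.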
