InMD-X: Large Language Models for Internal Medicine Doctors
Hansle Gwon, Imjin Ahn, Hyoje Jung, Byeolhee Kim, Young-Hak Kim, Tae Joon Jun
TL;DR
This work presents InMD-X, a suite of 11 sub-specialty LLMs for Internal Medicine, each fine-tuned from a 7B base model using PubMed abstracts and a GPT-3.5-derived QA dataset to capture domain-specific knowledge. The authors implement a three-stage training pipeline (continued pre-training, supervised fine-tuning, and LoRA-based parameter-efficient fine-tuning) so that multiple models can be served efficiently while preserving subfield accuracy. Dataset construction draws on top-tier journals (identified via JCR 2023) and PubMed abstracts published since 2010, yielding ~150.6 million tokens and ~1.7 million QA pairs across the 11 sub-specialties. They report favorable inference performance with LoRA, faster load times, and more concise, deterministic outputs compared with a baseline, while acknowledging the lack of standardized benchmarks and proposing future directions such as mixture-of-experts and clinical benchmarks for evaluation.
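To illustrate why LoRA adapters can make it cheap to serve many sub-specialty models, here is a minimal sketch in plain Python. It is not the authors' code. The class `LoRAAdapter`, the helpers `matvec` and `forward`, the adapter keys, and the toy 3x3 weights are all hypothetical and chosen only for illustration. It shows the standard LoRA formulation: the base weight `W` is shared and frozen, and each adapter contributes a low-rank update scaled by `alpha / r`.

```python
from dataclasses import dataclass


@dataclass
class LoRAAdapter:
    """Low-rank update for one weight matrix: delta_W = (alpha / r) * B @ A."""
    A: list       # r x d_in matrix (list of rows)
    B: list       # d_out x r matrix (list of rows)
    alpha: float  # LoRA scaling hyperparameter

    @property
    def rank(self):
        return len(self.A)

    def num_params(self):
        # Adapter parameters: r * d_in for A plus d_out * r for B.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def forward(W, x, adapter=None):
    """Compute y = W x + (alpha / r) * B (A x); W itself is never modified."""
    y = matvec(W, x)
    if adapter is not None:
        scale = adapter.alpha / adapter.rank
        delta = matvec(adapter.B, matvec(adapter.A, x))
        y = [yi + scale * di for yi, di in zip(y, delta)]
    return y


# One shared base weight (toy 3x3 identity standing in for a 7B model's layer).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

# Hypothetical per-sub-specialty adapters; only these small matrices differ.
adapters = {
    "cardiology": LoRAAdapter(A=[[1.0, 0.0, 0.0]], B=[[0.5], [0.0], [0.0]], alpha=1.0),
    "nephrology": LoRAAdapter(A=[[0.0, 1.0, 0.0]], B=[[0.0], [2.0], [0.0]], alpha=1.0),
}

x = [1.0, 2.0, 3.0]
base_out = forward(W, x)
cardio_out = forward(W, x, adapters["cardiology"])
nephro_out = forward(W, x, adapters["nephrology"])
```

Switching sub-specialty here means selecting a different small adapter while the base weights stay loaded once. In this toy case each rank-1 adapter holds 6 parameters against 9 in the full matrix; the gap grows quickly with layer size, since adapter size scales with `r * (d_in + d_out)` rather than `d_in * d_out`.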
Abstract
In this paper, we introduce InMD-X, a collection of multiple large language models specifically designed to cater to the unique characteristics and demands of Internal Medicine Doctors (IMDs). InMD-X represents a groundbreaking development in natural language processing, offering a suite of language models fine-tuned for various aspects of the internal medicine field. These models encompass a wide range of medical sub-specialties, enabling IMDs to perform more efficient and accurate research, diagnosis, and documentation. InMD-X's versatility and adaptability make it a valuable tool for improving the healthcare industry, enhancing communication between healthcare professionals, and advancing medical research. Each model within InMD-X is meticulously tailored to address specific challenges faced by IMDs, ensuring the highest level of precision and comprehensiveness in clinical text analysis and decision support. This paper provides an overview of the design, development, and evaluation of InMD-X, showcasing its potential to revolutionize the way internal medicine practitioners interact with medical data and information. We present results from extensive testing, demonstrating the effectiveness and practical utility of InMD-X in real-world medical scenarios.
