Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare
Emre Can Acikgoz, Osman Batur İnce, Rayene Bench, Arda Anıl Boz, İlker Kesen, Aykut Erdem, Erkut Erdem
TL;DR
Hippocrates introduces an open-source framework to advance medical LLMs with full transparency of data, code, and evaluation. It employs a three-stage pipeline (continued pre-training, supervised fine-tuning, and medical preference learning) using LoRA-based adaptation on LLaMA2 7B and Mistral 7B bases, with RLAIF-driven clinician preferences. The resulting Hippo-7B models outperform existing open medical LLMs and approach or exceed some larger closed models on six clinical benchmarks, supported by a standardized evaluation protocol (LM-Eval Harness). The work also analyzes the contribution of each training stage, prompting strategies, and uncertainty calibration, and advocates for reproducibility and broader access to medical AI research resources.
Abstract
The integration of Large Language Models (LLMs) into healthcare promises to transform medical diagnostics, research, and patient care. Yet the progress of medical LLMs faces obstacles such as complex training requirements, rigorous evaluation demands, and the dominance of proprietary models that restrict academic exploration. Transparent, comprehensive access to LLM resources is essential for advancing the field, fostering reproducibility, and encouraging innovation in healthcare AI. We present Hippocrates, an open-source LLM framework specifically developed for the medical domain. In stark contrast to previous efforts, it offers unrestricted access to its training datasets, codebase, checkpoints, and evaluation protocols. This open approach is designed to stimulate collaborative research, allowing the community to build upon, refine, and rigorously evaluate medical LLMs within a transparent ecosystem. We also introduce Hippo, a family of 7B models tailored for the medical domain, fine-tuned from Mistral and LLaMA2 through continual pre-training, instruction tuning, and reinforcement learning from human and AI feedback. Our models outperform existing open medical LLMs by a large margin, even surpassing models with 70B parameters. Through Hippocrates, we aspire to unlock the full potential of LLMs not just to advance medical knowledge and patient care but also to democratize the benefits of AI research in healthcare, making them available across the globe.
