Med42-v2: A Suite of Clinical LLMs

Clément Christophe, Praveen K Kanithi, Tathagata Raha, Shadab Khan, Marco AF Pimentel

TL;DR

Med42-v2 targets the gap between generic LLMs and healthcare needs by combining domain-specific clinical fine-tuning with a multi-stage preference alignment strategy. Built on Llama3 and released at the 8B and 70B scales, the suite outperforms the original Llama3 models and GPT-4 on several medical benchmarks. The authors describe their training setup transparently, including the mix of diverse medical data and some general-domain data, and publicly release the models. They acknowledge remaining challenges such as hallucinations and bias, and propose an evaluation framework for real-world clinical utility to enable safer deployment.

Abstract

Med42-v2 introduces a suite of clinical large language models (LLMs) designed to address the limitations of generic models in healthcare settings. These models are built on the Llama3 architecture and fine-tuned using specialized clinical data. They underwent multi-stage preference alignment to respond effectively to natural prompts. While generic models are often preference-aligned to avoid answering clinical queries as a precaution, Med42-v2 is specifically trained to overcome this limitation, enabling its use in clinical settings. Med42-v2 models outperform the original Llama3 models, in both the 8B and 70B parameter configurations, as well as GPT-4 across various medical benchmarks. These LLMs are developed to understand clinical queries, perform reasoning tasks, and provide valuable assistance in clinical environments. The models are publicly available at https://huggingface.co/m42-health.

Paper Structure (14 sections, 5 tables)