Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes
Sunjun Kweon, Junu Kim, Jiyoun Kim, Sujeong Im, Eunbyeol Cho, Seongsu Bae, Jungwoo Oh, Gyubok Lee, Jong Hak Moon, Seng Chan You, Seungjin Baek, Chang Hoon Han, Yoon Bin Jung, Yohan Jo, Edward Choi
TL;DR
The paper tackles privacy barriers in deploying clinical NLP by building Asclepius, a multi-task clinical LLM trained entirely on synthetic clinical notes derived from public case reports. It introduces a data-generation pipeline that converts case reports into realistic discharge notes and corresponding instruction-answer pairs, validated by perplexity analyses and clinician-guided prompts. Asclepius (7B and 13B) is trained via domain-adaptive pretraining on synthetic notes followed by instruction fine-tuning, and is evaluated on real discharge summaries against GPT-3.5-turbo and open-source models, with Asclepius-R (a baseline trained on real notes) as a reference. Across preliminary, practical, and professional evaluations, Asclepius performs competitively and, in some cases, on par with models trained on real data, supporting the viability of synthetic notes for sharing high-quality clinical LLMs. The work emphasizes open access to data, models, and prompts, enabling broader research and potential clinical AI deployment. It acknowledges limitations such as note-type generalization and single-turn interactions, and highlights ongoing concerns about hallucinations and clinical safety.
Abstract
The development of large language models tailored for handling patients' clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations. To address these challenges, we first create large-scale synthetic clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train our specialized clinical large language model, Asclepius. Although Asclepius is trained on synthetic data, we assess its potential performance in real-world applications by evaluating it on real clinical notes. We benchmark Asclepius against several other large language models, including GPT-3.5-turbo and open-source alternatives. To further validate our approach using synthetic notes, we also compare Asclepius with its variants trained on real clinical notes. Our findings convincingly demonstrate that synthetic clinical notes can serve as viable substitutes for real ones when constructing high-performing clinical language models. This conclusion is supported by detailed evaluations conducted by both GPT-4 and medical professionals. All resources, including weights, code, and data used in the development of Asclepius, are made publicly accessible for future research. (https://github.com/starmpcc/Asclepius)
