A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations

Jinqiang Wang; Huansheng Ning; Yi Peng; Qikai Wei; Daniel Tesfai; Wenwei Mao; Tao Zhu; Runhe Huang

TL;DR

This survey systematically summarizes, from a fine-grained perspective, how to train medical LLMs on top of open-source general LLMs, and can guide the development of LLMs for various medical applications.

Abstract

Large Language Models (LLMs) have demonstrated surprising performance across a wide range of natural language processing tasks. Recently, medical LLMs enhanced with domain-specific knowledge have shown excellent capabilities in medical consultation and diagnosis: they can fluently simulate doctor-patient dialogues and provide professional medical advice. Most medical LLMs are developed by continuing to train open-source general LLMs, which requires significantly fewer computational resources than training LLMs from scratch. This approach also offers better patient privacy protection than API-based solutions. Given these advantages, this survey systematically summarizes, from a more fine-grained perspective, how to train medical LLMs based on open-source general LLMs. It covers (a) how to acquire a training corpus and construct customized medical training sets, (b) how to choose an appropriate training paradigm, (c) how to choose a suitable evaluation benchmark, and (d) existing challenges and promising research directions. This survey can provide guidance for developing LLMs for various medical applications, such as medical education, diagnostic planning, and clinical assistants. Related resources and supplemental information are available in the accompanying GitHub repository.
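Point (a) above concerns turning a raw corpus into a customized training set. As a minimal sketch, assuming raw patient-question/doctor-answer pairs and an Alpaca-style `instruction`/`input`/`output` record layout (an assumption for illustration; the survey is not quoted here as prescribing this schema), the basic cleaning and formatting steps could look like the following. `RawQA` and `build_instruction_set` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable, List


# Hypothetical raw record: a patient question paired with a doctor's answer.
@dataclass
class RawQA:
    question: str
    answer: str


def build_instruction_set(
    records: Iterable[RawQA],
    instruction: str = "Answer the patient's medical question.",
) -> List[Dict[str, str]]:
    """Clean raw QA pairs and convert them into instruction-tuning records.

    Cleaning is deliberately minimal: collapse whitespace, drop pairs with an
    empty side, and drop exact duplicates (after whitespace normalization).
    """
    seen = set()
    dataset = []
    for rec in records:
        q = " ".join(rec.question.split())  # normalize runs of whitespace
        a = " ".join(rec.answer.split())
        if not q or not a:
            continue  # drop incomplete pairs
        if (q, a) in seen:
            continue  # drop exact duplicates
        seen.add((q, a))
        # ASSUMPTION: Alpaca-style record layout.
        dataset.append({"instruction": instruction, "input": q, "output": a})
    return dataset


if __name__ == "__main__":
    raw = [
        RawQA("What is a normal resting heart rate?  ", "For most adults, 60 to 100 beats per minute."),
        RawQA("What is a normal resting heart rate?", "For most adults, 60 to 100 beats per minute."),
        RawQA("", "An answer with no question."),
    ]
    print(json.dumps(build_instruction_set(raw), indent=2))
```

In this toy run only one record survives: the second pair is a duplicate of the first after whitespace normalization, and the third has an empty question. A real pipeline would go well beyond exact-match deduplication, in line with the data cleaning, formatting, augmentation, and translation steps listed in the outline.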

Paper Structure

This paper contains 36 sections, 4 figures, and 2 tables.

Introduction
Data Acquisition and Processing
Training Corpus Sources
Existing Public Datasets
Public Medical Corpus
Professional Medical Organization Corpus
Synthetic Data
Data Processing
Data Cleaning
Data Formatting
Data Augmentation
Translation
Training Paradigms
IFT Paradigm
Parameter Efficient Fine-Tuning
...and 21 more sections

Figures (4)

Figure 1: Training pipeline from general LLMs to medical LLMs. First, a medical corpus is collected and processed into a standardized training set. Next, an appropriate training paradigm is selected to turn a general LLM into a medical LLM equipped with medical knowledge. The training paradigms are built from three optional training stages: Continued Pretraining (CP), Instruction Fine-tuning (IFT), and Human Alignment (HA). Finally, the medical LLM is evaluated from both machine and human perspectives.
Figure 2: Detailed Categorization of Corpus Sources and Data Processing Methods.
Figure 5: Frequency of each corpus source, as a percentage. The abbreviations are the same as in the dataset table.
Figure 6: Training paradigms. For each paradigm, the figure shows the training stages, the capabilities achieved, the required computing resources, and the training complexity.
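Figures 1 and 6 describe training paradigms assembled from three optional stages (CP, IFT, HA) applied in a fixed order. The following is a minimal sketch that enumerates such stage combinations and applies a chosen one, assuming each stage is a hypothetical callable that maps a model to an updated model. The filter that drops HA-without-IFT combinations is an assumption reflecting common practice, not a rule taken from this summary, and which combinations the survey actually covers is not listed here. `candidate_paradigms` and `run_paradigm` are hypothetical names.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Stage order taken from the pipeline figure: CP -> IFT -> HA.
STAGES: Tuple[str, ...] = ("CP", "IFT", "HA")


def candidate_paradigms(require_ift_before_ha: bool = True) -> List[Tuple[str, ...]]:
    """Enumerate non-empty stage subsets, each kept in the canonical order.

    ASSUMPTION: with require_ift_before_ha=True, any paradigm that includes HA
    must also include IFT (common practice, not a rule stated in the source).
    """
    paradigms = []
    for k in range(1, len(STAGES) + 1):
        for combo in combinations(STAGES, k):  # combinations preserves STAGES order
            if require_ift_before_ha and "HA" in combo and "IFT" not in combo:
                continue
            paradigms.append(combo)
    return paradigms


def run_paradigm(
    model: object,
    paradigm: Tuple[str, ...],
    trainers: Dict[str, Callable[[object], object]],
) -> object:
    """Apply each selected stage in order; each trainer returns an updated model."""
    for stage in paradigm:
        if stage not in trainers:
            raise KeyError(f"no trainer registered for stage {stage!r}")
        model = trainers[stage](model)
    return model


if __name__ == "__main__":
    # Hypothetical stub trainers that only record which stage ran.
    stubs = {s: (lambda m, s=s: f"{m}+{s}") for s in STAGES}
    for p in candidate_paradigms():
        print(" -> ".join(p), ":", run_paradigm("general-llm", p, stubs))
```

With the default filter this yields five paradigms (CP, IFT, CP+IFT, IFT+HA, CP+IFT+HA); passing `require_ift_before_ha=False` yields all seven non-empty combinations. Keeping the stages as independent callables makes it cheap to compare paradigms with different compute budgets, which is the trade-off Figure 6 summarizes.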