A Survey for Large Language Models in Biomedicine

Chong Wang; Mengyao Li; Junjun He; Zhongruo Wang; Erfan Darzi; Zan Chen; Jin Ye; Tianbin Li; Yanzhou Su; Jing Ke; Kaili Qu; Shuxin Li; Yi Yu; Pietro Liò; Tianyun Wang; Yu Guang Wang; Yiqing Shen

TL;DR

This survey explores the zero-shot capabilities of LLMs across a broad spectrum of biomedical tasks, including diagnostic assistance, drug discovery, and personalized medicine, drawing on insights from 137 key studies.

Abstract

Recent breakthroughs in large language models (LLMs) offer unprecedented natural language understanding and generation capabilities. However, existing surveys on LLMs in biomedicine often focus on specific applications or model architectures and lack a comprehensive analysis that integrates the latest advancements across biomedical domains. This review, based on an analysis of 484 publications sourced from databases including PubMed, Web of Science, and arXiv, examines the current landscape, applications, challenges, and prospects of LLMs in biomedicine, and distinguishes itself by focusing on the practical implications of these models in real-world biomedical contexts. First, we explore the zero-shot capabilities of LLMs across a broad spectrum of biomedical tasks, including diagnostic assistance, drug discovery, and personalized medicine, with insights drawn from 137 key studies. Then, we discuss adaptation strategies, including fine-tuning methods for both unimodal and multimodal LLMs, that enhance performance in specialized biomedical contexts where zero-shot performance falls short, such as medical question answering and efficient processing of biomedical literature. Finally, we discuss the challenges that LLMs face in biomedicine, including data privacy concerns, limited model interpretability, dataset quality issues, and ethical concerns, which arise from the sensitive nature of biomedical data, the need for highly reliable model outputs, and the ethical implications of deploying AI in healthcare. To address these challenges, we identify future research directions for LLMs in biomedicine, including federated learning methods to preserve data privacy and explainable AI methodologies to enhance the transparency of LLMs.

Paper Structure (31 sections, 5 figures, 3 tables)

This paper contains 31 sections, 5 figures, 3 tables.

Introduction
Background
Encoder-Only Architecture
Decoder-Only Architecture
Encoder-Decoder Architecture
LLMs in Zero-Shot Biomedical Applications
Diagnostic Assistance
Biomedical Omics and Drug Discovery
Personalized Medicine
Biomedical Literature and Research
Benchmark Datasets and Evaluation Metrics
Summary
Adapting General LLMs to the Biomedical Field
Unimodal Adaptation Strategies
Full-Parameter Fine-Tuning
...and 16 more sections

Figures (5)

Figure 1: Chronological overview of LLMs and their variants in biomedical applications from 2019 to 2024. The timeline illustrates the evolution of both unimodal (top) and multimodal (bottom) models, highlighting key developments across different model architectures including LLaMA, GPT, BERT, BaiChuan, CLIP, and others. Notable milestones such as ESM-1b, Med-PaLM, and BioGPT are shown, demonstrating the progress and diversification of LLMs in the biomedical domain.
Figure 2: Trends and distribution of LLM research papers in biomedical fields from 2018 to 2024. (a) Temporal analysis of LLM research papers, showing quarterly publication counts. A surge in publications is evident beginning in 2021, reflecting growing interest and investment in applying LLMs to biomedical challenges. (b) Distribution of LLM research papers across biomedical specialties. Medicine (31.1%) and Neuroscience (23.2%) emerge as the dominant areas, followed by Radiology (20.4%) and Bioinformatics (17.8%). This distribution illustrates the broad applicability of LLMs across various medical domains and highlights potential areas for future development.
Figure 3: Evaluation of LLMs in biomedical applications in a zero-shot manner. (a) Venn diagram illustrating the distribution and overlap of studies evaluating various LLMs (GPT-4, GPT-3.5, ChatGPT, BERT, LLaMA, and others) in zero-shot biomedical tasks. The numbers indicate the frequency of studies for each model. (b) Violin plots comparing the relative performance of LLMs across different levels of biomedical expertise (Junior, Intermediate, Senior) against a baseline. The y-axis represents relative performance, with positive values indicating superior performance and negative values indicating inferior performance compared to the baseline. The width of each plot reflects the distribution of performance at each expertise level.
Figure 4: Framework for developing and adapting LLMs in biomedicine. This diagram illustrates the end-to-end process of creating or fine-tuning LLMs for biomedical applications. It encompasses data sourcing (e.g., real dialogues, medical Q&A, PubMed), preprocessing stages (collection, cleaning, standardization, annotation, and augmentation), and the division into training, validation, and test sets. The workflow showcases various pre-training approaches and base models (GPT, LLaMA, BERT) alongside specialized fine-tuning techniques such as PEFT, IFT, and RLHF. The resulting biomedical LLMs are optimized for downstream tasks like diagnosis, dose-response prediction, and medical question answering. The framework also incorporates evaluation metrics and a feedback loop for continuous improvement, emphasizing the iterative nature of developing effective biomedical LLMs.
Figure 5: Future directions of LLMs in the biomedical field.
