A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions

Lei Liu; Xiaoyan Yang; Junchi Lei; Yue Shen; Jian Wang; Peng Wei; Zhixuan Chu; Zhan Qin; Kui Ren

TL;DR

This survey analyzes the rapid emergence of Medical Large Language Models (Med-LLMs), tracing their evolution from general foundation models to domain-specific systems. It surveys foundational technology (Transformer architectures, training regimes, evaluation), medical tasks (Med-IE, Med-QA, Med-NLI, Med-Gen), and datasets, and then thoroughly examines algorithmic advances (clinical reasoning, KG integration, LLM agents, RAG, alignment, multimodal learning). It also discusses applications across clinical decision support, reporting, education, and research, while emphasizing trust, safety, privacy, and regulatory considerations. The work concludes by outlining future directions in multimodal integration, autonomous medical agents, policy frameworks, and robust evaluation to enable safe, effective deployment of Med-LLMs in healthcare.

Abstract

With the advent of Large Language Models (LLMs), medical artificial intelligence (AI) has experienced substantial technological progress and paradigm shifts, highlighting the potential of LLMs to streamline healthcare delivery and improve patient outcomes. In light of this rapid progress, this survey traces recent advances in Medical Large Language Models (Med-LLMs), covering their background, key findings, and mainstream techniques, with particular attention to the evolution from general-purpose models to medical-specialized applications. First, we examine the foundational technology of Med-LLMs, showing how general models can be progressively adapted and refined for complex medical tasks. Second, we investigate the wide-ranging applications of Med-LLMs across healthcare domains and provide an up-to-date review of existing Med-LLMs. The transformative impact of these models on daily medical practice is evident in their ability to assist clinicians, educators, and patients. Recognizing the importance of responsible innovation, we discuss the challenges of ensuring fairness, accountability, privacy, and robustness. Ethical considerations, rigorous evaluation methodologies, and regulatory frameworks are crucial for building trustworthiness in real-world systems. We emphasize the need for ongoing scrutiny and development to maintain high standards of safety and reliability. Finally, we anticipate possible future trajectories for Med-LLMs and identify key avenues for prudent expansion. By consolidating these insights, our review aims to give professionals and researchers a thorough understanding of the strengths and limitations of Med-LLMs, fostering a balanced and ethical approach to their integration into the healthcare ecosystem.

Paper Structure (46 sections, 3 equations, 5 figures, 5 tables)

Introduction
Background and Technology
Model Architecture
Encoder-Decoder Structure
Self-Attention Mechanism
Positional Encoding
Residual Connections & Layer Normalization
Training Techniques
Discussion on Pre-training Stage
Discussion on Fine-tuning Stage
Reinforcement Learning from Human Feedback
In-Context Learning
From General to Medical-Specific LLMs
NLP Tasks under Medical Domain
Datasets for Med-LLMs
...and 31 more sections

Figures (5)

Figure 1: Comparison of General LLMs and Medical LLMs. This chart highlights the diversity and specialization of large language models.
Figure 2: Organization of the Survey on Medical Large Language Models. This detailed chart outlines the survey's structure, covering background, technology, medical tasks and data, evaluation methods, specific medical LLMs, algorithms, applications, trustworthiness and safety, and future directions. It traces the evolution from early NLP to the latest advancements in medical LLMs, including their development, pre-training, fine-tuning, and various evaluation metrics.
Figure 3: Evolution of Medical Large Language Models from 2019 to 2024. This roadmap highlights the progression and diversification of medical large language models over the years. Starting with BioBERT in 2019, the field has seen the emergence of specialized models for medical information analysis, Q&A, text generation, and psychological support. Notable developments include ClinicalBERT, BioMegatron, and PubMedBERT in 2020, followed by a surge in models like ChatDoctor, DoctorGLM, and Visual Med-Alpaca in 2022. The roadmap culminates with advanced models such as Med-Gemini and Health-LLM in 2024, reflecting the ongoing innovation in leveraging AI for healthcare applications.
Figure 4: Making an LLM into a Doctor: A Multi-Step Approach. Prompt engineering crafts suitable prompts to elicit the desired responses. Medical-specific fine-tuning updates the parameters of a pre-trained LLM on medical datasets to improve clinical performance. RAG combines prompt engineering with context retrieval from external medical documents.
Figure 5: Applications of Med-LLMs. They assist in diagnosing illnesses, developing personalized treatment plans, analyzing medical records for pattern detection, supporting medical education and training with professional advice, and powering intelligent robotics.
