Safety challenges of AI in medicine in the era of large language models

Xiaoye Wang, Nicole Xi Zhang, Hongyu He, Trang Nguyen, Kun-Hsing Yu, Hao Deng, Cynthia Brandt, Danielle S. Bitterman, Ling Pan, Ching-Yu Cheng, James Zou, Dianbo Liu

TL;DR

This review examines emerging risks in AI utilization during the LLM era from functional and communication perspectives, and considers inherent safety problems shared by all AI systems, along with additional complications introduced by LLMs.

Abstract

Recent advancements in artificial intelligence (AI), particularly in large language models (LLMs), have unlocked significant potential to enhance the quality and efficiency of medical care. By introducing a novel way to interact with AI and data through natural language, LLMs offer new opportunities for medical practitioners, patients, and researchers. However, as AI and LLMs become more powerful and, in particular, achieve superhuman performance in some medical tasks, public concerns over their safety have intensified. These concerns about AI safety have emerged as the most significant obstacles to the adoption of AI in medicine. In response, this review examines emerging risks in AI utilization during the LLM era. First, we explore LLM-specific safety challenges from functional and communication perspectives, addressing issues across data collection, model training, and real-world application. We then consider inherent safety problems shared by all AI systems, along with additional complications introduced by LLMs. Finally, we discuss how safety issues arising from the use of AI in clinical practice and healthcare system operation can undermine trust among patients, clinicians, and the public, and how confidence in these systems can be built. We believe that emphasizing the development of safe AI will allow these technologies to be integrated more rapidly and reliably into everyday medical practice, to the benefit of both patients and clinicians.

Paper Structure (21 sections, 3 figures, 3 tables)


Figures (3)

  • Figure 1: AI safety challenges in medicine related to large language models.
  • Figure 2: General safety issues in medicine shared by common AI models and LLMs. These are inherent problems of most AI models related to real-world healthcare.
  • Figure 3: Visualization of generalization issues for AI in medicine. This figure illustrates how AI models perform differently on training data versus new, unseen data. The correct model is shown as the ground truth in the top left. Underfitting occurs when the model is too simple to capture the patterns in the data. Overfitting occurs when a model performs well on training data but poorly on new data. Population shift refers to a decline in a model's performance when it is applied to patient groups with different data distributions. Temporal shift occurs when patient data distributions change over time.
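
The four failure modes named in the Figure 3 caption can be illustrated with a small synthetic experiment. The sketch below is not taken from the paper: the quadratic signal, the noise level, the sample sizes, the input ranges, and all function names are assumptions chosen only to make the effects visible. A constant predictor stands in for an underfitting model, a nearest-neighbor memorizer for an overfitting one, and a straight-line fit is then evaluated on a shifted input range (population shift) and on data whose outcome level has drifted (temporal shift).

```python
import random


def true_signal(x):
    """Hypothetical ground-truth relationship (an assumption for this demo)."""
    return x * x


def sample(rng, n, lo, hi, offset=0.0, noise=0.1):
    """Draw n noisy (x, y) pairs with x uniform on [lo, hi]."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    ys = [true_signal(x) + offset + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def mse(model, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_constant(xs, ys):
    """Too-simple model: always predicts the training mean (underfitting)."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least-squares line, computed in closed form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """1-nearest-neighbor lookup: reproduces training noise (overfitting)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def run_demo(seed=0):
    rng = random.Random(seed)
    x_train, y_train = sample(rng, 50, 0.0, 1.0)
    x_test, y_test = sample(rng, 200, 0.0, 1.0)            # same population
    x_pop, y_pop = sample(rng, 200, 1.0, 2.0)              # different input range
    x_tmp, y_tmp = sample(rng, 200, 0.0, 1.0, offset=0.5)  # outcomes drifted

    constant = fit_constant(x_train, y_train)
    linear = fit_linear(x_train, y_train)
    memorizer = fit_memorizer(x_train, y_train)

    return {
        "constant_train": mse(constant, x_train, y_train),
        "constant_test": mse(constant, x_test, y_test),
        "linear_train": mse(linear, x_train, y_train),
        "linear_test": mse(linear, x_test, y_test),
        "memorizer_train": mse(memorizer, x_train, y_train),
        "memorizer_test": mse(memorizer, x_test, y_test),
        "linear_population_shift": mse(linear, x_pop, y_pop),
        "linear_temporal_shift": mse(linear, x_tmp, y_tmp),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name:26s} {value:.4f}")
```

In this toy setting the memorizer has zero training error but nonzero error on new data, the constant model is worse than the line on both splits, and the line's error rises sharply once the input range or the outcome level differs from what it was trained on. Real clinical distribution shifts are higher-dimensional and harder to detect, so the example conveys only the qualitative pattern.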