A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges

Zifeng Wang, Hanyin Wang, Benjamin Danek, Ying Li, Christina Mack, Hoifung Poon, Yajuan Wang, Pranav Rajpurkar, Jimeng Sun

TL;DR

Generalist LLMs underperform in specialized medical contexts and can yield unsafe outputs, motivating domain-specific adaptations. The paper proposes a three-step framework—Modeling (model development), Optimization (prompting and RAG), and System engineering (agent/chain architectures)—and a playbook linking these methods to real-world medical tasks. It presents concrete use cases (clinical note generation, automated coding, patient-trial matching, medical systematic reviews, and privacy-legislation mapping) and details how to design, implement, and evaluate end-to-end medical AI pipelines. The contribution lies in a structured, systems-oriented path to build trustworthy, scalable medical AI that integrates external knowledge, tools, and human oversight while addressing hallucinations, privacy, and regulatory considerations.

Abstract

The integration of Large Language Models (LLMs) into medical applications has sparked widespread interest across the healthcare industry, from drug discovery and development to clinical decision support, telemedicine assistance, medical devices, and healthcare insurance applications. This perspective paper discusses the inner workings of building LLM-powered medical AI applications and introduces a comprehensive framework for their development. We review existing literature and outline the unique challenges of applying LLMs in specialized medical contexts. Additionally, we introduce a three-step framework to organize medical LLM research activities: 1) Modeling: breaking down complex medical workflows into manageable steps for developing medical-specific models; 2) Optimization: optimizing model performance with crafted prompts and integrating external knowledge and tools; and 3) System engineering: decomposing complex tasks into subtasks and leveraging human expertise for building medical AI applications. Furthermore, we offer a detailed use case playbook that describes various LLM-powered medical AI applications, such as optimizing clinical trial design, enhancing clinical decision support, and advancing medical imaging analysis. Finally, we discuss challenges and considerations for building medical AI applications with LLMs, such as handling hallucinations, data ownership and compliance, privacy, intellectual property, compute cost, sustainability, and responsible AI requirements.

Paper Structure

This paper contains 20 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Workflow for adapting generalist LLMs for medical AI through adaptation techniques. (a) Generalist AI models, such as proprietary systems (e.g., OpenAI's GPT-4) and open-source models (e.g., LLaMA), serve as foundational technologies for developing specialized medical AI models. (b) Adapting generalist AI to medical tasks involves several techniques, including model fine-tuning, prompt optimization, and the development of AI agents or AI chains. These methods use diverse medical datasets, such as medical images, electronic health records (EHRs), clinical notes, publications, and omics data, to enhance AI model training and performance. (c) Effective system engineering for medical AI entails integrating AI modules into comprehensive chains to support tasks like cohort extraction, eligibility assessment, and result verification. This process emphasizes interaction between humans and AI, resulting in AI modules tailored to specific applications. (d) Generalist AI applications in medicine span various domains, including conversational diagnosis, radiology report generation, clinical note summarization, automated medical coding, drug design, patient-trial matching, and systematic literature reviews. All require advanced system integration for optimal performance.
  • Figure 2: This playbook outlines the process of adapting large language models (LLMs) for medical AI using a systems engineering approach. (a) The overall architecture of the system should be selected based on the properties and requirements of the task at hand. (b) When building agent systems, four main modules need to be developed. LLMs take on different roles when equipped with different modules and interact with human experts to carry out the target task dynamically. (c) When building AI chain systems, we can first define a pipeline that decomposes the task into small steps following expert workflows or professional guidelines, and then develop the module responsible for each step. (d) Adaptation techniques can be applied to enhance an LLM's performance, whether for an AI agent or for an AI module. Adaptation methods need to be selected according to data availability and task requirements.
  • Figure 3: Illustration of example use cases adapting LLMs to medical AI tasks. (a) An AI chain crafted for clinical note generation, highlighting expert involvement in selecting relevant patient data and adherence to external formatting guidelines. (b) Automated medical coding can potentially benefit from an AI chain that employs two extraction modules, one for conditions and one for complications. (c) A patient-trial matching pipeline adds a prescreening stage to reduce the candidate trial set and provides criterion-level assessments so that users can select patients along various dimensions. (d) A medical systematic review pipeline built on established systematic review practice.
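
The AI chain idea in Figures 2(c) and 3(c) can be illustrated with a minimal Python sketch of a two-step patient-trial matching chain: a cheap prescreening step that shrinks the candidate trial set, followed by a criterion-level assessment whose results are left for a human to review. This is not the authors' implementation. All names here (`Trial`, `Patient`, `prescreen`, `assess`, `match`, `keyword_assessor`) are hypothetical, and the assessor is a naive substring check standing in for what would be an LLM-backed module in a real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trial:
    trial_id: str
    condition: str
    criteria: List[str]  # free-text inclusion criteria (illustrative only)


@dataclass
class Patient:
    patient_id: str
    conditions: List[str]
    note: str  # free-text clinical summary


# An assessor maps (patient note, criterion) to "met" or "not met".
# Assumption: a real chain would wrap an LLM call behind this interface.
Assessor = Callable[[str, str], str]


def keyword_assessor(note: str, criterion: str) -> str:
    """Hypothetical stand-in for an LLM module: naive substring check."""
    return "met" if criterion.lower() in note.lower() else "not met"


def prescreen(patient: Patient, trials: List[Trial]) -> List[Trial]:
    """Step 1: keep only trials whose target condition the patient has."""
    patient_conditions = {c.lower() for c in patient.conditions}
    return [t for t in trials if t.condition.lower() in patient_conditions]


def assess(patient: Patient, trial: Trial, assessor: Assessor) -> Dict[str, str]:
    """Step 2: judge each criterion of one trial separately."""
    return {c: assessor(patient.note, c) for c in trial.criteria}


def match(
    patient: Patient,
    trials: List[Trial],
    assessor: Assessor = keyword_assessor,
) -> Dict[str, Dict[str, str]]:
    """Chain the two steps; return per-trial, per-criterion results."""
    return {
        t.trial_id: assess(patient, t, assessor)
        for t in prescreen(patient, trials)
    }


# Usage with made-up data.
trials = [
    Trial("T1", "type 2 diabetes", ["metformin", "hba1c above 7"]),
    Trial("T2", "asthma", ["inhaled corticosteroid"]),
]
patient = Patient("P1", ["Type 2 diabetes"], "On metformin for 3 years; HbA1c 6.5.")
report = match(patient, trials)
print(report)
```

Keeping each step behind its own function means a single module (for example, the assessor) can be swapped for a fine-tuned or prompted model without touching the rest of the chain, and the criterion-level dictionary gives a reviewer something concrete to verify rather than a single opaque match score.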