Enhancing Large Language Models for Clinical Decision Support by Incorporating Clinical Practice Guidelines

David Oniani, Xizhi Wu, Shyam Visweswaran, Sumit Kapoor, Shravan Kooragayalu, Katelyn Polanska, Yanshan Wang

TL;DR

LLMs enhanced with CPGs outperform plain LLMs with ZSP in providing accurate recommendations for COVID-19 outpatient treatment, highlighting the potential for broader applications beyond the case study.

Abstract

Background: Large Language Models (LLMs), enhanced with Clinical Practice Guidelines (CPGs), can significantly improve Clinical Decision Support (CDS). However, methods for incorporating CPGs into LLMs are not well studied.

Methods: We develop three distinct methods for incorporating CPGs into LLMs: Binary Decision Tree (BDT), Program-Aided Graph Construction (PAGC), and Chain-of-Thought-Few-Shot Prompting (CoT-FSP). To evaluate the effectiveness of the proposed methods, we create a set of synthetic patient descriptions and conduct both automatic and human evaluation of the responses generated by four LLMs: GPT-4, GPT-3.5 Turbo, LLaMA, and PaLM 2. Zero-Shot Prompting (ZSP) was used as the baseline method. We focus on CDS for COVID-19 outpatient treatment as the case study.

Results: All four LLMs exhibit improved performance when enhanced with CPGs compared to the baseline ZSP. BDT outperformed both CoT-FSP and PAGC in automatic evaluation. All of the proposed methods demonstrated high performance in human evaluation.

Conclusion: LLMs enhanced with CPGs demonstrate superior performance, as compared to plain LLMs with ZSP, in providing accurate recommendations for COVID-19 outpatient treatment, which also highlights the potential for broader applications beyond the case study.

Paper Structure

This paper contains 24 sections, 4 figures, 3 tables, and 4 algorithms.

Figures (4)

  • Figure 1: The figure shows the three proposed methods and the baseline Zero-Shot Prompting (ZSP) method. In the case of the Binary Decision Tree (BDT) method, we use a recursive function to call the LLM with prompts. For Program-Aided Graph Construction (PAGC), a program is a part of the prompt passed to the LLM. Chain-of-Thought-Few-Shot Prompting (CoT-FSP) uses several few-shot examples for guiding the LLM. Finally, ZSP only takes the patient description to produce the result. Note that for BDT, PAGC, and CoT-FSP, the prompt typically contains a task description, patient description, and several few-shot examples besides additions specific to a method (e.g., a program in the case of PAGC).
  • Figure 2: The figure shows the user interface of the chatbot system that implements the proposed methods. One panel shows the interface with the prompt entered but before the response is generated; the other shows the interface after the response is generated. We developed the system as part of the research effort to demonstrate real-world implementation of the methods and collect user feedback.
  • Figure 3: A graphic for the paper.
  • Figure 4: Clinical Practice Guidelines (CPGs) for COVID-19. We used a modified version of the Centers for Disease Control and Prevention (CDC) and Infectious Diseases Society of America (IDSA) COVID-19 Outpatient Treatment Guidelines.
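
The Figure 1 caption says that the BDT method uses a recursive function to call the LLM with prompts. The following is a minimal sketch of what such a recursion could look like, not the authors' implementation. The `Node` class, the `traverse` function, the prompt wording, the toy tree, and the keyword-matching `stub_llm` are all hypothetical; the stub stands in for a real LLM call, and the questions and leaf labels are placeholders rather than actual guideline content.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Node:
    """One yes/no question in a hypothetical binary decision tree."""
    question: str
    yes: Union["Node", str]  # subtree or leaf label if the answer is yes
    no: Union["Node", str]   # subtree or leaf label if the answer is no


def traverse(node: Union[Node, str], patient: str,
             ask_llm: Callable[[str], str]) -> str:
    """Recursively walk the tree, asking the LLM one question per level."""
    if isinstance(node, str):  # reached a leaf: return its label
        return node
    prompt = (
        f"Patient description: {patient}\n"
        f"Question: {node.question}\n"
        "Answer yes or no."
    )
    answer = ask_llm(prompt).strip().lower()
    branch = node.yes if answer.startswith("yes") else node.no
    return traverse(branch, patient, ask_llm)


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for an LLM (assumption for this sketch):
    answers yes when the quoted keyword in the question appears in the
    patient description."""
    description = prompt.split("\n")[0].lower()
    question = prompt.split("Question: ")[1]
    keyword = question.split("'")[1]
    return "yes" if keyword in description else "no"


# Toy tree with placeholder questions and leaves (not real guideline content).
root = Node(
    question="Does the description mention 'fever'?",
    yes=Node(
        question="Does the description mention 'cough'?",
        yes="placeholder recommendation A",
        no="placeholder recommendation B",
    ),
    no="placeholder recommendation C",
)

print(traverse(root, "Adult with fever and cough", stub_llm))
```

In this sketch each level of the tree costs one LLM call, so the number of calls per patient is bounded by the tree depth. Replacing `stub_llm` with a function that queries an actual model would leave `traverse` unchanged.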