HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making

Sumera Anjum; Hanzhi Zhang; Wenjun Zhou; Eun Jin Paek; Xiaopeng Zhao; Yunhe Feng

TL;DR

This paper introduces HALO, a framework that improves the accuracy and reliability of medical question-answering (QA) systems by detecting and mitigating hallucinations. It uses maximum marginal relevance scoring to prioritize the retrieved context.

Abstract

Large language models (LLMs) have significantly advanced natural language processing tasks, yet they are susceptible to generating inaccurate or unreliable responses, a phenomenon known as hallucination. In critical domains such as health and medicine, these hallucinations can pose serious risks. This paper introduces HALO, a novel framework designed to enhance the accuracy and reliability of medical question-answering (QA) systems by focusing on the detection and mitigation of hallucinations. Our approach generates multiple variations of a given query using LLMs and retrieves relevant information from external open knowledge bases to enrich the context. We utilize maximum marginal relevance scoring to prioritize the retrieved context, which is then provided to LLMs for answer generation, thereby reducing the risk of hallucinations. The integration of LangChain further streamlines this process, resulting in a notable and robust increase in the accuracy of both open-source and commercial LLMs, such as Llama-3.1 (from 44% to 65%) and ChatGPT (from 56% to 70%). This framework underscores the critical importance of addressing hallucinations in medical QA systems, ultimately improving clinical decision-making and patient care. The open-source HALO is available at: https://github.com/ResponsibleAILab/HALO.
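The maximum marginal relevance (MMR) step mentioned in the abstract can be illustrated with a minimal sketch. This is the textbook form of MMR, not necessarily the exact formulation or code used in HALO: the function names `cosine` and `mmr_select`, the trade-off weight `lam`, and the toy two-dimensional vectors are all assumptions made for illustration. Each round picks the candidate that maximizes `lam * relevance - (1 - lam) * redundancy`, where relevance is similarity to the query and redundancy is the highest similarity to any passage already selected.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, doc_vecs, k=2, lam=0.5):
    """Return indices of up to k documents chosen greedily by MMR.

    lam = 1.0 ranks purely by relevance to the query;
    smaller values penalize passages similar to ones already chosen.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is 0 for the first pick, since nothing is selected yet.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy embeddings (hypothetical): documents 0 and 1 are near-duplicates,
# document 2 is less relevant but covers a different direction.
query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.9, 0.12], [0.5, -0.5]]

diverse = mmr_select(query, docs, k=2, lam=0.5)
relevance_only = mmr_select(query, docs, k=2, lam=1.0)
print(diverse, relevance_only)
```

With `lam=0.5` the near-duplicate of the first pick is skipped in favor of the more distinct passage, whereas `lam=1.0` reduces to plain relevance ranking. In a retrieval-augmented setup the vectors would come from an embedding model, and the selected passages would be placed into the prompt as context.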

Paper Structure (19 sections, 1 equation, 4 figures, 2 tables)

Introduction
Related Work
Methodology
HALO Framework Overview
Multiquery Generation
Contextual Knowledge Integration
Prompt Engineering
An Example of How HALO Works
Evaluation
Evaluation Metrics
MedMCQA Dataset
LLM Models
ChatGPT-3.5-16K
Llama-3.1 8B
Mistral 7B
...and 4 more sections

Figures (4)

Figure 1: HALO framework overview. HALO comprises three key components: multiquery generation, contextual knowledge integration through the maximum marginal relevance-optimized RAG, and few-shot and CoT-based prompt engineering.
Figure 2: Example of HALO framework applied to a medical question from the MedMCQA dataset
Figure 3: Distribution of MedMCQA questions across 21 medical subjects, covering a wide range of topics
Figure 4: Comparison of accuracy for the LLMs (ChatGPT-3.5, Llama-3.1 8B, Mistral 7B) alone and with the HALO framework across 21 subjects in the MedMCQA dataset
