
Medico: Towards Hallucination Detection and Correction with Multi-source Evidence Fusion

Xinping Zhao, Jindi Yu, Zhenyu Liu, Jifang Wang, Dongfang Li, Yibin Chen, Baotian Hu, Min Zhang

TL;DR

Medico tackles the pervasive problem of hallucinations in large language models (LLMs) with a multi-source evidence fusion framework that jointly detects and corrects factual errors. It gathers evidence from the Web, Wikipedia, Wikidata5m, and user-uploaded files and fuses it. An evidence-informed detector backed by an ensemble then judges the generated content, and a rationale-guided corrector iteratively fixes any hallucinations. Experiments on HaluEval with open-source LLMs show that multi-source fusion improves retrieval, detection, and correction performance, supporting Medico's claims of enhanced explainability and robustness. The work also discusses practical considerations such as noise in retrieved evidence, computational cost, and preservation of the original content. It highlights Medico's potential as a real-time security plug-in for LLM systems while acknowledging ethical concerns around data privacy.
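The detect-then-correct loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Medico's implementation. The detector and corrector are hypothetical callables (in Medico they would be LLM-based). The ensemble is assumed to be a simple majority vote. `max_rounds` is a hypothetical cap on correction iterations.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical interfaces (not Medico's API):
# a detector maps (content, evidence) -> (is_hallucinated, rationale);
# a corrector maps (content, evidence, rationale) -> revised content.
Detector = Callable[[str, str], Tuple[bool, str]]
Corrector = Callable[[str, str, str], str]


def ensemble_detect(detectors: List[Detector], content: str, evidence: str) -> Tuple[bool, str]:
    """Assumed ensemble: majority vote over detector labels (ties count as not hallucinated)."""
    if not detectors:
        raise ValueError("at least one detector is required")
    votes = [detect(content, evidence) for detect in detectors]
    counts = Counter(label for label, _ in votes)
    majority = counts[True] > counts[False]
    # Return the rationale of the first detector that agrees with the majority.
    rationale = next(reason for label, reason in votes if label == majority)
    return majority, rationale


def check_and_correct(content: str, evidence: str, detectors: List[Detector],
                      corrector: Corrector, max_rounds: int = 3) -> Tuple[str, bool]:
    """Alternate detection and rationale-guided revision; max_rounds is a hypothetical cap."""
    for _ in range(max_rounds):
        hallucinated, rationale = ensemble_detect(detectors, content, evidence)
        if not hallucinated:
            return content, True
        content = corrector(content, evidence, rationale)
    # Final check after the last revision.
    hallucinated, _ = ensemble_detect(detectors, content, evidence)
    return content, not hallucinated


# Toy usage: a substring "detector" and a string-replace "corrector" stand in for LLMs.
evidence = "The Eiffel Tower is in Paris."


def toy_detector(content: str, ev: str) -> Tuple[bool, str]:
    unsupported = content not in ev
    return unsupported, ("claim not found in evidence" if unsupported else "claim supported")


def toy_corrector(content: str, ev: str, rationale: str) -> str:
    return content.replace("Rome", "Paris")


fixed, verified = check_and_correct("The Eiffel Tower is in Rome.", evidence,
                                    [toy_detector] * 3, toy_corrector)
print(fixed, verified)  # The Eiffel Tower is in Paris. True
```

Passing the rationale into the corrector, rather than only a binary label, mirrors the summary's point that the correction step is guided by why the content was judged wrong.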

Abstract

Hallucinations are prevalent in Large Language Models (LLMs): the generated content is coherent but factually incorrect, which severely hinders the widespread application of LLMs. Previous studies have shown that LLMs can confidently state non-existent facts rather than answering "I don't know". It is therefore necessary to draw on external knowledge to detect and correct hallucinated content. Since manually detecting and correcting factual errors is labor-intensive, an automatic end-to-end hallucination-checking approach is needed. To this end, we present Medico, a Multi-source evidence fusion enhanced hallucination detection and correction framework. It fuses diverse evidence from multiple sources, detects whether the generated content contains factual errors, provides the rationale behind the judgment, and iteratively revises the hallucinated content. Experimental results on evidence retrieval (0.964 HR@5, 0.908 MRR@5), hallucination detection (0.927-0.951 F1), and hallucination correction (0.973-0.979 approval rate) demonstrate Medico's potential. A video demo of Medico is available at https://youtu.be/RtsO6CSesBI.
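The retrieval metrics quoted above are standard ranking measures. Assuming their usual definitions (the abstract does not spell them out), HR@k is the fraction of queries with at least one relevant evidence item among the top k results. MRR@k averages 1/rank of the first relevant item, counting 0 when none appears in the top k. A minimal sketch on made-up data:

```python
from typing import Hashable, Sequence, Set


def _check_inputs(ranked: Sequence, relevant: Sequence) -> None:
    if not ranked or len(ranked) != len(relevant):
        raise ValueError("ranked and relevant must be non-empty and the same length")


def hit_rate_at_k(ranked: Sequence[Sequence[Hashable]],
                  relevant: Sequence[Set[Hashable]], k: int = 5) -> float:
    """Fraction of queries whose top-k list contains at least one relevant item."""
    _check_inputs(ranked, relevant)
    hits = sum(1 for docs, rel in zip(ranked, relevant) if any(d in rel for d in docs[:k]))
    return hits / len(ranked)


def mrr_at_k(ranked: Sequence[Sequence[Hashable]],
             relevant: Sequence[Set[Hashable]], k: int = 5) -> float:
    """Mean reciprocal rank of the first relevant item, truncated at k (0 if absent)."""
    _check_inputs(ranked, relevant)
    total = 0.0
    for docs, rel in zip(ranked, relevant):
        for rank, d in enumerate(docs[:k], start=1):
            if d in rel:
                total += 1.0 / rank
                break
    return total / len(ranked)


# Toy data (made up, not from the paper): three queries with ranked evidence IDs.
ranked = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
relevant = [{"a"}, {"z"}, {"s"}]
print(hit_rate_at_k(ranked, relevant))  # approx. 0.667 (2 of 3 queries hit)
print(mrr_at_k(ranked, relevant))       # approx. 0.444, i.e. (1 + 1/3 + 0) / 3
```

MRR@k is never larger than HR@k, since each hit contributes at most 1; the gap between the reported 0.964 and 0.908 reflects relevant evidence that is retrieved but not always ranked first.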

Paper Structure

This paper contains 18 sections, 6 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Motivation example. The generated content and retrieved evidence are marked in yellow and green, respectively. (a) shows evidence acquired from a single source, leading to an erroneous judgment because the evidence is outdated. (b) shows a situation where users are given only a veracity label, leaving them unsure why and where the content is incorrect.
  • Figure 2: The overall system framework of Medico. The upper layer illustrates the workflow of multi-source evidence fusion, and the bottom layer illustrates the workflow of hallucination detection and correction.
  • Figure 3: Screenshot of our hallucination detection and correction system Medico. The left shows the interface for entering the user query and the generated response. The middle shows the interface for selecting retrieval sources and uploading files. The right demonstrates the evidence retrieved from diverse sources and their fused evidence.
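Figure 2's upper layer merges evidence retrieved from several sources. This summary does not specify Medico's fusion mechanism, so the sketch below swaps in reciprocal rank fusion (RRF), a common and simple way to merge ranked lists. Each evidence item's score is the sum of 1/(k + rank) over the sources that return it. The constant k = 60, the `top_n` cutoff, and treating identical passage strings as the same evidence are assumptions. The source names follow the paper, but the passage IDs are made up.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(ranked_by_source: Dict[str, List[str]],
                           k: int = 60, top_n: int = 5) -> List[str]:
    """Merge per-source ranked evidence lists with RRF (a stand-in, not necessarily Medico's fusion)."""
    scores: Dict[str, float] = defaultdict(float)
    for passages in ranked_by_source.values():
        for rank, passage in enumerate(passages, start=1):
            # Identical strings from different sources accumulate into one score.
            scores[passage] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


# Hypothetical retrieval results keyed by the four source types named in the paper.
ranked_by_source = {
    "web": ["e1", "e2", "e3"],
    "wikipedia": ["e2", "e4"],
    "wikidata5m": ["e2", "e1"],
    "user_files": ["e5"],
}
print(reciprocal_rank_fusion(ranked_by_source, top_n=3))  # ['e2', 'e1', 'e5']
```

RRF needs only ranks, not comparable scores. This is convenient when sources such as web search and a knowledge graph measure relevance on different scales. Evidence confirmed by several sources ("e2" here) rises to the top.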