META-RAG: Meta-Analysis-Inspired Evidence-Re-Ranking Method for Retrieval-Augmented Generation in Evidence-Based Medicine

Mengzhou Sun, Sendong Zhao, Jianyu Chen, Haochun Wang, Bing Qin

TL;DR

The paper addresses the retrieval of low-quality or conflicting medical evidence in retrieval-augmented generation (RAG) for evidence-based medicine (EBM). It proposes META-RAG, a meta-analysis-inspired pipeline that re-ranks and filters evidence along three dimensions (reliability, heterogeneity, and extrapolation) before passing the surviving evidence to the generator. The method combines a base publication-type score with LLM-driven reliability judgments, then applies meta-analysis-inspired filtering (DerSimonian-Laird-based heterogeneity analysis and PIO-based extrapolation analysis) to produce a ranked evidence set. Experiments on the MedQA and MMLU datasets, with PubMed as the evidence source, show consistent accuracy gains across multiple LLMs and model sizes; ablations indicate that each analysis stage contributes and that evidence quality improves. Overall, META-RAG reduces the risk of infusing incorrect knowledge into medical responses and makes RAG-based EBM systems more practical.
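The summary names DerSimonian-Laird as the basis of the heterogeneity analysis but does not say how the paper turns retrieved evidence into effect sizes. As a minimal sketch of the standard estimator only, the function below computes Cochran's Q, the between-study variance tau^2, and the I^2 statistic from per-study effect estimates and their within-study variances. The function name `dersimonian_laird` and its inputs are assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence, Tuple


def dersimonian_laird(
    effects: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (Q, tau2, I2) for a set of study effects.

    effects   -- per-study effect estimates (hypothetical input format)
    variances -- per-study within-study variances, all strictly positive
    """
    if len(effects) != len(variances):
        raise ValueError("effects and variances must have the same length")
    if len(effects) < 2:
        raise ValueError("at least two studies are required")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Inverse-variance (fixed-effect) weights.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)

    # Fixed-effect pooled estimate.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # Method-of-moments estimate of the between-study variance, floored at 0.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: share of total variation attributed to heterogeneity, in [0, 1).
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, tau2, i2


if __name__ == "__main__":
    q, tau2, i2 = dersimonian_laird([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
    print(f"Q={q:.3f} tau2={tau2:.3f} I2={i2:.3f}")
```

A filter built on this might drop evidence whose inclusion pushes I^2 above some threshold; any such threshold would be a design choice that this summary does not specify.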

Abstract

Evidence-based medicine (EBM) plays a crucial role in clinical practice. Given suitable medical articles, doctors can effectively reduce the incidence of misdiagnosis. Researchers have found large language model (LLM) techniques such as retrieval-augmented generation (RAG) efficient for EBM tasks. However, EBM maintains stringent requirements for evidence, and RAG applications in EBM struggle to efficiently distinguish high-quality evidence. Therefore, inspired by the meta-analysis used in EBM, we provide a new method to re-rank and filter medical evidence. The method applies multiple principles to select the best evidence for LLMs to use in diagnosis. We combine several EBM methods to emulate meta-analysis: reliability analysis, heterogeneity analysis, and extrapolation analysis. These processes allow users to retrieve the best medical evidence for the LLMs. Finally, we evaluate the resulting high-quality articles and show an accuracy improvement of up to 11.4% in our experiments. Our method enables RAG to extract higher-quality and more reliable evidence from the PubMed dataset. This work can reduce the infusion of incorrect knowledge into responses and help users receive more effective replies.

Paper Structure

This paper contains 17 sections, 7 equations, 6 figures, 3 tables, 2 algorithms.

Figures (6)

  • Figure 1: When traditional RAG processes a query, it is likely to retrieve a large volume of unhelpful and non-professional evidence. This evidence may include conditional results and outdated conclusions, which can mislead the generator into making mistakes.
  • Figure 2: The META-RAG pipeline includes (1) reliability analysis, (2) heterogeneity analysis, and (3) extrapolation analysis. Our method uses these three stages to re-rank and filter evidence, providing the highest-quality evidence possible to (4) the generator LLM.
  • Figure 3: The pipeline of the reliability analysis. We combine the article information with the LLM's judgments to assess the reliability of each piece of evidence.
  • Figure 4: We divide evidence types into 7 levels. In reliability analysis, we categorize evidence by publication type and LLM judgment. A higher evidence level corresponds to a higher publication-type score.
  • Figure 5: The specific information structures retrieved from the PubMed dataset. By analyzing this detailed article information, we can comprehensively assess whether the literature is sufficiently authoritative.
  • ...and 1 more figures
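
The captions for Figures 3 and 4 describe a reliability score built from a 7-level publication-type grade and an LLM judgment, but this page does not give the formula. The sketch below shows one way such a score could drive re-ranking. The `Evidence` fields, the equal weighting, and the linear combination are all hypothetical choices made for illustration, not the paper's definition.

```python
from dataclasses import dataclass
from typing import List

NUM_LEVELS = 7  # Figure 4 states that evidence types are divided into 7 levels.


@dataclass
class Evidence:
    pmid: str         # article identifier (hypothetical field name)
    level: int        # evidence level, 1 (weakest) to 7 (strongest); numbering assumed
    llm_score: float  # LLM reliability judgment scaled to [0, 1]; scale assumed


def reliability(item: Evidence, weight: float = 0.5) -> float:
    """Blend the publication-type score with the LLM judgment.

    The linear blend and the default weight are assumptions for this sketch.
    """
    if not 1 <= item.level <= NUM_LEVELS:
        raise ValueError("level must be between 1 and 7")
    base = item.level / NUM_LEVELS  # higher level gives a higher base score
    return weight * base + (1.0 - weight) * item.llm_score


def rerank(items: List[Evidence], top_k: int) -> List[Evidence]:
    """Sort evidence by descending reliability and keep the top_k entries."""
    return sorted(items, key=reliability, reverse=True)[:top_k]


if __name__ == "__main__":
    pool = [
        Evidence("A", level=7, llm_score=0.2),
        Evidence("B", level=3, llm_score=0.9),
        Evidence("C", level=1, llm_score=0.1),
    ]
    for ev in rerank(pool, top_k=2):
        print(ev.pmid, round(reliability(ev), 3))
```

In this toy pool, a strongly endorsed mid-level article can outrank a top-level article with a weak LLM judgment; whether that behavior is desirable depends on the weighting, which is exactly the part this sketch leaves as an assumption.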