META-RAG: Meta-Analysis-Inspired Evidence-Re-Ranking Method for Retrieval-Augmented Generation in Evidence-Based Medicine
Mengzhou Sun, Sendong Zhao, Jianyu Chen, Haochun Wang, Bing Qin
TL;DR
The paper addresses the retrieval of low-quality or conflicting medical evidence in retrieval-augmented generation (RAG) for Evidence-Based Medicine (EBM). It proposes META-RAG, a meta-analysis-inspired pipeline that re-ranks and filters evidence along three dimensions (reliability, heterogeneity, and extrapolation) before passing high-quality evidence to the generator. The method combines a base publication-type score with LLM-driven reliability analysis and meta-analysis-inspired filtering (DerSimonian-Laird-based heterogeneity analysis and PIO-based extrapolation analysis) to produce a ranked evidence set. Experiments on the MedQA and MMLU datasets, with PubMed as the evidence source, show consistent accuracy gains across multiple LLMs and model sizes; ablations confirm the contribution of each analysis stage and the improvement in evidence quality. Overall, META-RAG reduces the risk of injecting incorrect knowledge into medical responses and makes RAG-based EBM systems more practical.
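The heterogeneity stage is described as DerSimonian-Laird based. For reference, the standard DerSimonian-Laird random-effects estimator (Cochran's Q, the between-study variance tau^2, and I^2) can be sketched in Python as follows. This is only the textbook estimator, not the authors' code: how META-RAG derives per-article effect sizes and variances from retrieved PubMed articles is not described in this summary, the numbers in the usage lines are made up, and the names `dersimonian_laird` and `DLResult` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DLResult:
    q: float          # Cochran's Q statistic
    tau2: float       # between-study variance (DerSimonian-Laird moment estimate)
    i2: float         # I^2 as a fraction in [0, 1]
    pooled: float     # random-effects pooled estimate
    pooled_se: float  # standard error of the pooled estimate


def dersimonian_laird(effects: Sequence[float], variances: Sequence[float]) -> DLResult:
    """Random-effects meta-analysis using the DerSimonian-Laird estimator."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("within-study variances must be positive")

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Cochran's Q with k - 1 degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # Moment estimate of tau^2, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)

    # I^2 = (Q - df) / Q, truncated at zero (defined as 0 when Q == 0).
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    return DLResult(q=q, tau2=tau2, i2=i2, pooled=pooled, pooled_se=sw_re ** -0.5)


# Usage with made-up effect sizes and variances.
result = dersimonian_laird([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(round(result.tau2, 4), round(result.i2, 2))  # prints 0.03 0.75
```

A large I^2 (or a tau^2 well above zero) indicates that the studies disagree more than sampling error alone would explain. A re-ranker could treat such an evidence set as heterogeneous, although the exact decision rule META-RAG applies is not stated here.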
Abstract
Evidence-based medicine (EBM) plays a crucial role in clinical practice. Given suitable medical articles, doctors can effectively reduce the incidence of misdiagnosis. Researchers have found it efficient to apply large language model (LLM) techniques such as retrieval-augmented generation (RAG) to EBM tasks. However, EBM imposes stringent requirements on evidence, and RAG applications in EBM struggle to distinguish high-quality evidence efficiently. Therefore, inspired by the meta-analysis used in EBM, we propose a new method to re-rank and filter medical evidence. The method applies multiple principles to select the best evidence for LLM-based diagnosis. We combine several EBM methods to emulate meta-analysis: reliability analysis, heterogeneity analysis, and extrapolation analysis. Together, these processes allow users to retrieve the best medical evidence for LLMs. In our experiments, using these high-quality articles yields an accuracy improvement of up to 11.4%. Our method enables RAG to extract higher-quality and more reliable evidence from the PubMed dataset. This work can reduce the infusion of incorrect knowledge into responses and help users receive more effective replies.
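The abstract gives no implementation details, so the following Python sketch only illustrates one way the three stages could be chained into a re-ranker. Everything concrete in it is hypothetical: the publication-type weights in `BASE_SCORE`, the `Article` fields (an assumed LLM-assigned `reliability` score and an assumed `stance` label), the reliability-weighted majority vote used as a simple stand-in for a proper heterogeneity test, the Jaccard-based PIO overlap, and the `min_overlap` and `top_k` thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical base scores per publication type (illustrative, not the paper's values).
BASE_SCORE: Dict[str, float] = {
    "meta-analysis": 1.0,
    "systematic review": 0.9,
    "randomized controlled trial": 0.8,
    "cohort study": 0.6,
    "case report": 0.3,
}
DEFAULT_BASE = 0.1  # used for any publication type not listed above


@dataclass
class Article:
    pmid: str
    pub_type: str
    reliability: float        # assumed LLM-assigned score in [0, 1]
    stance: int               # assumed: +1 supports, -1 refutes the candidate answer
    pio: Dict[str, Set[str]]  # assumed keys "P", "I", "O" mapping to term sets


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pio_overlap(article: Dict[str, Set[str]], question: Dict[str, Set[str]]) -> float:
    # Mean Jaccard similarity over the Population, Intervention, and Outcome slots.
    keys = ("P", "I", "O")
    return sum(jaccard(article.get(k, set()), question.get(k, set())) for k in keys) / len(keys)


def rerank(articles: List[Article], question_pio: Dict[str, Set[str]],
           min_overlap: float = 0.2, top_k: int = 3) -> List[Article]:
    # 1. Reliability: base publication-type score scaled by the reliability score.
    scored = [(BASE_SCORE.get(a.pub_type.lower(), DEFAULT_BASE) * a.reliability, a)
              for a in articles]

    # 2. Heterogeneity stand-in: keep articles on the reliability-weighted majority side.
    net = sum(s * a.stance for s, a in scored)
    majority = 1 if net >= 0 else -1
    scored = [(s, a) for s, a in scored if a.stance == majority]

    # 3. Extrapolation: drop articles whose PIO is too far from the question's PIO.
    scored = [(s, a) for s, a in scored if pio_overlap(a.pio, question_pio) >= min_overlap]

    # Highest combined score first; return at most top_k articles.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [a for _, a in scored[:top_k]]
```

Multiplying the base score by the reliability score lets a strong study design be discounted when its reported conduct looks weak. A real system would more likely replace the majority vote with a statistical heterogeneity measure such as I^2.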
