SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?

Yucheng Shi, Tianze Yang, Canyu Chen, Quanzheng Li, Tianming Liu, Xiang Li, Ninghao Liu

TL;DR

SearchRAG tackles the problem of medical QA accuracy with LLMs by introducing two key components: synthetic query generation, which rewrites questions into search-engine-friendly queries, and uncertainty-based knowledge selection, which filters retrieved content. The framework aligns LLMs with real-time search engines at inference time, using a formal RAG formulation and an entropy-based criterion, ΔH_i = H(Y|x) − H(Y|x_i'), to select informative snippets. Empirical results across MedQA, MMLU_Med, and MedMCQA show substantial accuracy gains for 8B and 70B LLaMA variants, with notable improvements when the number of synthetic queries is increased and knowledge filtering is applied, while static sources such as PubMed can occasionally degrade performance for smaller models. The work demonstrates that real-time access to current medical knowledge, combined with targeted query synthesis and uncertainty-guided filtering, meaningfully enhances medical QA performance and offers a scalable approach to inference-time knowledge integration in LLMs.
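The entropy-based selection criterion can be illustrated with a short sketch. It assumes (this is not spelled out above) that H(Y|x) is the Shannon entropy of the model's probability distribution over the answer options given the question x alone, and that x_i' is the question augmented with the i-th retrieved snippet. The function names, the dictionary representation of distributions, and the rule of keeping the top_k snippets with positive ΔH_i are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import Dict, List, Tuple


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a distribution over answer options."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def select_snippets(
    base_dist: Dict[str, float],
    snippet_dists: List[Tuple[str, Dict[str, float]]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Score each snippet by Delta H_i = H(Y|x) - H(Y|x_i').

    base_dist:     answer distribution for the question alone (assumed input).
    snippet_dists: (snippet, answer distribution with that snippet added).
    Keeps the top_k snippets with positive entropy reduction
    (a hypothetical selection rule chosen for illustration).
    """
    h_base = entropy(base_dist)
    scored = [(text, h_base - entropy(dist)) for text, dist in snippet_dists]
    helpful = [item for item in scored if item[1] > 0]
    helpful.sort(key=lambda item: item[1], reverse=True)
    return helpful[:top_k]
```

For example, starting from a uniform distribution over four options, a snippet that concentrates probability on one option yields a larger ΔH_i than one that only narrows the choice to two options, while a snippet that leaves the distribution unchanged (ΔH_i = 0) or flattens it (ΔH_i < 0) is dropped. Ranking by ΔH_i rather than by retrieval score favors snippets that actually sharpen the model's answer.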

Abstract

Large Language Models (LLMs) have shown remarkable capabilities in general domains but often struggle with tasks requiring specialized knowledge. Conventional Retrieval-Augmented Generation (RAG) techniques typically retrieve external information from static knowledge bases, which can be outdated or incomplete, missing fine-grained clinical details essential for accurate medical question answering. In this work, we propose SearchRAG, a novel framework that overcomes these limitations by leveraging real-time search engines. Our method employs synthetic query generation to convert complex medical questions into search-engine-friendly queries and utilizes uncertainty-based knowledge selection to filter and incorporate the most relevant and informative medical knowledge into the LLM's input. Experimental results demonstrate that our method significantly improves response accuracy in medical question answering tasks, particularly for complex questions requiring detailed and up-to-date knowledge.

Paper Structure

This paper contains 39 sections, 6 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: Performance comparison of different methods on medical QA benchmarks using LLaMA 8B. Our proposed method consistently outperforms baseline approaches (CoT), conventional RAG (MedRAG), and iterative RAG (i-MedRAG) across all three datasets (MedMCQA, MMLU_Med, and MedQA), demonstrating significant improvements in accuracy.
  • Figure 2: Overview of SearchRAG: Our framework first transforms complex medical questions into search-optimized synthetic queries through repeated sampling. The retrieved knowledge snippets are then filtered using uncertainty-based selection to identify the most relevant information for enhancing LLM responses.
  • Figure 3: Comparison of model accuracies for MedMCQA, MMLU_Med, and MedQA across different numbers of generated queries. Note that the 0-query baseline directly uses the original question for retrieval. As the number of generated queries increases, the performance of RAG improves significantly.
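The query-generation side of the pipeline (Figure 2) and the 0-query baseline (Figure 3) can be sketched as follows. `sample_query` and `search` are hypothetical callables standing in for a sampled LLM call and a real-time search-engine API; the case-insensitive deduplication and the per-query snippet cap are assumptions made for illustration, not details taken from the paper.

```python
from typing import Callable, List


def generate_queries(
    question: str,
    sample_query: Callable[[str, int], str],
    n_queries: int,
) -> List[str]:
    """Repeatedly sample search-style rewrites of `question` and deduplicate them.

    `sample_query(question, i)` is a hypothetical stand-in for one sampled LLM call
    (e.g. at nonzero temperature). With n_queries == 0 the original question is
    used directly, mirroring the 0-query baseline in Figure 3.
    """
    if n_queries == 0:
        return [question]
    seen = set()
    queries: List[str] = []
    for i in range(n_queries):
        q = sample_query(question, i).strip()
        key = q.lower()  # assumption: case-insensitive duplicate removal
        if q and key not in seen:
            seen.add(key)
            queries.append(q)
    # Fall back to the original question if every sample was empty.
    return queries or [question]


def retrieve_snippets(
    queries: List[str],
    search: Callable[[str], List[str]],
    per_query: int = 3,
) -> List[str]:
    """Collect up to `per_query` unique snippets per query, preserving order.

    `search` is a hypothetical stand-in for a real-time search-engine API.
    """
    seen = set()
    snippets: List[str] = []
    for q in queries:
        for s in search(q)[:per_query]:
            if s not in seen:
                seen.add(s)
                snippets.append(s)
    return snippets
```

In this sketch, the snippets returned by `retrieve_snippets` would then go through uncertainty-based filtering before being added to the LLM's input; raising `n_queries` widens the pool of candidate snippets, which is the knob varied in Figure 3.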