SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?
Yucheng Shi, Tianze Yang, Canyu Chen, Quanzheng Li, Tianming Liu, Xiang Li, Ninghao Liu
TL;DR
SearchRAG improves the accuracy of LLM-based medical question answering with two components: synthetic query generation, which rewrites a medical question into search-engine-friendly queries, and uncertainty-based knowledge selection, which filters the retrieved content. The framework couples an LLM with a real-time search engine at inference time. It uses a formal RAG formulation and an entropy-based criterion that scores each retrieved snippet by its entropy reduction ΔH_i = H(Y|x) − H(Y|x_i'), where x is the original input and x_i' is the input augmented with snippet i. Across MedQA, MMLU_Med, and MedMCQA, SearchRAG yields substantial accuracy gains for 8B and 70B LLaMA variants. Accuracy improves further as the number of synthetic queries increases and when knowledge filtering is applied, whereas static sources such as PubMed can occasionally degrade performance for smaller models. The results indicate that real-time access to current medical knowledge, combined with targeted query synthesis and uncertainty-guided filtering, enhances medical QA performance and offers a scalable approach to inference-time knowledge integration in LLMs.
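To make the entropy criterion concrete, here is a minimal sketch of scoring snippets by ΔH_i = H(Y|x) − H(Y|x_i'). It is not the authors' implementation. It assumes the model's answer distribution over the multiple-choice options is already available as a list of probabilities, both without retrieval and with each snippet added. The function names, the toy distributions, and the keep-if-above-threshold rule are all illustrative assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_reduction(base_probs, augmented_probs):
    """Delta H_i = H(Y|x) - H(Y|x_i'); positive means the snippet lowered uncertainty."""
    return entropy(base_probs) - entropy(augmented_probs)


def select_snippets(base_probs, snippet_probs, threshold=0.0):
    """Return indices of snippets whose entropy reduction exceeds the threshold.

    The threshold rule is an assumption made for this sketch.
    """
    return [
        i
        for i, probs in enumerate(snippet_probs)
        if entropy_reduction(base_probs, probs) > threshold
    ]


# Toy answer distributions over four options (illustrative numbers only).
base = [0.25, 0.25, 0.25, 0.25]      # model is maximally uncertain without retrieval
per_snippet = [
    [0.70, 0.10, 0.10, 0.10],        # snippet 0 sharpens the distribution
    [0.25, 0.25, 0.25, 0.25],        # snippet 1 changes nothing
    [0.40, 0.40, 0.10, 0.10],        # snippet 2 helps a little
]

kept = select_snippets(base, per_snippet)
```

In this toy case snippets 0 and 2 are kept because they lower the entropy of the answer distribution, while snippet 1 leaves it unchanged and is dropped.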
Abstract
Large Language Models (LLMs) have shown remarkable capabilities in general domains but often struggle with tasks requiring specialized knowledge. Conventional Retrieval-Augmented Generation (RAG) techniques typically retrieve external information from static knowledge bases, which can be outdated or incomplete, missing fine-grained clinical details essential for accurate medical question answering. In this work, we propose SearchRAG, a novel framework that overcomes these limitations by leveraging real-time search engines. Our method employs synthetic query generation to convert complex medical questions into search-engine-friendly queries and utilizes uncertainty-based knowledge selection to filter and incorporate the most relevant and informative medical knowledge into the LLM's input. Experimental results demonstrate that our method significantly improves response accuracy in medical question answering tasks, particularly for complex questions requiring detailed and up-to-date knowledge.
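The flow described in the abstract (generate queries, search, filter, then augment the LLM input) can be sketched as follows. This is a hedged illustration and not the paper's code. The three callables, the stub implementations, and the prompt template are hypothetical placeholders for an LLM query generator, a search-engine client, and an uncertainty-based filter.

```python
from typing import Callable, List


def search_rag_prompt(
    question: str,
    generate_queries: Callable[[str, int], List[str]],
    search: Callable[[str], List[str]],
    is_informative: Callable[[str, str], bool],
    num_queries: int = 4,
) -> str:
    """Build an augmented prompt from filtered search results (illustrative only)."""
    # 1. Turn the question into several search-engine-friendly queries.
    queries = generate_queries(question, num_queries)

    # 2. Retrieve snippets for every query, dropping exact duplicates in order.
    snippets: List[str] = []
    seen = set()
    for query in queries:
        for snippet in search(query):
            if snippet not in seen:
                seen.add(snippet)
                snippets.append(snippet)

    # 3. Keep only the snippets that the filter judges informative.
    kept = [s for s in snippets if is_informative(question, s)]

    # 4. Prepend the surviving snippets to the question (assumed template).
    context = "\n".join(f"- {s}" for s in kept)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Stub components so the sketch runs without an LLM or a network connection.
def fake_generate(question: str, n: int) -> List[str]:
    return [f"{question} query {i}" for i in range(n)]


def fake_search(query: str) -> List[str]:
    return ["stub snippet shared by all queries", f"result for {query}"]


def fake_filter(question: str, snippet: str) -> bool:
    return snippet.startswith("stub")


prompt = search_rag_prompt(
    "How does aspirin work?", fake_generate, fake_search, fake_filter, num_queries=2
)
```

Passing the components in as callables keeps the sketch independent of any particular search API or model. In a real system the filter would be replaced by an uncertainty-based test.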
