Careful Queries, Credible Results: Teaching RAG Models Advanced Web Search Tools with Reinforcement Learning

Yuqin Dai, Shuo Yang, Guoqing Wang, Yong Deng, Zhanwei Zhang, Jun Yin, Pengyu Zeng, Zhenzhe Ying, Changhua Meng, Can Yi, Yuchen Zhou, Weiqiang Wang, Shuai Lu

Abstract

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating up-to-date external knowledge, yet real-world web environments pose two key challenges. First, pervasive misinformation introduces unreliable or misleading content that can degrade retrieval accuracy. Second, web tools are underutilized; used effectively, they could sharpen query precision and help mitigate this noise, ultimately improving retrieval results in RAG systems. To address these issues, we propose WebFilter, a novel RAG framework that generates source-restricted queries and filters out unreliable content. This approach combines a retrieval filtering mechanism with a behavior- and outcome-driven reward strategy, optimizing both query formulation and retrieval outcomes. Extensive experiments demonstrate that WebFilter improves answer quality and retrieval precision, outperforming existing RAG methods on both in-domain and out-of-domain benchmarks.

Paper Structure

This paper contains 25 sections, 11 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Comparison of WebFilter with Existing Methods: Existing methods [deepresearcher, song2025r1] often yield unreliable results in misinformation-rich web environments. WebFilter enhances accuracy by using advanced search operators to filter noise and retrieve target files.
  • Figure 2: Overview of the WebFilter training framework. Upper: We formulate retrieval as a Markov Decision Process, where the model interacts with web search tools through step-by-step actions, including query generation and evidence selection. Middle: To improve tool usage, we provide explicit instructions and demonstrations on how to issue effective, source-aware queries. Lower: The policy is optimized using a behavior- and outcome-driven Information-Filtering Reward strategy, which encourages both proper tool invocation and high-quality information retrieval.
  • Figure 3: Frequency of advanced operators across variants.
  • Figure 4: Training dynamics showing (a) QA accuracy ($ACC_R$), (b) tool call behavior, and (c) response length evolution across training steps.
  • Figure 5: Case studies showing how WebFilter improves QA by (a) narrowing searches to authoritative sources for precise results, (b) verifying ambiguous or conflicting information via trusted sites, and (c) adaptively refining search queries when initial attempts are insufficient.
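
The source-restricted queries mentioned in the abstract and in Figures 1, 3, and 5 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_query` and `count_operators` are hypothetical, and the operator set (`site:`, `filetype:`, `intitle:`) is an assumption based on operators that common web search engines accept, since the text above does not list which operators WebFilter uses. The first function composes a query restricted to a source or file type; the second tallies operator usage across a batch of queries, in the spirit of the frequency comparison in Figure 3.

```python
import re
from collections import Counter

# Assumed operator set; the source text does not name the operators WebFilter uses.
OPERATORS = ("site", "filetype", "intitle")

# Matches an operator token such as "site:who.int" at the start of a word.
_OPERATOR_RE = re.compile(r"(?<!\S)-?(" + "|".join(OPERATORS) + r"):\S+")


def build_query(terms, site=None, filetype=None):
    """Compose a search string, appending source restrictions when given."""
    parts = [terms.strip()]
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def count_operators(queries):
    """Count how often each assumed operator appears across the queries."""
    counts = Counter()
    for query in queries:
        for match in _OPERATOR_RE.finditer(query):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    restricted = build_query("measles vaccine schedule", site="who.int", filetype="pdf")
    plain = build_query("measles vaccine schedule")
    print(restricted)
    print(count_operators([restricted, plain]))
```

Restricting a query to a domain narrows results to a source the caller already trusts, which is the behavior the case studies in Figure 5 describe; which domains count as authoritative is left to the caller here.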