
AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning

Tzu-Han Lin, Wei-Lin Chen, Chen-An Li, Hung-yi Lee, Yun-Nung Chen, Yu Meng

TL;DR

This paper addresses the problem of how to balance internal parametric knowledge with external search in large language models to reduce cost and risk. It introduces AdaSearch, a two-stage outcome-driven reinforcement learning framework that separates problem solving from the decision to search and provides explicit decision rationales. The approach yields stronger self-knowledge awareness, reduces unnecessary search calls, and preserves task performance across model families and retrieval setups, outperforming several reward-shaping baselines and end-to-end alternatives. The work contributes an interpretable decision-making component suitable for high-stakes domains and demonstrates broad generalization to different model sizes and retrievers.

Abstract

Equipping large language models (LLMs) with search engines via reinforcement learning (RL) has emerged as an effective approach for building search agents. However, overreliance on search introduces unnecessary cost and risks exposure to noisy or malicious content, while relying solely on parametric knowledge risks hallucination. The central challenge is to develop agents that adaptively balance parametric knowledge with external search, invoking search only when necessary. Prior work mitigates search overuse by shaping rewards around the number of tool calls. However, these penalties require substantial reward engineering, provide ambiguous credit assignment, and can be exploited by agents that superficially reduce calls. Moreover, evaluating performance solely through call counts conflates necessary and unnecessary search, obscuring the measurement of true adaptive behavior. To address these limitations, we first quantify the self-knowledge awareness of existing search agents via an F1-based decision metric, revealing that methods such as Search-R1 often overlook readily available parametric knowledge. Motivated by these findings, we propose AdaSearch, a simple two-stage, outcome-driven RL framework that disentangles problem solving from the decision of whether to invoke search, and makes this decision process explicit and interpretable. This transparency is crucial for high-stakes domains such as finance and medical question answering, yet is largely neglected by prior approaches. Experiments across multiple model families and sizes demonstrate that AdaSearch substantially improves knowledge-boundary awareness, reduces unnecessary search calls, preserves strong task performance, and offers more transparent, interpretable decision behaviors.
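The F1-based decision metric mentioned in the abstract can be illustrated with a small sketch. It assumes that the positive class is "answer from parametric knowledge without searching" and that each question carries a binary gold label saying whether parametric knowledge actually suffices. How the paper derives that label is not stated here, and the function name `f1_aware` is hypothetical. Under this convention an agent that always searches never predicts the positive class and scores zero, which is consistent with what the Figure 1 caption reports for Search-R1.

```python
from typing import Sequence


def f1_aware(decided_parametric: Sequence[bool],
             parametric_sufficient: Sequence[bool]) -> dict:
    """Precision, recall, and F1 of the "no search needed" decision.

    Hypothetical sketch: positive class = agent answers from parametric
    knowledge; gold label = parametric knowledge is sufficient.
    """
    if len(decided_parametric) != len(parametric_sufficient):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(decided_parametric, parametric_sufficient))
    tp = sum(d and g for d, g in pairs)          # skipped search, correctly
    fp = sum(d and not g for d, g in pairs)      # skipped search, should have searched
    fn = sum((not d) and g for d, g in pairs)    # searched although knowledge sufficed
    # Treat undefined ratios as 0 (e.g. an agent that always searches).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy usage: four questions, the first two answerable from parametric knowledge.
gold = [True, True, False, False]
always_search = f1_aware([False] * 4, gold)
adaptive = f1_aware([True, False, False, True], gold)
print(always_search["f1"], adaptive["f1"])  # 0.0 0.5
```

Scoring the decision as a binary classification, rather than counting search calls, separates necessary from unnecessary searches, which is the distinction the abstract argues call counts obscure.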


Paper Structure

This paper contains 53 sections, 7 equations, 6 figures, 43 tables, 2 algorithms.

Figures (6)

  • Figure 1: Comparison of RL methods for search agents. Left: AdaSearch makes transparent, interpretable decisions via explicit reasoning. In contrast, Search-R1 overuses search even when parametric knowledge suffices, while RL with search penalties underuses it (leading to hallucinations); in both baselines the decision rationale remains implicit. Right: AdaSearch achieves the best overall self-knowledge awareness while preserving task performance. Search-R1 achieves zero self-knowledge awareness because it always searches, and reward-shaping methods fail to maintain QA performance.
  • Figure 2: Overview of our proposed AdaSearch framework. In stage 1, the agent explicitly reasons to decide whether the query can be solved using parametric knowledge. In stage 2, it follows the parametric-knowledge prompt if the knowledge is sufficient; otherwise, it switches to the search prompt to interleave reasoning and search for the final answer.
  • Figure 3: Analysis of self-knowledge awareness. (a) Confusion matrices for different RL methods. (b) Averaged $\mathbf{EM}$, $\mathbf{F1}_{\text{aware}}$, precision (Prec.), and recall (Rec.) across benchmarks.
  • Figure 4: Averaged performance and training rewards across stages. (a) Stage 1 substantially improves problem solving, and Stage 2 does not degrade it. (b) Both test $\mathbf{EM}$ and self-knowledge awareness ($\mathbf{F1}_{\text{aware}}$) improve throughout training. (c), (d) Stages 1 and 2 incentivize problem solving and self-knowledge awareness, respectively, on the training set.
  • Figure 5: Ablations of AdaSearch.
  • ...and 1 more figure
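The two-stage inference flow described in the Figure 2 caption can be sketched as a simple router: a first call produces an explicit decision with its rationale, and a second call answers with either the parametric-knowledge prompt or the search prompt. This is a hypothetical sketch. `Decision`, `route_question`, and the three callables are illustrative stand-ins for LLM calls, not the paper's code or prompts, and the RL training that shapes these decisions is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Decision:
    use_parametric: bool  # True if stage 1 judges parametric knowledge sufficient
    rationale: str        # explicit reasoning, kept so the decision can be inspected


def route_question(
    question: str,
    decide: Callable[[str], Decision],
    answer_parametric: Callable[[str], str],
    answer_with_search: Callable[[str], str],
) -> Tuple[str, Decision]:
    """Hypothetical decide-then-answer router mirroring the Figure 2 description."""
    decision = decide(question)                  # stage 1: self-knowledge judgment
    if decision.use_parametric:
        answer = answer_parametric(question)     # stage 2: parametric-knowledge prompt
    else:
        answer = answer_with_search(question)    # stage 2: interleave reasoning and search
    return answer, decision


# Toy stubs in place of model calls.
def toy_decide(q: str) -> Decision:
    known = "capital of France" in q
    return Decision(use_parametric=known, rationale="recalled" if known else "unsure")


ans1, d1 = route_question("What is the capital of France?", toy_decide,
                          lambda q: "Paris", lambda q: "[search-based answer]")
ans2, d2 = route_question("What is the population of a small village?", toy_decide,
                          lambda q: "Paris", lambda q: "[search-based answer]")
print(ans1, d1.rationale)  # Paris recalled
print(ans2, d2.rationale)  # [search-based answer] unsure
```

Returning the `Decision` alongside the answer keeps the rationale available for auditing, in line with the interpretability the paper emphasizes for high-stakes domains.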