Pay-Per-Search Models are Abstention Models

Mustafa Omer Gul, Claire Cardie, Tanya Goyal

TL;DR

The paper addresses how to make LLMs recognize and act on the boundaries of their own knowledge. It introduces MASH, a reinforcement-learning framework that trains LLMs to call a search tool selectively, using a pay-per-search penalty that trades off answer accuracy against tool use. Evaluated with search access, MASH improves selective help-seeking over prior efficient search approaches and raises answer accuracy by 7.6% on multi-hop QA datasets. Evaluated without search access, the same model abstains on questions outside its parametric knowledge, with behavior comparable to specialized abstention approaches. It also generalizes reasonably to out-of-distribution data, and training requires no predefined knowledge boundaries. Overall, MASH is an end-to-end method for aligning search tool use with parametric knowledge and for deriving abstention decisions from that alignment.

Abstract

LLMs cannot reliably recognize their parametric knowledge boundaries and often hallucinate answers to outside-of-boundary questions. In contrast, humans recognize their limitations and can either seek external help for such questions or abstain. In this paper, we introduce MASH (Modeling Abstention via Selective Help-seeking), a training framework that readily extracts abstentions from LLMs. Our key idea is that any external help-seeking by an LLM, i.e. search tool use, can serve as a proxy for abstention if the external help (search) is appropriately penalized while simultaneously rewarding answer accuracy. MASH operationalizes this idea using reinforcement learning with a pay-per-search reward. We run experiments on three knowledge-intensive QA datasets. Our results show that MASH substantially improves upon the selective help-seeking performance of prior efficient search approaches; on multi-hop datasets, MASH improves answer accuracy by 7.6%. Furthermore, MASH demonstrates strong off-the-shelf abstention -- it can distinguish between unanswerable/answerable questions and selectively generate responses for answerable questions -- showcasing behavior analogous to specialized abstention approaches. We emphasize that contrary to prior abstention methods, MASH does not require pre-determining knowledge boundaries to construct training data. Instead, MASH's abstentions are a by-product of training for the auxiliary selective help-seeking task. Overall, we show that MASH training effectively aligns search tool use with parametric knowledge, which can be successfully leveraged for making abstention decisions.
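
The pay-per-search idea described above can be illustrated with a small reward function. This is a minimal sketch under stated assumptions, not the paper's actual reward: the function name `pay_per_search_reward`, the linear per-search penalty, and the default `search_cost` of 0.1 are all hypothetical choices made for illustration.

```python
def pay_per_search_reward(is_correct: bool, num_searches: int,
                          search_cost: float = 0.1) -> float:
    """Hypothetical pay-per-search reward (illustrative only).

    Rewards a correct final answer and charges a fixed cost for every
    search call, so answering correctly from parametric knowledge
    scores higher than answering correctly after searching.
    """
    if num_searches < 0:
        raise ValueError("num_searches must be non-negative")
    # Assumed accuracy term: 1 for a correct answer, 0 otherwise.
    accuracy_reward = 1.0 if is_correct else 0.0
    # Assumed penalty term: linear in the number of searches issued.
    return accuracy_reward - search_cost * num_searches


# Correct without search beats correct with search, which beats
# an incorrect answer given without search.
no_search = pay_per_search_reward(True, 0)
with_search = pay_per_search_reward(True, 2)
wrong = pay_per_search_reward(False, 0)
print(no_search > with_search > wrong)
```

Under a reward of this shape, a policy is only pushed to search when the expected accuracy gain from searching outweighs the cost, which is the property that lets search requests double as a signal of missing parametric knowledge.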

Paper Structure

This paper contains 45 sections, 2 equations, 9 figures, 9 tables, 2 algorithms.

Figures (9)

  • Figure 1: Overview of MASH's strategy for eliciting abstentions. Help-seeking LLMs are RL-trained to maximize answer accuracy while minimizing the number of searches. At inference, the same model is used for abstention by removing search access and treating any search request as an abstention.
  • Figure 2:
  • Figure 3: The input prompt used during R1 training experiments. The final <question> is replaced by the input question.
  • Figure 4: The input prompt used during search tool use experiments. The final <question> is replaced by the input question.
  • Figure 5: The input prompt used when generating tool-use trajectories during warm-start data generation. The final <question> is replaced by the input question.
  • ...and 4 more figures
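
The inference-time procedure in the Figure 1 caption (remove search access and read any search request as an abstention) can be sketched as follows. This is an illustrative sketch only: the `<search>` marker and the function name `answer_or_abstain` are assumptions, since the actual prompt and output format are not shown here.

```python
from typing import Optional

# Hypothetical marker for a search request in the model's output.
SEARCH_TAG = "<search>"


def answer_or_abstain(model_output: str) -> Optional[str]:
    """Map a raw model output to an answer, or None for abstention.

    With search access removed, a search request cannot be served,
    so it is interpreted as the model declining to answer.
    """
    if SEARCH_TAG in model_output:
        return None  # search request is treated as abstention
    return model_output.strip()


print(answer_or_abstain("<search>capital of France</search>"))
print(answer_or_abstain(" Paris "))
```

No separate abstention classifier is involved in this sketch: the decision comes entirely from whether the help-seeking model chooses to search.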