Pay-Per-Search Models are Abstention Models

Mustafa Omer Gul; Claire Cardie; Tanya Goyal

TL;DR

The paper tackles the challenge of enabling AI systems to recognize and act on the limits of their own knowledge. It introduces MASH, a reinforcement-learning framework that trains LLMs to selectively seek external retrievals, using a pay-per-search penalty that balances answer accuracy against tool use. Evaluating models both with and without search access, the study shows that MASH yields strong abstention behavior at the model's knowledge boundary while expanding the set of questions it can answer through retrieval. On multi-hop QA datasets, MASH improves answer accuracy by 7.6% over prior efficient search approaches; it also exhibits off-the-shelf abstention and generalizes reasonably to out-of-distribution data, all without requiring predefined knowledge boundaries to construct training data. Overall, MASH offers a practical, end-to-end method for aligning a model's use of external help with its parametric knowledge and its abstention decisions.
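The pay-per-search idea can be illustrated with a small reward function. The sketch below is an illustration under stated assumptions, not the paper's formula: it gives a reward of 1 for an exact-match correct answer and subtracts a fixed `search_cost` for every search call in the rollout. The names `Episode`, `normalize`, and `pay_per_search_reward`, the exact-match accuracy, and the default cost of 0.1 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One rollout: the model's final answer and how many searches it made."""
    answer: str
    gold_answers: list[str]
    num_searches: int = 0


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace: a simplified stand-in for
    # standard QA answer normalization.
    return " ".join(text.lower().split())


def pay_per_search_reward(ep: Episode, search_cost: float = 0.1) -> float:
    """Hypothetical pay-per-search reward (assumed form, not the paper's).

    Exact-match accuracy in {0, 1}, minus a fixed cost for each search call.
    """
    golds = {normalize(g) for g in ep.gold_answers}
    accuracy = 1.0 if normalize(ep.answer) in golds else 0.0
    return accuracy - search_cost * ep.num_searches


# A correct answer from parametric knowledge beats a correct answer after searching.
print(pay_per_search_reward(Episode("Paris", ["Paris"], num_searches=0)))  # 1.0
print(pay_per_search_reward(Episode("paris ", ["Paris"], num_searches=2)))  # 0.8
```

Under this assumed form, a search is only worth issuing when it raises expected accuracy by more than `search_cost`, so a policy trained on it has an incentive to search on questions it cannot answer from parametric knowledge and to answer directly otherwise.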

Abstract

LLMs cannot reliably recognize their parametric knowledge boundaries and often hallucinate answers to outside-of-boundary questions. In contrast, humans recognize their limitations and can either seek external help for such questions or abstain. In this paper, we introduce MASH (Modeling Abstention via Selective Help-seeking), a training framework that readily extracts abstentions from LLMs. Our key idea is that any external help-seeking by an LLM, i.e., search tool use, can serve as a proxy for abstention if the external help (search) is appropriately penalized while simultaneously rewarding answer accuracy. MASH operationalizes this idea using reinforcement learning with a pay-per-search reward. We run experiments on three knowledge-intensive QA datasets. Our results show that MASH substantially improves upon the selective help-seeking performance of prior efficient search approaches; on multi-hop datasets, MASH improves answer accuracy by 7.6%. Furthermore, MASH demonstrates strong off-the-shelf abstention: it can distinguish between unanswerable and answerable questions and selectively generate responses for answerable ones, behavior analogous to that of specialized abstention approaches. We emphasize that, unlike prior abstention methods, MASH does not require pre-determining knowledge boundaries to construct training data. Instead, MASH's abstentions are a by-product of training for the auxiliary selective help-seeking task. Overall, we show that MASH training effectively aligns search tool use with parametric knowledge, which can be successfully leveraged for making abstention decisions.
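One way to read a trained model's help-seeking as abstention is sketched below. It assumes a hypothetical output format in which the model emits either `<search>...</search>` to request retrieval or `<answer>...</answer>` to respond. When the model is run without search access, a search request issued before any answer is treated as an abstention. The tag names and the `decide` helper are illustrative assumptions, not the paper's interface.

```python
import re
from typing import Optional

# Hypothetical tag format; the actual prompt/response format may differ.
SEARCH_TAG = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_TAG = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def decide(generation: str) -> Optional[str]:
    """Return the model's answer, or None to signal abstention.

    Assumes search is unavailable at inference time, so any request for
    external help that precedes an answer is read as "I don't know".
    """
    search = SEARCH_TAG.search(generation)
    answer = ANSWER_TAG.search(generation)
    # The model asked for help before (or instead of) answering: abstain.
    if search is not None and (answer is None or search.start() < answer.start()):
        return None
    if answer is not None:
        return answer.group(1).strip()
    # Neither tag found: treat malformed output as an abstention as well.
    return None
```

Abstention rate and accuracy on the non-abstained questions can then be computed by counting `None` results separately from returned answers.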

