
RVR: Retrieve-Verify-Retrieve for Comprehensive Question Answering

Deniz Qian, Hung-Ting Chen, Eunsol Choi

TL;DR

This work presents a promising iterative approach to comprehensive answer recall that leverages a verifier and adapts retrievers to a new inference scenario, achieving gains of at least 10% relative and 3% absolute in complete recall percentage on a multi-answer retrieval dataset (QAMPARI).
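The TL;DR reports both a relative and an absolute gain, and the two are not interchangeable. The absolute gain is the difference in percentage points. The relative gain divides that difference by the baseline score, so the same absolute gain counts for more relative to a lower baseline. The small Python check below uses made-up scores, not numbers from the paper, and the function names are illustrative only:

```python
def absolute_gain(baseline: float, new: float) -> float:
    """Absolute gain in percentage points (both inputs are percentages)."""
    return new - baseline


def relative_gain(baseline: float, new: float) -> float:
    """Relative gain, expressed as a percentage of the baseline score."""
    return (new - baseline) / baseline * 100.0


# Hypothetical complete-recall scores in percent, NOT results from the paper.
baseline, improved = 30.0, 33.0
print(f"absolute: {absolute_gain(baseline, improved):.1f} points")  # absolute: 3.0 points
print(f"relative: {relative_gain(baseline, improved):.1f}%")        # relative: 10.0%
```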

Abstract

Comprehensively retrieving diverse documents is crucial for addressing queries that admit a wide range of valid answers. We introduce retrieve-verify-retrieve (RVR), a multi-round retrieval framework designed to maximize answer coverage. In the first round, a retriever takes the original query and returns a candidate document set, and a verifier then identifies a high-quality subset. In subsequent rounds, the query is augmented with previously verified documents to uncover answers not yet covered by earlier rounds. RVR is effective even with off-the-shelf retrievers, and fine-tuning retrievers for this inference procedure brings further gains. Our method outperforms baselines, including agentic search approaches, achieving gains of at least 10% relative and 3% absolute in complete recall percentage on a multi-answer retrieval dataset (QAMPARI). We also see consistent gains on two out-of-domain datasets (QUEST and WebQuestionsSP) across different base retrievers. Our work presents a promising iterative approach to comprehensive answer recall that leverages a verifier and adapts retrievers to a new inference scenario.
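The retrieve, verify, augment, retrieve loop described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `rvr`, `augment_query`, the `[SEP]` separator, the toy word-overlap retriever, and the oracle-style verifier are all hypothetical. The verifier is a simplification of the oracle verifier in the Figure 2 caption: it accepts any document that mentions a gold answer, rather than selecting documents with unique answer strings. The parameters `verifier_budget` and `context_budget` echo the verifier budget and context budget $M$ from the figure captions, but the way they are applied here is also an assumption: only the top `verifier_budget` candidates of each round are verified, and only the first `context_budget` verified documents are appended to the query. The final merge rule (verified documents first, then the rest) is assumed as well.

```python
from typing import Callable, Dict, List, Set

# Hypothetical interfaces (assumptions, not the paper's code):
#   retrieve(query_text, depth, exclude) -> up to `depth` doc ids, best first
#   verify(original_query, doc_text)     -> True if the doc supports a valid answer
Retriever = Callable[[str, int, Set[str]], List[str]]
Verifier = Callable[[str, str], bool]


def augment_query(query: str, verified_texts: List[str], context_budget: int) -> str:
    """Append at most `context_budget` verified documents to the original query."""
    context = " ".join(verified_texts[:context_budget])
    return f"{query} [SEP] {context}" if context else query


def rvr(query: str, corpus: Dict[str, str], retrieve: Retriever, verify: Verifier,
        rounds: int = 2, per_round: int = 100, k: int = 100,
        verifier_budget: int = 100, context_budget: int = 6) -> List[str]:
    """Retrieve, verify, then re-retrieve with the verified docs in the query."""
    verified: List[str] = []    # accepted doc ids, in the order they were found
    unverified: List[str] = []  # retrieved but rejected or never checked
    seen: Set[str] = set()      # every doc id returned in any round
    for _ in range(rounds):
        q = augment_query(query, [corpus[d] for d in verified], context_budget)
        candidates = retrieve(q, per_round, seen)  # new documents only
        seen.update(candidates)
        for rank, doc_id in enumerate(candidates):
            # Assumption: only the top `verifier_budget` candidates are verified.
            if rank < verifier_budget and verify(query, corpus[doc_id]):
                verified.append(doc_id)
            else:
                unverified.append(doc_id)
    # Assumed merge rule: verified docs first, then the rest, truncated to k.
    return (verified + unverified)[:k]


def make_overlap_retriever(corpus: Dict[str, str]) -> Retriever:
    """Toy lexical retriever: rank unseen docs by word overlap with the query."""
    def retrieve(query_text: str, depth: int, exclude: Set[str]) -> List[str]:
        q_words = set(query_text.lower().split())
        scored = [(len(q_words & set(text.lower().split())), doc_id)
                  for doc_id, text in corpus.items() if doc_id not in exclude]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # score desc, id asc
        return [doc_id for _, doc_id in scored[:depth]]
    return retrieve


def make_oracle_verifier(answers: List[str]) -> Verifier:
    """Toy oracle-style verifier: accept a doc that mentions any gold answer."""
    def verify(_query: str, doc_text: str) -> bool:
        return any(a.lower() in doc_text.lower() for a in answers)
    return verify


corpus = {
    "d1": "Paris is the capital of France",
    "d2": "Berlin is the capital of Germany",
    "d3": "bananas are yellow",
    "d4": "Rome is a capital city in Italy",
}
query = "which cities are a capital"
retriever = make_overlap_retriever(corpus)
verifier = make_oracle_verifier(["Paris", "Berlin", "Rome"])

print(rvr(query, corpus, retriever, verifier, rounds=1, per_round=2, k=4))
# ['d4', 'd1']
print(rvr(query, corpus, retriever, verifier, rounds=2, per_round=2, k=4))
# ['d4', 'd1', 'd2', 'd3']
```

Excluding documents that were already returned guarantees that each new round contributes fresh candidates. The augmented query decides which of those candidates rank highest. In this toy version the augmented query is plain concatenated text scored by word overlap, whereas the abstract describes retrievers that are used off the shelf or fine-tuned for this augmented input.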

Paper Structure

This paper contains 62 sections, 1 equation, 12 figures, 12 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Overview of our Retrieve-Verify-Retrieve framework. For each query $q$, the goal is to retrieve documents that cover multiple answers $(y_1, y_2, y_3)$. The initial retriever takes the query and returns a document set, and the verifier examines each document, identifying two valid answers $y_1, y_2$. The subsequent retriever takes as input the query together with the documents containing the identified answers, aiming to retrieve the complementary answer $y_3$.
  • Figure 2: Multi-turn Generalization Results. This figure illustrates the change in Recall@100 and MRecall@100 across five iterations with a verifier budget of 100. The left panels show results with an LLM verifier (Qwen3-30B), while the right panels show results with an oracle verifier that selects documents containing unique answer strings to use as input. Performance with the LLM verifier plateaus after the second iteration, whereas the oracle verifier shows continued improvement, indicating substantial headroom for better verification mechanisms.
  • Figure 3: Varying the verifier budget. We evaluate MRecall@100 across three new verifier budgets on the QAMPARI dataset. RVR is shown in green and blue, and our one-round baseline in red. See Appendix \ref{appendix:varying-verifier-budget-precision-recall} for Recall@100 results.
  • Figure 4: Performance (MRecall@100) with a varying number of input documents at inference time (context budget $M$). We compare models fine-tuned with different maximum document counts (3, 6, and 12 documents) for INF and Qwen3. Colors denote the number of documents used in fine-tuning, and shapes indicate the retriever used. Results for the other metric (Recall@100) are provided in Appendix \ref{appendix:varying-input}.
  • Figure 5: This figure shows Recall@100 results when the verifier budget is lowered below 100.
  • ...and 7 more figures
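The figure captions report Recall@100 and MRecall@100, which this section does not define. The Python sketch below assumes the definitions commonly used for multi-answer retrieval. Recall@k is the fraction of a query's gold answers that appear in at least one of the top-k documents. MRecall@k gives full credit only when the top-k documents cover all gold answers, or at least k of them when a query has more than k answers. Three further assumptions apply: answers are matched by case-insensitive substring, MRecall@100 corresponds to the abstract's "complete recall percentage", and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Set


def answers_found(ranked_docs: List[str], answers: List[str], k: int) -> Set[str]:
    """Gold answers that occur (case-insensitive substring) in some top-k document."""
    top = [doc.lower() for doc in ranked_docs[:k]]
    return {a for a in answers if any(a.lower() in doc for doc in top)}


def recall_at_k(ranked_docs: List[str], answers: List[str], k: int) -> float:
    """Fraction of distinct gold answers covered by the top-k documents."""
    gold = set(answers)
    return len(answers_found(ranked_docs, list(gold), k)) / len(gold)


def mrecall_at_k(ranked_docs: List[str], answers: List[str], k: int) -> float:
    """1.0 if the top-k docs cover all answers (or at least k of them), else 0.0."""
    gold = set(answers)
    found = answers_found(ranked_docs, list(gold), k)
    return 1.0 if len(found) >= min(len(gold), k) else 0.0


# Toy example with made-up documents, not data from the paper.
docs = ["Paris is the capital of France", "Rome is the capital of Italy"]
gold_answers = ["Paris", "Rome", "Berlin"]
print(round(recall_at_k(docs, gold_answers, k=100), 3))  # 0.667
print(mrecall_at_k(docs, gold_answers, k=100))           # 0.0 (Berlin is missing)
print(mrecall_at_k(docs, gold_answers, k=1))             # 1.0 (k=1 needs one answer)

# A dataset-level score averages the per-query values, e.g. over two toy queries:
print(mean([mrecall_at_k(docs, gold_answers, 100), mrecall_at_k(docs, ["Rome"], 100)]))  # 0.5
```

Because MRecall@k is all-or-nothing per query, it can stay flat while Recall@k rises. This is why the captions report both metrics side by side.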