Analysis of Plan-based Retrieval for Grounded Text Generation

Ameya Godbole, Nicholas Monath, Seungyeon Kim, Ankit Singh Rawat, Andrew McCallum, Manzil Zaheer

TL;DR

The paper addresses hallucinations in text generation by introducing plan-based retrieval, where an LLM first creates a paragraph-level plan, generates targeted search queries, retrieves supporting documents, and then writes the final text conditioned on the plan, queries, and retrieved evidence. It compares direct generation, single-round retrieval, and plan-based retrieval across head and tail entities and current events, demonstrating that planning-guided retrieval improves source attribution (AIS) and grounding, often with longer, more informative outputs. Two plan variants (Var.A and Var.B) and a second retrieval pass are analyzed, and the approach generalizes to open-weight models like Mistral-7B-Instruct with model-specific tuning. The work highlights how structured planning can enhance retrieval effectiveness for grounded, long-form generation and informs future designs for attribution-aware LLM systems.
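The four stages named above (plan, per-segment queries, retrieval, conditioned generation) can be illustrated with a minimal Python sketch. Several pieces are assumptions and do not come from the paper: the prompt wording, the `LLM` and `Retriever` callable types, `plan_based_generation`, and the toy stubs `toy_llm` and `toy_retrieve`. The sketch is not the authors' implementation. It also leaves out the plan variants (Var.A, Var.B) and the second retrieval pass.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces: any text-in/text-out LLM and any (query, k) -> documents retriever.
LLM = Callable[[str], str]
Retriever = Callable[[str, int], List[str]]


@dataclass
class PlanBasedOutput:
    plan: List[str]           # one entry per planned paragraph/segment
    queries: List[List[str]]  # search queries generated for each segment
    documents: List[str]      # de-duplicated retrieved evidence
    text: str                 # final response


def _lines(text: str) -> List[str]:
    """Split model output into non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def plan_based_generation(prompt: str, llm: LLM, retrieve: Retriever,
                          queries_per_segment: int = 2,
                          docs_per_query: int = 2) -> PlanBasedOutput:
    # Step 1: paragraph-level plan (prompt wording is an illustrative assumption).
    plan = _lines(llm(f"Write a paragraph-level plan, one line per paragraph, for: {prompt}"))

    # Step 2: targeted search queries for each plan segment.
    queries: List[List[str]] = []
    for segment in plan:
        raw = llm(f"Write {queries_per_segment} search queries, one per line, "
                  f"for the paragraph: {segment}")
        queries.append(_lines(raw)[:queries_per_segment])

    # Step 3: fine-grained retrieval per query, keeping the first occurrence of each document.
    documents: List[str] = []
    for segment_queries in queries:
        for query in segment_queries:
            for doc in retrieve(query, docs_per_query):
                if doc not in documents:
                    documents.append(doc)

    # Step 4: final generation conditioned on the plan, the queries, and the retrieved documents.
    final_prompt = (
        "Plan:\n" + "\n".join(plan)
        + "\n\nQueries:\n" + "\n".join(q for qs in queries for q in qs)
        + "\n\nDocuments:\n" + "\n".join(f"[{i}] {d}" for i, d in enumerate(documents, 1))
        + f"\n\nUsing only the documents above, respond to: {prompt}"
    )
    return PlanBasedOutput(plan, queries, documents, llm(final_prompt))


# Toy stand-ins so the sketch runs end to end (hypothetical, not real models).
def toy_llm(prompt: str) -> str:
    if prompt.startswith("Write a paragraph-level plan"):
        return "Early life\nCareer"
    if prompt.startswith("Write") and "search queries" in prompt:
        segment = prompt.rsplit(": ", 1)[-1]
        return f"{segment} facts\n{segment} dates"
    return "FINAL"


def toy_retrieve(query: str, k: int) -> List[str]:
    # Both queries for a segment map to the same document, exercising de-duplication.
    return [f"doc:{query.split()[0]}"][:k]


if __name__ == "__main__":
    out = plan_based_generation("Write a biography of X", toy_llm, toy_retrieve)
    print(out.documents)  # ['doc:Early', 'doc:Career']
```

Retrieval runs once per query, not once per prompt. Each plan segment can therefore pull in its own evidence, which is the intuition behind the improved coverage of relevant facts.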

Abstract

In text generation, hallucinations refer to the generation of seemingly coherent text that contradicts established knowledge. One compelling hypothesis is that hallucinations occur when a language model is given a generation task outside its parametric knowledge (due to rarity, recency, domain, etc.). A common strategy to address this limitation is to infuse the language models with retrieval mechanisms, providing the model with relevant knowledge for the task. In this paper, we leverage the planning capabilities of instruction-tuned LLMs and analyze how planning can be used to guide retrieval to further reduce the frequency of hallucinations. We empirically evaluate several variations of our proposed approach on long-form text generation tasks. By improving the coverage of relevant facts, plan-guided retrieval and generation can produce more informative responses while providing a higher rate of attribution to source documents.
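This section does not say how the rate of attribution (AIS, Attributable to Identified Sources) is computed. The sketch below assumes a sentence-level score: the fraction of output sentences that a judge deems supported by the retrieved documents. In practice, AIS judgments come from human raters or an entailment (NLI) model. The `overlap_judge` stand-in, `split_sentences`, and `attribution_rate` are hypothetical helpers, for illustration only.

```python
from typing import Callable, List, Sequence

# Hypothetical judge: returns True if the evidence supports the sentence.
Judge = Callable[[str, Sequence[str]], bool]


def split_sentences(text: str) -> List[str]:
    # Naive splitter on ". "; a real system would use a proper sentence segmenter.
    parts = [p.strip() for p in text.replace("\n", " ").split(". ")]
    return [p.rstrip(".") for p in parts if p]


def attribution_rate(text: str, documents: Sequence[str], judge: Judge) -> float:
    """Fraction of sentences in `text` that `judge` attributes to `documents`."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    supported = sum(1 for s in sentences if judge(s, documents))
    return supported / len(sentences)


def overlap_judge(sentence: str, documents: Sequence[str]) -> bool:
    # Toy stand-in for an entailment model: every word must appear in a single document.
    words = set(sentence.lower().split())
    return any(words <= set(doc.lower().split()) for doc in documents)


docs = ["marie curie won the nobel prize in physics in 1903"]
text = "Marie Curie won the Nobel Prize in physics. She was born on the moon."
print(attribution_rate(text, docs, overlap_judge))  # 0.5
```

In this toy example, the second sentence is a hallucination with no support in the evidence, so half of the output is attributable.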

Paper Structure (21 sections, 2 figures, 12 tables)

Figures (2)

  • Figure 1: Summary of Planning and Retrieval used to generate text. Given an initial prompt, a plan is first generated that outlines the segments to be written. Next, search queries are generated for each segment, and these queries are then used for fine-grained retrieval of source documents. The final response is generated conditioned on the plan, the queries, and the retrieved documents.
  • Figure 2: Example Generation. One of the hallucinations in the One-Retrieval model is the focus of one of the questions provided in the question-based plan.