Learning to Plan and Generate Text with Citations

Constanza Fierro, Reinald Kim Amplayo, Fantine Huot, Nicola De Cao, Joshua Maynez, Shashi Narayan, Mirella Lapata

TL;DR

This work tackles the challenge of producing verifiable, citation-grounded responses from large language models in information-seeking settings. It introduces plan-based generation, where a blueprint—a sequence of questions—guides the content and its citations, with two variants: abstractive (questions generated) and extractive (questions copied from input). The authors demonstrate that incorporating blueprint plans improves attribution and faithfulness on the AQuAMuSe long-form QA task, with the extractive blueprint achieving the strongest overall performance and the abstractive variant showing competitive attribution on ALCE. The approach offers transferable, controllable grounding across domains and establishes a path toward more trustworthy retrieval-augmented generation in real-world search and QA systems.

Abstract

The increasing demand for the deployment of LLMs in information-seeking scenarios has spurred efforts in creating verifiable systems, which generate responses to queries along with supporting evidence. In this paper, we explore the attribution capabilities of plan-based models which have been recently shown to improve the faithfulness, grounding, and controllability of generated text. We conceptualize plans as a sequence of questions which serve as blueprints of the generated content and its organization. We propose two attribution models that utilize different variants of blueprints, an abstractive model where questions are generated from scratch, and an extractive model where questions are copied from the input. Experiments on long-form question-answering show that planning consistently improves attribution quality. Moreover, the citations generated by blueprint models are more accurate compared to those obtained from LLM-based pipelines lacking a planning component.
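To make the two blueprint variants concrete, here is a minimal sketch of the distinction drawn above: an extractive blueprint only contains questions that already appear in the input, while an abstractive blueprint may contain newly generated ones. The `Passage` class, the assumption that candidate questions come attached to each input passage, and the helper names `is_extractive` and `question_sources` are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Passage:
    """A retrieved passage. Assumption: candidate questions are attached to it in the input."""
    text: str
    questions: List[str]


def is_extractive(blueprint: List[str], passages: List[Passage]) -> bool:
    """Return True if every blueprint question is copied verbatim from the input."""
    available = {q for p in passages for q in p.questions}
    return all(q in available for q in blueprint)


def question_sources(blueprint: List[str], passages: List[Passage]) -> List[List[int]]:
    """For each blueprint question, list the 1-based indices of passages that contain it.

    A question generated from scratch (abstractive) maps to an empty list.
    """
    return [
        [i for i, p in enumerate(passages, start=1) if q in p.questions]
        for q in blueprint
    ]


# Toy input, invented for illustration only.
passages = [
    Passage("Passage one text.", ["What is A?", "Who made A?"]),
    Passage("Passage two text.", ["When was A made?"]),
]
extractive_plan = ["What is A?", "When was A made?"]
abstractive_plan = ["What is A?", "Why does A matter?"]
```

In this toy setup, a copied question can be traced back to the passage it came from, which is one way a plan could double as a citation; a generated question has no such direct link and would need its citation produced separately.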

Paper Structure (39 sections, 1 equation, 4 figures, 17 tables)

Figures (4)

  • Figure 1: Query (top), followed by most relevant (abridged) passages, and summaries (bottom) with in-line citations. Summary (a) is the output of a vanilla sequence-to-sequence model trained to generate long answers with citations. Summaries (b) and (c) are the output of models with abstractive and extractive plans, respectively. Citations for plan-based models can have different formats (e.g., references to the question plan; see Section \ref{sec:results:analysis}).
  • Figure 2: Unique $n$-grams in generated summary.
  • Figure 3: Blueprints (top), corresponding summaries (bottom), and different citation formats: $b_i$ is a blueprint question, $s_i$ is a summary sentence and $c, Q$ are citations. Abstractive/extractive blueprints are colored in blue/purple.
  • Figure 4: Experimental instructions presented to participants during the human elicitation study. The question repeats for each sentence in the machine-generated response.