
Improving Attributed Long-form Question Answering with Intent Awareness

Xinran Zhao, Aakanksha Naik, Jay DeYoung, Joseph Chee Chang, Jena D. Hwang, Tongshuang Wu, Varsha Kishore

Abstract

Large language models (LLMs) are increasingly being used to generate comprehensive, knowledge-intensive reports. However, while these models are trained on diverse academic papers and reports, they are not exposed to the reasoning processes and intents that guide authors in crafting these documents. We hypothesize that enhancing a model's intent awareness can significantly improve the quality of generated long-form reports. We develop and employ structured, tag-based schemes to better elicit underlying implicit intents to write or cite. We demonstrate that these extracted intents enhance both zero-shot generation capabilities in LLMs and enable the creation of high-quality synthetic data for fine-tuning smaller models. Our experiments reveal improved performance across various challenging scientific report generation tasks, with an average improvement of +2.9 and +12.3 absolute points for large and small models over baselines, respectively. Furthermore, our analysis illuminates how intent awareness enhances model citation usage and substantially improves report readability.

Paper Structure

This paper contains 36 sections, 7 figures, 10 tables.

Figures (7)

  • Figure 1: Current long-form question answering systems don't consider intents when generating responses. The figure above shows how having explicit citation intents and paragraph intents helps reason about the text and generate better responses.
  • Figure 2: (left) average proportion of retrieved candidates used in the generated reports; (right) average citation coverage between small model variants and gemini-2.5-pro. All average scores are computed at the query level. default and verb. intent denote the two instruction settings, where verb. intent augments the prompt with intent awareness. The analysis is done on SQA-CS-V2.
  • Figure 3: A screenshot of the instructions shown to users. Besides the tasks shown in the figure, users are also provided with a step-by-step guide to the annotation tasks and the key points to remember. The instructions can be revisited at any time during annotation by clicking a "Click to Expand Instruction" button.
  • Figure 4: A screenshot of the PIT questions for the baseline system. Users are shown the section titles and the first sentence of each paragraph to answer the questions.
  • Figure 5: A screenshot of the CIT question for the baseline system. Users are shown a specific paragraph with one citation highlighted to answer the question. The cited snippet appears when users hover their mouse over the citation.
  • ...and 2 more figures