The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations

Theodora Worledge; Tatsunori Hashimoto; Carlos Guestrin

TL;DR

This paper introduces the extractive-abstractive spectrum, in which search engines and LLMs are extreme endpoints encapsulating multiple unexplored intermediate operating points, and defines five operating points that span the spectrum.

Abstract

Across all fields of academic study, experts cite their sources when sharing information. While large language models (LLMs) excel at synthesizing information, they do not provide reliable citation to sources, making it difficult to trace and verify the origins of the information they present. In contrast, search engines make sources readily accessible to users and place the burden of synthesizing information on the user. Through a survey, we find that users prefer search engines over LLMs for high-stakes queries, where concerns regarding information provenance outweigh the perceived utility of LLM responses. To examine the interplay between verifiability and utility of information-sharing tools, we introduce the extractive-abstractive spectrum, in which search engines and LLMs are extreme endpoints encapsulating multiple unexplored intermediate operating points. Search engines are extractive because they respond to queries with snippets of sources with links (citations) to the original webpages. LLMs are abstractive because they address queries with answers that synthesize and logically transform relevant information from training and in-context sources without reliable citation. We define five operating points that span the extractive-abstractive spectrum and conduct human evaluations on seven systems across four diverse query distributions that reflect real-world QA settings: web search, language simplification, multi-step reasoning, and medical advice. As outputs become more abstractive, we find that perceived utility improves by as much as 200%, while the proportion of properly cited sentences decreases by as much as 50% and users take up to 3 times as long to verify cited information. Our findings recommend distinct operating points for domain-specific LLM systems and our failure analysis informs approaches to high-utility LLM systems that empower users to verify information.
