STRUM-LLM: Attributed and Structured Contrastive Summarization

Beliz Gunel; James B. Wendt; Jing Xie; Yichao Zhou; Nguyen Vo; Zachary Fisher; Sandeep Tata

TL;DR

STRUM-LLM supports decision-making in A vs B comparisons by producing attributed, structured contrastive summaries grounded in the input sources. It uses a modular LLM-based pipeline (LM-Extract, LM-Attribute-Merge, LM-Value-Merge, LM-Contrast, LM-Usefulness) with critique-and-revision stages and long-input tiling, so it can handle source text of arbitrary length without supervision. The approach attributes every extraction to its source, clusters attributes to avoid redundancy, and ranks high-contrast attributes; it is validated with row- and summary-level metrics that correlate with human judgments. Empirically, STRUM-LLM Distilled achieves about 100x higher throughput at a 10x smaller size than models with comparable performance, outperforming the STRUM In-Code baseline and approaching Few-shot STRUM-LLM, which supports its viability for deployment in real-world decision-support systems. The paper also outlines avenues for multimodal extensions.
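
As a rough illustration of the long-input tiling idea mentioned above, the following minimal sketch splits a source text into consecutive chunks that each fit a fixed budget. It is not the authors' implementation: the function name `tile_text`, the `max_words` parameter, and the use of whitespace-separated words as a stand-in for an LLM token budget are all assumptions made for this example.

```python
from typing import List


def tile_text(text: str, max_words: int) -> List[str]:
    """Split text into consecutive tiles of at most max_words words.

    Hypothetical helper: a word count approximates the context-window
    budget here; a real system would likely count model tokens instead.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    # Step through the word list in strides of max_words.
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each tile could then be passed to an extraction step independently, with the per-tile results merged afterwards, which is what lets the input length grow without exceeding the context window.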

Abstract

Users often struggle with decision-making between two options (A vs B), as it usually requires time-consuming research across multiple web pages. We propose STRUM-LLM, which addresses this challenge by generating attributed, structured, and helpful contrastive summaries that highlight key differences between the two options. STRUM-LLM identifies helpful contrast: the specific attributes along which the two options differ significantly and which are most likely to influence the user's decision. Our technique is domain-agnostic and does not require any human-labeled data or fixed attribute list as supervision. STRUM-LLM attributes all extractions back to the input sources along with textual evidence, and it places no limit on the length of the input sources it can process. STRUM-LLM Distilled has 100x higher throughput than models with comparable performance while being 10x smaller. In this paper, we provide extensive evaluations of our method and lay out future directions for our currently deployed system.

Paper Structure (15 sections, 5 figures, 4 tables)

Introduction
Related Work
Method
Desiderata for a Helpful Comparison
STRUM-LLM
Critique-and-Revision Models for STRUM-LLM
Evaluation Setup
STRUM In-Code Baseline
Row-Level Comparison Helpfulness Evaluations
Summary-Level Evaluations
Results
Conclusion and Future Work
Ethical Considerations
Row-Level Comparison Helpfulness Evaluations
STRUM-LLM Output Summaries

Figures (5)

Figure 1: STRUM-LLM aims to produce an attributed (grounded in the input sources), faceted (a row per attribute), and helpful (relevant and contrastive attributes) summary for an A vs B comparison.
Figure 2: STRUM-LLM retrieves web pages relevant to the entities being compared, divides them into chunks of text that fit into the LLM's context window (a step we refer to as tiling), extracts attributes and values from the text, clusters related attributes and merges their values, and identifies the most meaningful contrast between the two entities. Critique-and-revision (CR) models improve the quality of both LM-Extract and LM-Compare during data generation.
Figure 3: STRUM-LLM summary comparing Sonos Move and Apple HomePod.
Figure 4: STRUM-LLM summary comparing Sony LinkBuds and Bose SoundSport Free.
Figure 5: STRUM-LLM summary comparing 1zpresso JX Pro and Knock Aergrind.
