LFOSum: Summarizing Long-form Opinions with Large Language Models

Mir Tafseer Nayeem, Davood Rafiei

TL;DR

This paper introduces a new dataset of long-form user reviews, in which each entity has over a thousand reviews; two training-free LLM-based summarization approaches that scale to long inputs; and automatic evaluation metrics that provide a more granular, context-sensitive assessment of summary faithfulness.

Abstract

Online reviews play a pivotal role in consumer decisions across many domains, from purchasing products to selecting hotels or restaurants. However, the sheer volume of reviews, which often contain repetitive or irrelevant content, leads to information overload and makes it hard for users to extract meaningful insights. Traditional opinion summarization models struggle with long inputs and large volumes of reviews, while newer Large Language Model (LLM) approaches often fail to generate accurate and faithful summaries. To address these challenges, this paper introduces (1) a new dataset of long-form user reviews, in which each entity has over a thousand reviews, (2) two training-free LLM-based summarization approaches that scale to long inputs, and (3) automatic evaluation metrics. The user reviews in our dataset are paired with in-depth and unbiased critical summaries by domain experts, which serve as references for evaluation. Our novel reference-free evaluation metrics provide a more granular, context-sensitive assessment of summary faithfulness. We benchmark several open-source and closed-source LLMs using our methods. Our evaluation reveals that LLMs still struggle to balance sentiment and format adherence in long-form summaries, though open-source models can narrow the gap when relevant information is retrieved in a focused manner.
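With over a thousand reviews per entity, many of them repetitive, even a long-context model can only see part of the input at once. The following is a minimal sketch of one way to fit reviews into a fixed context budget. It assumes that whitespace-delimited words stand in for model tokens and that only exact duplicates are dropped. The helper `pack_reviews` and its `budget` parameter are hypothetical and are not the paper's procedure.

```python
from typing import List


def pack_reviews(reviews: List[str], budget: int) -> List[str]:
    """Greedily keep reviews, in input order, until a word budget is used up.

    Hypothetical helper: word counts approximate model tokens, and only
    exact duplicates (after case and whitespace normalization) are removed.
    """
    seen = set()
    packed: List[str] = []
    used = 0
    for review in reviews:
        key = " ".join(review.lower().split())  # normalize case and spacing
        if not key or key in seen:
            continue  # skip empty or verbatim-repeated reviews
        cost = len(key.split())
        if used + cost > budget:
            continue  # does not fit; a shorter later review still might
        seen.add(key)
        packed.append(review)
        used += cost
    return packed


reviews = [
    "Great pool.",
    "great  pool.",
    "Rooms were dated and noisy at night.",
    "Friendly staff.",
]
print(pack_reviews(reviews, budget=6))  # ['Great pool.', 'Friendly staff.']
```

Skipping an oversized review instead of stopping lets shorter reviews later in the list still fill the remaining budget. A real system would count tokens with the target model's tokenizer.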

Paper Structure

This paper contains 52 sections, 3 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Our LFOSum framework includes two methods: (1) Long-form Critic, which uses long-context LLMs to generate critic summaries with user controls for sentiment and length (see the Long-form Critic section), and (2) the RAG Framework, which combines retrieval augmentation with LLMs to handle long-form user reviews and produce summaries (see the RAG Framework section).
  • Figure 2: An example from our dataset: Hampton Inn Tropicana (https://www.oyster.com/las-vegas/hotels/hampton-inn-tropicana/)
  • Figure 3: Long-form Critic Summarization Prompt.
  • Figure 4: LLM as a Reranker Prompt.
  • Figure 5: LLM as an Abstractor Prompt.
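Figures 1, 4 and 5 outline the RAG framework: relevant reviews are retrieved, an LLM reranks them, and an LLM then writes ("abstracts") a summary from the retained reviews. The following is a minimal sketch of that flow under several assumptions. The `llm` argument is a generic prompt-in, text-out callable. Simple word-overlap scoring stands in for whatever retriever the paper actually uses. All function names, prompt wording and defaults (`k=50`, `top_n=10`, `max_words=80`) are hypothetical and do not reproduce the paper's prompts.

```python
import re
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical: any function mapping a prompt to text


def _words(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, reviews: List[str], k: int) -> List[str]:
    """Rank reviews by word overlap with the query (a stand-in retriever)."""
    q = _words(query)
    scored = [(len(q & _words(r)), i) for i, r in enumerate(reviews)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best overlap first; ties keep input order
    return [reviews[i] for score, i in scored[:k] if score > 0]


def llm_rerank(llm: LLM, query: str, candidates: List[str], top_n: int) -> List[str]:
    """LLM as reranker: ask for candidate indices, most relevant first."""
    listing = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    reply = llm(
        f"Aspect: {query}\nReviews:\n{listing}\n"
        f"Return the indices of the {top_n} most relevant reviews, comma-separated."
    )
    order: List[int] = []
    for tok in re.findall(r"\d+", reply):
        idx = int(tok)
        if idx < len(candidates) and idx not in order:
            order.append(idx)  # ignore out-of-range or repeated indices
    return [candidates[i] for i in order[:top_n]]


def llm_abstract(llm: LLM, query: str, evidence: List[str], max_words: int) -> str:
    """LLM as abstractor: summarize only the retained reviews."""
    joined = "\n".join(f"- {e}" for e in evidence)
    return llm(
        f"Using only these reviews, write at most {max_words} words about '{query}':\n{joined}"
    )


def rag_summarize(llm: LLM, query: str, reviews: List[str],
                  k: int = 50, top_n: int = 10, max_words: int = 80) -> str:
    candidates = retrieve(query, reviews, k)
    evidence = llm_rerank(llm, query, candidates, top_n)
    return llm_abstract(llm, query, evidence, max_words)
```

Because the model is passed in as a plain function, a stub `llm` is enough to exercise the control flow offline. To use a real model client, wrap it as a prompt-to-string function.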