Distilling Opinions at Scale: Incremental Opinion Summarization using XL-OPSUMM

Sri Raghava Muddu, Rupasai Rangaraju, Tejpalsingh Siledar, Swaroop Nath, Pushpak Bhattacharyya, Swaprava Nath, Suman Banerjee, Amey Patil, Muthusamy Chelliah, Sudhanshu Shekhar Singh, Nikesh Garera

TL;DR

XL-OpSumm tackles the scalability challenge of opinion summarization for thousands of reviews by introducing an incremental, chunk-based framework that maintains a Global Summary and an Aspect Dictionary and updates both as each chunk is processed. The approach splits reviews into non-overlapping chunks of up to $\tau$ tokens, updates aspect sentiments with aspect-based sentiment analysis (ABSA), and uses an LLM to generate a Local Summary per chunk that is then merged into the Global Summary, so the total input can grow beyond the model's context window. On a new large-scale Xl-Flipkart test set (~3,680 reviews/product across 25 products) and the existing AMASUM dataset, XL-OpSumm achieves average ROUGE-1 F1 gains of 4.38% and ROUGE-L F1 gains of 3.70% over the next best-performing model, along with positive results on reference-free metrics such as BooookScore, fluency, and coherence. The work highlights practical impact for live e-commerce platforms by enabling continuous, scalable opinion synthesis, and points to future directions such as incorporating additional data sources (Q&A, product descriptions) and addressing remaining evaluation limitations.
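
The chunking step described above can be illustrated with a minimal sketch. It assumes that a chunk is filled greedily with whole reviews until adding the next one would exceed the threshold $\tau$, and it uses whitespace splitting as a stand-in for the LLM tokenizer. The names `count_tokens` and `chunk_reviews` are hypothetical, and the exact packing rule used by the authors is not stated here.

```python
from typing import Iterable, List


def count_tokens(text: str) -> int:
    # Assumption: whitespace tokens approximate the model's tokenizer.
    return len(text.split())


def chunk_reviews(reviews: Iterable[str], tau: int) -> List[List[str]]:
    """Greedily pack whole reviews into non-overlapping chunks of at most tau tokens.

    A single review longer than tau is placed in a chunk of its own
    (this sketch does not split individual reviews).
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for review in reviews:
        n = count_tokens(review)
        # Close the current chunk if this review would push it past tau.
        if current and used + n > tau:
            chunks.append(current)
            current, used = [], 0
        current.append(review)
        used += n
    if current:
        chunks.append(current)
    return chunks
```

Because each review lands in exactly one chunk, the chunks never overlap, and the number of chunks grows with the review count rather than being capped by the context window.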

Abstract

Opinion summarization in e-commerce encapsulates the collective views of numerous users about a product based on their reviews. Typically, a product on an e-commerce platform has thousands of reviews, each review comprising around 10-15 words. While Large Language Models (LLMs) have shown proficiency in summarization tasks, they struggle to handle such a large volume of reviews due to context limitations. To mitigate this, we propose a scalable framework called Xl-OpSumm that generates summaries incrementally. However, the existing test set, AMASUM, has only 560 reviews per product on average. Due to the lack of a test set with thousands of reviews, we created a new test set called Xl-Flipkart by gathering data from the Flipkart website and generating summaries using GPT-4. Through various automatic evaluations and extensive analysis, we evaluated the framework's efficiency on two datasets, AMASUM and Xl-Flipkart. Experimental results show that our framework, Xl-OpSumm powered by Llama-3-8B-8k, achieves an average ROUGE-1 F1 gain of 4.38% and a ROUGE-L F1 gain of 3.70% over the next best-performing model.

Paper Structure (20 sections, 1 figure, 7 tables)

Figures (1)

  • Figure 1: Illustration of our Xl-OpSumm framework. First, reviews are divided into non-overlapping chunks based on a threshold. Then, for each chunk, the Aspect Dictionary is updated, the Local Summary is generated, and the Global Summary is updated, as shown in the figure. Refer to the framework section for more details.
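
The per-chunk loop in Figure 1 can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the ABSA model and the two LLM calls are passed in as plain callables, the Aspect Dictionary is assumed to hold sentiment counts per aspect, and all names (`incremental_summarize`, `update_aspects`, `toy_absa`, `toy_local`, `toy_merge`) are hypothetical. The toy stand-ins at the bottom exist only to make the sketch runnable.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

AspectDict = Dict[str, Counter]          # aspect -> sentiment counts (assumed layout)
AspectPairs = List[Tuple[str, str]]      # (aspect, sentiment) pairs from ABSA


def update_aspects(aspects: AspectDict, pairs: AspectPairs) -> None:
    # Accumulate sentiment counts for every aspect seen in this chunk.
    for aspect, sentiment in pairs:
        aspects.setdefault(aspect, Counter())[sentiment] += 1


def incremental_summarize(
    chunks: List[List[str]],
    absa: Callable[[List[str]], AspectPairs],
    summarize_local: Callable[[List[str], AspectDict], str],
    merge_global: Callable[[str, str, AspectDict], str],
) -> Tuple[str, AspectDict]:
    aspects: AspectDict = {}
    global_summary = ""
    for chunk in chunks:
        update_aspects(aspects, absa(chunk))                 # 1. update Aspect Dictionary
        local = summarize_local(chunk, aspects)              # 2. generate Local Summary
        global_summary = merge_global(global_summary, local, aspects)  # 3. update Global Summary
    return global_summary, aspects


# Toy stand-ins (hypothetical): a keyword matcher instead of an ABSA model,
# and string formatting instead of LLM calls.
def toy_absa(chunk: List[str]) -> AspectPairs:
    pairs: AspectPairs = []
    for review in chunk:
        words = review.lower().split()
        for aspect in ("battery", "camera"):
            if aspect in words:
                pairs.append((aspect, "positive" if "good" in words else "negative"))
    return pairs


def toy_local(chunk: List[str], aspects: AspectDict) -> str:
    return f"{len(chunk)} reviews"


def toy_merge(global_summary: str, local: str, aspects: AspectDict) -> str:
    return f"{global_summary} | {local}" if global_summary else local
```

Only one chunk, the running Global Summary, and the Aspect Dictionary need to be in play at each step, which is what lets the number of reviews exceed a single context window.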