Noisy Pairing and Partial Supervision for Stylized Opinion Summarization
Hayate Iso, Xiaolan Wang, Yoshi Suhara
TL;DR
This work defines stylized opinion summarization and introduces Napa, a non-parallel training framework that combines Noisy Pairing and Partial Supervision to generate professionally styled summaries from customer reviews. It constructs the ProSum benchmark by pairing Yelp customer reviews with Michelin professional reviews, and demonstrates that Napa substantially outperforms self-supervised and non-parallel baselines on ProSum and FewSum while closely approaching supervised upper bounds. The approach creates noisy cross-entity input-output pairs and uses token alignment to restrict learning to the aligned subsequences of each output, with self-supervised pre-training providing the foundational summarization capability. The results suggest that Napa enables practical stylized summarization in settings where parallel review-summary data are scarce, though hallucinations and alignment errors remain open problems for further work.
Abstract
Opinion summarization research has primarily focused on generating summaries that reflect important opinions from customer reviews, without paying much attention to the writing style. In this paper, we propose the stylized opinion summarization task, which aims to generate a summary of customer reviews in a desired (e.g., professional) writing style. To tackle the difficulty of collecting customer and professional review pairs, we develop a non-parallel training framework, Noisy Pairing and Partial Supervision (NAPA), which trains a stylized opinion summarization system from non-parallel customer and professional review sets. We create a benchmark, ProSum, by collecting customer and professional reviews from Yelp and Michelin. Experimental results on ProSum and FewSum demonstrate that our non-parallel training framework consistently improves results in both automatic and human evaluations, successfully building a stylized opinion summarization model that can generate professionally written summaries from customer reviews. The code is available at https://github.com/megagonlabs/napa
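
To make the two ideas in the framework's name concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes whitespace tokenization, Jaccard word overlap as the pairing similarity, and exact token match as the alignment; all three are stand-ins chosen for brevity, and the helper names `noisy_pair` and `supervision_mask` are hypothetical.

```python
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercased whitespace tokenization (a deliberately crude assumption)."""
    return text.lower().split()


def jaccard(a: List[str], b: List[str]) -> float:
    """Jaccard overlap between two token lists, treated as sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def noisy_pair(
    customer_reviews: Dict[str, List[str]],
    professional_reviews: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Noisy pairing (sketch): match each professional review with the
    customer-review set, from a different entity, that overlaps with it most.
    Returns (customer_entity, professional_entity) pairs."""
    pairs = []
    for pro_entity, pro_text in professional_reviews.items():
        pro_tokens = tokenize(pro_text)
        best = max(
            customer_reviews,
            key=lambda e: jaccard(tokenize(" ".join(customer_reviews[e])), pro_tokens),
        )
        pairs.append((best, pro_entity))
    return pairs


def supervision_mask(source_reviews: List[str], target: str) -> List[bool]:
    """Partial supervision (sketch): mark target tokens that also occur in the
    paired source reviews. Only marked positions would contribute to the
    training loss; unmarked positions would be ignored."""
    source_vocab = {tok for review in source_reviews for tok in tokenize(review)}
    return [tok in source_vocab for tok in tokenize(target)]
```

In a training loop, a mask like this would typically be applied to the per-token loss so that target tokens with no support in the noisily paired input are not learned, which is the intuition behind limiting supervision to aligned subsequences.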
