CARPAS: Towards Content-Aware Refinement of Provided Aspects for Summarization in Large Language Models
Yong-En Tian, Yu-Chien Tang, An-Zi Yen, Wen-Chih Peng
TL;DR
CARPAS addresses aspect-based summarization (ABS) when the provided aspects are incomplete or irrelevant, and it does so by introducing a content-aware refinement framework. It shows that prompting LLMs alone yields overly broad aspect sets. It then proposes predicting the number of relevant aspects as a guiding signal, implemented with a regression-based #Aspect-RM model trained on synthetic data. Across synthetic and real datasets (ECT, COVID-19-PC, RW-ECT) and multiple prompting strategies, CARPAS consistently improves summary quality (BERTScore and ROUGE-L) and reduces aspect-count errors, which shows the value of explicit aspect-count guidance. The work offers insights into deploying LLMs for ABS and lays the groundwork for extending aspect refinement to other structured data and to dialogue systems.
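Aspect-count guidance depends on two steps: turning a real-valued regression output into a usable integer count, and measuring how far predicted counts fall from the reference counts. The following is a minimal sketch of both. It assumes round-half-up and a hypothetical lower bound of one aspect. `to_aspect_count` and `aspect_count_mae` are illustrative names, not the paper's code, and mean absolute error is an assumed choice of count-error measure. No upper bound is applied, because the refined set may add aspects that the user did not provide.

```python
import math
from typing import Sequence


def to_aspect_count(raw_prediction: float, min_count: int = 1) -> int:
    """Map a real-valued regression output to an integer aspect count.

    Rounds half up (2.5 -> 3) and enforces a lower bound.
    The default min_count=1 is an assumption, not taken from the paper.
    """
    return max(min_count, math.floor(raw_prediction + 0.5))


def aspect_count_mae(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Mean absolute error between predicted and reference aspect counts."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    raw = [2.4, 3.6, 0.2]                         # hypothetical regressor outputs
    counts = [to_aspect_count(r) for r in raw]    # [2, 4, 1]
    gold = [2, 3, 2]                              # hypothetical reference counts
    print(counts, aspect_count_mae(counts, gold))
```

The rounded count can then be passed to the LLM as the requested number of aspects.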
Abstract
Aspect-based summarization has attracted significant attention for its ability to generate more fine-grained and user-aligned summaries. While most existing approaches assume a set of predefined aspects as input, in real-world scenarios these given aspects may be incomplete, irrelevant, or entirely missing from the document. Users frequently expect systems to adaptively refine or filter the provided aspects based on the actual content. In this paper, we introduce this novel task setting, termed Content-Aware Refinement of Provided Aspects for Summarization (CARPAS), which aims to dynamically adjust the provided aspects based on the document context before summarizing. We construct three new datasets for our pilot experiments. Applying LLMs with four representative prompting strategies to this task, we find that they tend to predict an overly comprehensive set of aspects, which often results in excessively long and misaligned summaries. Building on this observation, we propose a preliminary subtask that predicts the number of relevant aspects, and we demonstrate that the predicted number serves as effective guidance for the LLMs: it reduces inference difficulty and enables them to focus on the most pertinent aspects. Our extensive experiments show that the proposed approach significantly improves performance across all datasets. Moreover, our deeper analyses characterize how LLMs comply when the requested number of aspects differs from their own estimates, a crucial insight for deploying LLMs in similar real-world applications.
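To make the idea of using the predicted number as guidance concrete, here is a minimal sketch of a prompt builder that injects the count into an aspect-refinement request. The template wording and the function name `build_count_guided_prompt` are hypothetical. The paper's actual prompts and its four prompting strategies are not reproduced here.

```python
from typing import Sequence


def build_count_guided_prompt(
    document: str, provided_aspects: Sequence[str], num_aspects: int
) -> str:
    """Compose a hypothetical ABS prompt that requests exactly num_aspects aspects.

    The model may drop irrelevant provided aspects or add missing ones,
    reflecting the premise that provided aspects can be incomplete.
    """
    if num_aspects < 1:
        raise ValueError("num_aspects must be at least 1")
    # Bullet the provided aspects; fall back to a placeholder if none were given.
    aspect_list = "\n".join(f"- {a}" for a in provided_aspects) or "- (none provided)"
    return (
        "Document:\n"
        f"{document}\n\n"
        "User-provided aspects (may be incomplete or irrelevant):\n"
        f"{aspect_list}\n\n"
        f"Refine the aspects above so that exactly {num_aspects} aspects "
        "relevant to the document remain, adding or removing aspects as needed. "
        "Then write one short summary for each refined aspect."
    )
```

For example, `build_count_guided_prompt(transcript, ["revenue", "guidance", "weather"], 2)` asks the model to keep two relevant aspects. In this sketch the count would come from a rounded regression output. Stating the number explicitly is meant to counter the tendency, noted in the abstract, to predict an overly comprehensive aspect set.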
