Textual Summarisation of Large Sets: Towards a General Approach
Kittipitch Kuptavanich, Ehud Reiter, Kees Van Deemter, Advaith Siddharthan
TL;DR
This paper extends a rule-based natural language generation (NLG) approach for summarising large sets, originally developed for consumer products, to bibliographical references in academic papers. It introduces the refSet algorithm, which omits the dominating citation-count attribute, uses quantifiers and ranges, and highlights top-per-venue publications and authors to support readers in decision-making tasks. An evaluation with human participants shows that refSet improves usefulness in scenario-based decisions (e.g., reviewing or browsing) and generally outperforms a Semantic Scholar baseline, with strong statistical support in one scenario and partial support in another. The work demonstrates that set-summarisation techniques can generalise across domains, and it outlines future directions for domain-agnostic and data-analysis-driven summarisation when corpus data are unavailable.
Abstract
We are developing techniques to generate summary descriptions of sets of objects. In this paper, we present and evaluate a rule-based NLG technique for summarising sets of bibliographical references in academic papers. This extends our previous work on summarising sets of consumer products and shows how our model generalises across these two very different domains.
