Topic-Controllable Summarization: Topic-Aware Evaluation and Transformer Methods

Tatiana Passali, Grigorios Tsoumakas

TL;DR

This work tackles topic-controllable summarization by (1) introducing STAS, a topic-aware evaluation metric based on cosine similarity between topic and summary representations, normalised by the dominant topic, and (2) adapting topic control to Transformer architectures via two strategies: topic embeddings and control tokens. It demonstrates that control tokens, especially when combined with prepending and tagging, yield higher topic alignment and faster inference than embedding-based methods, including in zero-shot settings. A synthetic topic-oriented CNN/DailyMail dataset is released to train and evaluate models, and STAS is validated against human judgments with high correlation. The results show substantial improvements in topic focus (STAS) while maintaining competitive ROUGE scores, indicating practical impact for generating topic-focused summaries in real-world applications. The paper also points toward future work on broader controllable attributes, arbitrary-topic tagging, and richer contextual embeddings to further enhance topic-aligned summarization. STAS offers a scalable, interpretable automatic evaluation aligned with user-topic requirements, which is valuable for developers, search engines, and AI chat systems seeking contextually focused summaries.
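
As a rough illustration of the metric described above, the following sketch computes a STAS-like score: the cosine similarity between a summary vector and the desired topic vector, divided by the highest similarity the summary reaches with any topic (the dominant topic). The function names, the dictionary layout, and the plain-list vectors are assumptions made for this example; the paper's exact formulation and representations may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def stas_like_score(summary_vec, topic_vecs, target_topic):
    """Similarity to the target topic, normalised by the dominant topic's similarity.

    Assumption: topic_vecs maps each topic name to a vector in the same space
    as summary_vec. This is an illustrative reading of the description, not
    the authors' reference implementation.
    """
    sims = {topic: cosine(summary_vec, vec) for topic, vec in topic_vecs.items()}
    dominant = max(sims.values())
    if dominant <= 0.0:
        return 0.0
    return sims[target_topic] / dominant


# Hypothetical two-dimensional topic space, for illustration only.
topics = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
score = stas_like_score([3.0, 4.0], topics, "sports")
```

Under this reading, a score of 1 means the desired topic is also the dominant topic of the summary, and lower values mean some other topic fits the summary better.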

Abstract

Topic-controllable summarization is an emerging research area with a wide range of potential applications. However, existing approaches suffer from significant limitations. For example, the majority of existing methods are built upon recurrent architectures, which can significantly limit their performance compared to more recent Transformer-based architectures, and they also require modifications to the model's architecture to control the topic. At the same time, there is currently no established evaluation metric designed specifically for topic-controllable summarization. This work proposes a new topic-oriented evaluation measure to automatically evaluate the generated summaries based on the topic affinity between the generated summary and the desired topic. The reliability of the proposed measure is demonstrated through appropriately designed human evaluation. In addition, we adapt topic embeddings to work with powerful Transformer architectures and propose a novel and efficient approach for guiding the summary generation through control tokens. Experimental results reveal that control tokens can achieve better performance compared to more complicated embedding-based approaches while also being significantly faster.
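
To make the control-token idea concrete, here is a minimal sketch of the two input transformations mentioned in the TL;DR: prepending a topic token to the document and tagging topic-related words inside it. The token formats (`<topic>` and `[TAG]`), the function names, and the whitespace tokenisation are assumptions for illustration. The key property is that the model architecture stays untouched and only the input text changes.

```python
def prepend_topic(document, topic):
    """Prepend a control token naming the desired topic.

    Assumption: the token format "<topic>" is hypothetical; a real system
    would use whatever special token its tokenizer was trained with.
    """
    return f"<{topic}> {document}"


def tag_topic_words(document, representative_words, tag="[TAG]"):
    """Insert a tag token before every word that belongs to the topic vocabulary.

    Assumption: whitespace tokenisation and the "[TAG]" marker are
    simplifications chosen for this sketch.
    """
    vocab = {word.lower() for word in representative_words}
    out = []
    for token in document.split():
        # Compare without surrounding punctuation, but keep the token as written.
        core = token.strip(".,;:!?\"'()").lower()
        if core in vocab:
            out.append(tag)
        out.append(token)
    return " ".join(out)


# Both transformations can be combined before the text reaches the summarizer.
text = "The team won the match."
model_input = prepend_topic(tag_topic_words(text, ["team", "match"]), "sports")
```

Because the control signal lives entirely in the input sequence, any pretrained sequence-to-sequence Transformer could in principle be fine-tuned on such inputs without architectural changes.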

Paper Structure (17 sections, 4 equations, 1 figure, 6 tables)

Figures (1)

  • Figure 1: Obtaining representative words, given a topic-assigned document collection. First, we calculate a vector representation for each document. Then, documents of the same topic are grouped and their vector representations are averaged. Finally, we obtain the words with the top $N$ scores.
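
The three steps in the Figure 1 caption can be sketched as follows. This is a simplified illustration: it uses normalised term frequencies as the document vectors, which is an assumption, since the caption does not say which representation the paper uses. The function names and the toy data are likewise hypothetical.

```python
from collections import Counter, defaultdict


def document_vector(text):
    """Step 1: represent a document as normalised term frequencies (an assumed weighting)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()} if total else {}


def representative_words(docs, top_n):
    """Steps 2 and 3: average the vectors per topic, then keep the top_n scoring words.

    docs is a list of (topic, text) pairs.
    """
    grouped = defaultdict(list)
    for topic, text in docs:
        grouped[topic].append(document_vector(text))

    result = {}
    for topic, vectors in grouped.items():
        averaged = defaultdict(float)
        for vec in vectors:
            for word, score in vec.items():
                averaged[word] += score / len(vectors)
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(averaged.items(), key=lambda kv: (-kv[1], kv[0]))
        result[topic] = [word for word, _ in ranked[:top_n]]
    return result


# Toy topic-assigned collection, for illustration only.
collection = [
    ("sports", "goal goal team"),
    ("sports", "goal match"),
    ("politics", "vote vote law"),
]
top_words = representative_words(collection, top_n=2)
```

A weighting that discounts words common to all topics (for example an inverse-document-frequency term) would likely give more discriminative word lists than raw frequencies; the sketch omits this to stay short.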