Generating clickbait spoilers with an ensemble of large language models
Mateusz Woźny, Mateusz Lango
TL;DR
This paper tackles the generation of clickbait spoilers to neutralize misleading headlines. It introduces an ensemble of LoRA-finetuned large language models that first convert clickbait posts into questions, then generate diverse spoiler candidates, and finally rank them with learning-to-rank models to produce phrase, passage, or multi-part spoilers. The results show that the ensemble outperforms prior QA-based spoiler methods on BLEU, METEOR, and BERTScore, with the largest gains on multi-part spoilers. The work demonstrates a flexible pipeline that combines LLM-driven generation with ranking to improve spoiler quality and to cover all three spoiler types.
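The three stages described above (question rewriting, ensemble generation, ranking) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the two generators are hypothetical stand-ins for the LoRA-finetuned LLMs, `overlap_scorer` is a toy token-overlap heuristic swapped in for the learned learning-to-rank model, and all function names (`clickbait_to_question`, `generate_candidates`, `rank_candidates`, `spoil`) are hypothetical.

```python
import re
from typing import Callable, List

# Hypothetical interfaces: a Generator stands in for one LoRA-finetuned LLM in
# the ensemble; a Scorer stands in for a trained learning-to-rank model.
Generator = Callable[[str, str], List[str]]
Scorer = Callable[[str, str, str], float]


def clickbait_to_question(post: str) -> str:
    """Placeholder for the LLM step that rewrites a clickbait post as a question.

    Here we only swap trailing punctuation for a question mark so the sketch
    runs without a model.
    """
    return post.rstrip(" .!?") + "?"


def generate_candidates(question: str, article: str,
                        generators: List[Generator]) -> List[str]:
    """Pool spoiler candidates from every ensemble member.

    Case-insensitive duplicates and empty strings are dropped; first-seen
    order is kept.
    """
    seen = set()
    pool = []
    for generate in generators:
        for cand in generate(question, article):
            cand = cand.strip()
            key = cand.lower()
            if key and key not in seen:
                seen.add(key)
                pool.append(cand)
    return pool


def rank_candidates(question: str, article: str, candidates: List[str],
                    scorer: Scorer) -> List[str]:
    """Order candidates by descending score; ties keep their pooled order."""
    return sorted(candidates, key=lambda c: scorer(question, article, c),
                  reverse=True)


def spoil(post: str, article: str, generators: List[Generator],
          scorer: Scorer) -> str:
    """Question rewriting -> ensemble generation -> ranking; return the top candidate."""
    question = clickbait_to_question(post)
    candidates = generate_candidates(question, article, generators)
    if not candidates:
        return ""
    return rank_candidates(question, article, candidates, scorer)[0]


def overlap_scorer(question: str, article: str, candidate: str) -> float:
    """Toy stand-in for a learned ranker: share of candidate tokens found in the article."""
    article_tokens = set(re.findall(r"\w+", article.lower()))
    tokens = re.findall(r"\w+", candidate.lower())
    if not tokens:
        return 0.0
    return sum(t in article_tokens for t in tokens) / len(tokens)


# Usage with two hypothetical "ensemble members" that return canned candidates.
article = "The secret ingredient in grandma's cake is a pinch of cinnamon."
post = "You won't believe grandma's secret cake ingredient!"
gen_a = lambda q, a: ["cinnamon", "sugar"]
gen_b = lambda q, a: ["Cinnamon", "a pinch of cinnamon"]
print(spoil(post, article, [gen_a, gen_b], overlap_scorer))  # prints "cinnamon"
```

Keeping the generators and the scorer as plain callables means ensemble members can be added or removed without changing the ranking step; in a real system each callable would wrap a model inference call.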
Abstract
Clickbait posts are a widespread problem on the web. The generation of spoilers, i.e., short texts that neutralize clickbait by providing the information that satisfies the curiosity it induces, is one of the proposed solutions to this problem. Current state-of-the-art methods are based on passage retrieval or question answering and are limited to generating spoilers in the form of a phrase or a passage. In this work, we propose an ensemble of fine-tuned large language models for clickbait spoiler generation. Our approach is not limited to phrase or passage spoilers; it can also generate multi-part spoilers that refer to several non-consecutive parts of the text. An experimental evaluation demonstrates that the proposed ensemble outperforms the baselines in terms of the BLEU, METEOR and BERTScore metrics.
