A Survey of Pun Generation: Datasets, Evaluations and Methodologies
Yuchen Su, Yonghua Zhu, Ruofan Wang, Zijian Huang, Diana Benavides-Prado, Michael Witbrock
TL;DR
This survey maps pun generation research across three decades, detailing datasets, methodologies, and evaluation practices. It categorizes approaches into conventional templates, classic DNNs, fine-tuned PLMs, prompting strategies, and visual-language methods, summarizing how each stage leverages linguistic ambiguity and contextual cues to produce humorous outputs. The review highlights prevalent evaluation frameworks (automatic measures of funniness, diversity, and fluency, alongside human judgments) and identifies gaps in multilingual and multimodal pun generation. It also discusses current limitations of datasets and evaluation, and outlines promising directions such as cross-lingual datasets, multimodal puns, and advanced prompting designs. Overall, the work provides a foundational resource for researchers aiming to advance creative NLG through principled, data-driven approaches to computational humor.
Abstract
Pun generation seeks to creatively modify linguistic elements in text to produce humour or evoke double meanings. It also aims to preserve coherence and contextual appropriateness, making it useful in creative writing and entertainment across various media and contexts. Although pun generation has received considerable attention in computational linguistics, there is currently no dedicated survey that systematically reviews this specific area. To bridge this gap, this paper provides a comprehensive review of pun generation datasets and methods across different stages, including conventional approaches, deep learning techniques, and pre-trained language models. Additionally, we summarise both automated and human evaluation metrics used to assess the quality of pun generation. Finally, we discuss the research challenges and propose promising directions for future work.
