"A good pun is its own reword": Can Large Language Models Understand Puns?
Zhijun Xu, Siyu Yuan, Lingjie Chen, Deqing Yang
TL;DR
This work systematically probes large language models (LLMs) for pun understanding across three tasks: recognition, explanation, and generation. It introduces novel evaluation methods tailored to in-context learning, including dual-biased prompts, punchline checks, chain-of-thought (CoT) prompts, and an Overlap metric to assess originality. Across eight LLMs and two pun types, the study finds that prompt bias significantly shapes recognition, that models struggle to explain heterographic puns (het-puns) although some reach human-level explanation quality, and that generation shows a prevalent "lazy pun generation" pattern but can achieve strong results in constrained setups, especially with larger models. The results advance our understanding of pun processing in LLMs and provide robust evaluation frameworks and datasets to guide future research in linguistic humor and creative text generation.
Abstract
Puns play a vital role in academic research due to their distinct structure and clear definition, which aid in the comprehensive analysis of linguistic humor. However, the understanding of puns in large language models (LLMs) has not been thoroughly examined, limiting their use in creative writing and humor creation. In this paper, we leverage three popular tasks, i.e., pun recognition, explanation, and generation, to systematically evaluate the capabilities of LLMs in pun understanding. In addition to adopting the automated evaluation metrics from prior research, we introduce new evaluation methods and metrics that are better suited to the in-context learning paradigm of LLMs. These new metrics offer a more rigorous assessment of an LLM's ability to understand puns and align more closely with human cognition than previous metrics. Our findings reveal the "lazy pun generation" pattern and identify the primary challenges LLMs encounter in understanding puns.
