Dark & Stormy: Modeling Humor in the Worst Sentences Ever Written
Venkata S Govindarajan, Laura Biester
TL;DR
This work introduces a novel corpus of intentionally bad humor drawn from the Bulwer-Lytton (BL) Fiction Contest, together with synthetic BL sentences generated by multiple large language models. BL humor diverges markedly from standard humor datasets: it makes heavier use of literary devices such as irony, metafiction, and simile, and it shows a prevalence of novel adjective-noun expressions. Humor-detection models trained on conventional data underperform on BL sentences, indicating a domain-specific gap. Synthetic BL sentences imitate the form but exaggerate its stylistic features, which offers a lens into how prompts shape generation. The study combines data collection, humor-detection evaluation, literary device analysis via a GPT-based feature framework, and surprisal-based incongruity analysis to characterize BL humor and how readily it can be replicated synthetically. Data and code are public to enable further research.
Abstract
Textual humor is enormously diverse, and computational studies need to account for this range, including intentionally bad humor. In this paper, we curate and analyze a novel corpus of sentences from the Bulwer-Lytton Fiction Contest to better understand "bad" humor in English. Standard humor detection models perform poorly on our corpus, and an analysis of literary devices finds that these sentences combine features common in existing humor datasets (e.g., puns, irony) with metaphor, metafiction and simile. LLMs prompted to synthesize contest-style sentences imitate the form but exaggerate the effect by overusing certain literary devices and including far more novel adjective-noun bigrams than human writers. Data, code and analysis are available at https://github.com/venkatasg/bulwer-lytton
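
To make the notion of a "novel adjective-noun bigram" concrete, the following is a minimal sketch, not the authors' implementation. It assumes sentences are already part-of-speech tagged with coarse tags such as ADJ and NOUN, and that "novel" means the pair is absent from some reference collection of known adjective-noun pairs. The function names, the tag labels, and the toy reference set are all hypothetical and chosen for illustration only.

```python
from typing import List, Set, Tuple

# A tagged token is (word, coarse POS tag). The tag names are an assumption.
Tagged = Tuple[str, str]
Pair = Tuple[str, str]


def adj_noun_bigrams(tagged: List[Tagged]) -> List[Pair]:
    """Return lowercased (adjective, noun) pairs for adjacent ADJ NOUN tokens."""
    return [
        (w1.lower(), w2.lower())
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1 == "ADJ" and t2 == "NOUN"
    ]


def novel_fraction(tagged: List[Tagged], reference: Set[Pair]) -> float:
    """Fraction of adjective-noun bigrams in a sentence not found in `reference`."""
    pairs = adj_noun_bigrams(tagged)
    if not pairs:
        return 0.0  # no adjective-noun bigrams, so nothing can be novel
    novel = [p for p in pairs if p not in reference]
    return len(novel) / len(pairs)


# Toy reference set standing in for pairs attested in a large corpus (hypothetical).
reference = {("dark", "night"), ("stormy", "night"), ("old", "house")}

# Toy pre-tagged sentence (hypothetical, hand-tagged).
sentence = [
    ("The", "DET"), ("dark", "ADJ"), ("night", "NOUN"), ("hid", "VERB"),
    ("a", "DET"), ("gelatinous", "ADJ"), ("regret", "NOUN"),
]

print(adj_noun_bigrams(sentence))
print(novel_fraction(sentence, reference))
```

In this toy case one of the two adjective-noun pairs is missing from the reference set, so half of the pairs count as novel. Comparing this fraction between human-written and LLM-generated sentences is one simple way to frame the comparison the abstract describes; the actual tagger, reference corpus, and novelty criterion are in the linked repository.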
