Finding Sense in Nonsense with Generated Contexts: Perspectives from Humans and Language Models
Katrin Olsen, Sebastian Padó
TL;DR
This work interrogates the boundary between semantic anomaly and pure nonsense by assessing the sensicality of semantically deviant sentences with and without generated contexts, using human judgments and two large language models. The authors sample 40 items from five datasets (ADEPT, BLiMP, PAP, CConS, and Cusp) and collect context-generated readings and sensicality scores from both humans and LLMs. They find that a substantial portion of sentences deemed nonsensical in prior work are interpretable once contextualized, and that LLMs can produce plausible contexts that increase sensicality for many items. The study reveals both alignment and divergence between human judgments and LLM-augmented readings, highlighting the importance of contextual grounding in semantic evaluation and its potential impact on modeling and evaluation practices.
Abstract
Nonsensical and anomalous sentences have been instrumental in the development of computational models of semantic interpretation. A core challenge is to distinguish between what is merely anomalous (but can be interpreted given a supporting context) and what is truly nonsensical. However, it is unclear (a) how nonsensical, rather than merely anomalous, existing datasets are; and (b) how well LLMs can make this distinction. In this paper, we answer both questions by collecting sensicality judgments from human raters and LLMs on sentences from five semantically deviant datasets, both without context and with a provided context. We find that raters consider most sentences at most anomalous, and only a few properly nonsensical. We also show that LLMs are substantially skilled at generating plausible contexts for anomalous cases.
