Finding Sense in Nonsense with Generated Contexts: Perspectives from Humans and Language Models

Katrin Olsen; Sebastian Padó

TL;DR

This work examines the boundary between semantic anomaly and pure nonsense by assessing the sensicality of semantically deviant sentences with and without generated contexts, using human judgments and two large language models (LLMs). The authors sample 40 items from five datasets (ADEPT, BLiMP, PAP, CConS, and Cusp) and collect readings with generated contexts and sensicality scores from both humans and LLMs. They find that a substantial portion of the sentences deemed nonsensical in prior work become interpretable once contextualized, and that LLMs can produce plausible contexts that increase sensicality for many items. The study reveals both alignment and divergence between human judgments and LLM-augmented readings, highlighting the importance of contextual grounding in semantic evaluation and its potential impact on modeling and evaluation practices.

Abstract

Nonsensical and anomalous sentences have been instrumental in the development of computational models of semantic interpretation. A core challenge is to distinguish between what is merely anomalous (but can be interpreted given a supporting context) and what is truly nonsensical. However, it is unclear (a) how nonsensical, rather than merely anomalous, existing datasets are; and (b) how well LLMs can make this distinction. In this paper, we answer both questions by collecting sensicality judgments from human raters and LLMs on sentences from five datasets of semantically deviant sentences, both without context and with a provided context. We find that raters consider most sentences at most anomalous, and only a few properly nonsensical. We also show that LLMs are substantially skilled at generating plausible contexts for anomalous cases.

Paper Structure (26 sections, 8 figures, 5 tables)

This paper contains 26 sections, 8 figures, 5 tables.

Introduction
Related Work
Data
ADEPT
BLiMP
PAP
CConS
Cusp
Models
Methods
LLM Prompting
Human Annotations
Can LLMs Contextualize Semantic Anomalies?
Contextualizations by Dataset
Can LLMs Score Sensicality?
...and 11 more sections

Figures (8)

Figure 1: Visualization of sentence scoring process
Figure 2: Effect of context on human sentence sensicality ratings (left: Phi context, right: Llama context)
Figure 3: Human annotation sensicality scores by dataset (left: without context, right: with LLM-generated context)
Figure 4: LLM scores on sentence sensicality by dataset (left: Phi, right: Llama)
Figure 5: Effect of Context on Sensicality Scoring by Humans and LLMs (dashed line: without context; solid lines: regression lines for scorers)
...and 3 more figures
