Generics are puzzling. Can language models find the missing piece?
Gustavo Cilleruelo Calderón, Emily Allaway, Barry Haddow, Alexandra Birch
TL;DR
Generics pose a semantic puzzle because they generalise without explicit quantification and tolerate exceptions. The authors introduce ConGen, a dataset of naturally occurring generic and quantified sentences with context, and p-acceptability, a surprisal-based metric for inferring implicit quantification. Across multiple Mistral language models, they find that generics are substantially context-sensitive, that a notable share of them (~18–23%) are weak generics, and that stereotype-related biases appear, which instruction tuning can mitigate. The dataset and metric offer tools for studying quantification in language, with implications for linguistics, NLP, and bias-aware AI systems.
Abstract
Generic sentences express generalisations about the world without explicit quantification. Although generics are central to everyday communication, building a precise semantic framework for them has proven difficult, in part because speakers use generics to generalise properties with widely different statistical prevalence. In this work, we study the implicit quantification and context-sensitivity of generics by leveraging language models as models of language. We create ConGen, a dataset of 2873 naturally occurring generic and quantified sentences in context, and define p-acceptability, a metric based on surprisal that is sensitive to quantification. Our experiments show that generics are more context-sensitive than determiner quantifiers and that about 20% of the naturally occurring generics we analyze express weak generalisations. We also explore how human stereotype biases can be observed in language models.
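The abstract builds p-acceptability on surprisal, but it does not give the metric's definition. The sketch below shows only the underlying quantity: the surprisal of a sentence in bits, i.e. the sum of -log2 p(token | prefix), and one hypothetical way to compare determiner-substituted variants of a sentence (e.g. "all", "most", "some") by picking the least surprising one. This is not the authors' p-acceptability. The helper names, the variant-comparison step, and the toy probabilities are illustrative assumptions. In practice the per-token conditional probabilities would come from a language model such as one of the Mistral models, which this snippet does not call.

```python
import math
from typing import Dict, List


def surprisal_bits(token_probs: List[float]) -> float:
    """Total surprisal of a token sequence in bits: sum of -log2 p(token | prefix)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    total = 0.0
    for p in token_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        total += -math.log2(p)
    return total


def least_surprising(variants: Dict[str, List[float]]) -> str:
    """Hypothetical helper: return the key (e.g. a determiner) whose
    sentence variant has the lowest total surprisal."""
    return min(variants, key=lambda k: surprisal_bits(variants[k]))


if __name__ == "__main__":
    # Made-up per-token probabilities, chosen only so the arithmetic is exact;
    # they are not model outputs or results from the paper.
    variants = {
        "all": [0.25, 0.5, 0.5],    # 2 + 1 + 1 = 4 bits
        "most": [0.5, 0.5, 0.5],    # 1 + 1 + 1 = 3 bits
        "some": [0.125, 0.5, 0.5],  # 3 + 1 + 1 = 5 bits
    }
    for det, probs in variants.items():
        print(det, surprisal_bits(probs))
    print("least surprising:", least_surprising(variants))
```

Summing log-probabilities rather than multiplying raw probabilities keeps the computation numerically stable for long sentences; comparing raw totals like this also assumes the variants have comparable length, which a real metric would need to handle.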
