Generics are puzzling. Can language models find the missing piece?

Gustavo Cilleruelo Calderón, Emily Allaway, Barry Haddow, Alexandra Birch

TL;DR

Generics pose a semantic puzzle because they generalise without explicit quantification and tolerate exceptions. The authors introduce ConGen, a dataset of naturally occurring generic and quantified sentences with context, and p-acceptability, a surprisal-based metric to infer implicit quantification. Across multiple Mistral language models, they find substantial context-sensitivity for generics, reveal a notable presence of weak generics (~18–23%), and observe biases related to stereotypes that can be mitigated by instruction tuning. These findings provide a dataset and analytical tool for studying quantification in language, with implications for linguistics, NLP, and bias-aware AI systems.

Abstract

Generic sentences express generalisations about the world without explicit quantification. Although generics are central to everyday communication, building a precise semantic framework has proven difficult, in part because speakers use generics to generalise properties with widely different statistical prevalence. In this work, we study the implicit quantification and context-sensitivity of generics by leveraging language models as models of language. We create ConGen, a dataset of 2873 naturally occurring generic and quantified sentences in context, and define p-acceptability, a metric based on surprisal that is sensitive to quantification. Our experiments show that generics are more context-sensitive than determiner quantifiers and that about 20% of the naturally occurring generics we analyse express weak generalisations. We also explore how human biases in stereotypes can be observed in language models.
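As a rough illustration of the quantity p-acceptability is said to build on, the sketch below computes surprisal (negative log-probability, in bits) for a few quantified variants of one sentence and picks the least surprising one. It is not the paper's Definition 4.1, which is not reproduced in this summary. The quantifier list, the per-token probabilities, and the function name `surprisal_bits` are all hypothetical stand-ins for what a language model would provide.

```python
import math

def surprisal_bits(token_probs):
    """Total surprisal of a token sequence: sum of -log2(p) over its tokens."""
    return sum(-math.log2(p) for p in token_probs)

# Hypothetical per-token probabilities for three quantified variants of one
# sentence. In practice these would come from a language model's output
# distribution; the numbers here are made up for illustration.
candidates = {
    "all":  [0.5, 0.25],
    "most": [0.5, 0.5],
    "some": [0.25, 0.25],
}

# Lower surprisal means the model finds that variant less unexpected.
scores = {quantifier: surprisal_bits(probs) for quantifier, probs in candidates.items()}
least_surprising = min(scores, key=scores.get)

print(scores)
print(least_surprising)
```

Comparing surprisal across variants that differ only in the quantifier is one plausible way to probe which quantification a model treats as implicit in a bare generic; how the paper turns this into an acceptability decision is specified in its own definition.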

Paper Structure

This paper contains 53 sections, 3 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: P-acceptable quantifiers on both datasets correspond to semantic intuitions (Mistral-7B)
  • Figure 2: Implicit quantification in ConGen generics across Mistral models.
  • Figure 3: Percentage of correct p-acceptable quantifiers with different contexts. (Mistral-8×22B)
  • Figure 4: Implicit quantification with different left-side contexts on generic sentences from ConGen. (Mistral-8×22B)
  • Figure 5: Different p-acceptability rates for each paraphrase of stereotyping generic sentences for Mistral-7B and Mistral-7B Instruct. Paraphrases are indicated as bp (bare plural), sg ppl (singular $+$ 'people') and ppl who ('People who are' $+$ singular).
  • ...and 2 more figures

Theorems & Definitions (1)

  • Definition 4.1: p-acceptability