Designing NLP Systems That Adapt to Diverse Worldviews
Claudiu Creanga, Liviu P. Dinu
TL;DR
This paper argues that NLP systems for natural language inference (NLI) fail to generalize because they ignore the subjective nature of meaning, which is tied to individual worldviews (weltanschauung). It proposes perspectivist, worldview-annotated datasets that record annotator demographics, values, and justifications instead of only aggregated labels. Early experiments on an SBIC subset show that including annotator metadata can improve model performance (test F1 up to 0.38) compared to aggregated baselines, suggesting better alignment with diverse interpretations. The work highlights the potential of worldview-aware modeling to improve generalization and reduce reliance on shallow heuristics in language understanding tasks.
Abstract
Natural Language Inference (NLI) is foundational for evaluating language understanding in AI. However, progress has plateaued: models fail on ambiguous examples and generalize poorly. We argue that this stems from disregarding the subjective nature of meaning, which is intrinsically tied to an individual's weltanschauung (roughly, worldview). Existing NLP datasets often obscure this subjectivity by aggregating labels or filtering out disagreement. We propose a perspectivist approach: building datasets that capture annotator demographics, values, and justifications for their labels. Such datasets would explicitly model diverse worldviews. Our initial experiments with a subset of the SBIC dataset demonstrate that even limited annotator metadata can improve model performance.
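
To make the contrast between aggregated and perspectivist labels concrete, the following is a minimal Python sketch. It is not the authors' implementation or the SBIC schema: the `Annotation` fields, the label names, the bracketed input format, and the sample rows are all hypothetical and invented for illustration. It shows how a majority vote collapses disagreement into one label, while a perspectivist view keeps one example per annotator together with their metadata.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotator's judgment on one item (all field names are hypothetical)."""
    item_id: str
    text: str
    label: str
    annotator_id: str
    demographics: dict   # e.g. {"age_group": "18-29"}
    values: tuple        # free-form value tags supplied by the annotator
    justification: str   # the annotator's stated reason for the label


def majority_label(annotations):
    """Aggregated baseline: collapse all judgments on an item into one label."""
    counts = Counter(a.label for a in annotations)
    return counts.most_common(1)[0][0]


def perspectivist_examples(annotations):
    """Keep one (input, label) pair per annotator, prefixed with their metadata."""
    examples = []
    for a in annotations:
        profile = "; ".join(f"{k}={v}" for k, v in sorted(a.demographics.items()))
        examples.append((f"[annotator: {profile}] {a.text}", a.label))
    return examples


# Invented sample data: three annotators disagree about the same post.
TEXT = "Example post whose offensiveness is contested."
sample = [
    Annotation("p1", TEXT, "not_offensive", "a1", {"age_group": "18-29"},
               ("free_speech",), "Reads as a joke to me."),
    Annotation("p1", TEXT, "not_offensive", "a2", {"age_group": "30-44"},
               ("free_speech",), "No group is targeted."),
    Annotation("p1", TEXT, "offensive", "a3", {"age_group": "45-59"},
               ("harm_avoidance",), "Relies on a demeaning stereotype."),
]

aggregated = majority_label(sample)          # the minority view disappears
per_annotator = perspectivist_examples(sample)  # the minority view is retained
```

In this sketch the aggregated label discards the third annotator's judgment and justification, whereas the per-annotator examples let a model condition on who is labeling. How metadata is actually encoded for the model is a design choice that the abstract does not specify.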
