Designing NLP Systems That Adapt to Diverse Worldviews

Claudiu Creanga, Liviu P. Dinu

TL;DR

This paper argues that NLP systems for NLI fail to generalize because they ignore the subjective nature of meaning, which is tied to individual worldviews (weltanschauung). It proposes perspectivist, worldview-annotated datasets that record annotator demographics, values, and justifications rather than only aggregated labels. Early experiments on an SBIC subset show that including annotator metadata can improve model performance (test F1 up to 0.38) compared to aggregated baselines, suggesting better alignment with diverse interpretations. The work highlights the potential for worldview-aware modeling to improve generalization and reduce reliance on shallow heuristics in language understanding tasks.

Abstract

Natural Language Inference (NLI) is foundational for evaluating language understanding in AI. However, progress has plateaued, with models failing on ambiguous examples and exhibiting poor generalization. We argue that this stems from disregarding the subjective nature of meaning, which is intrinsically tied to an individual's weltanschauung (which roughly translates to worldview). Existing NLP datasets often obscure this by aggregating labels or filtering out disagreement. We propose a perspectivist approach: building datasets that capture annotator demographics, values, and justifications for their labels. Such datasets would explicitly model diverse worldviews. Our initial experiments with a subset of the SBIC dataset demonstrate that even limited annotator metadata can improve model performance.
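To make the contrast between aggregated and perspectivist labels concrete, here is a minimal Python sketch. It is illustrative only and is not the authors' experimental setup: the label names, annotator IDs, metadata fields, and the helper functions `aggregate_majority` and `perspectivist_examples` are all hypothetical.

```python
from collections import Counter

# Hypothetical labels from three annotators for a single item.
labels = {
    "annotator_1": "offensive",
    "annotator_2": "not_offensive",
    "annotator_3": "offensive",
}

# Hypothetical metadata; the field names are assumptions, not the SBIC schema.
metadata = {
    "annotator_1": {"age_group": "18-24"},
    "annotator_2": {"age_group": "45-54"},
    "annotator_3": {"age_group": "25-34"},
}


def aggregate_majority(labels):
    """Collapse all annotator labels into the single most common one."""
    return Counter(labels.values()).most_common(1)[0][0]


def perspectivist_examples(text, labels, metadata):
    """Keep one example per annotator, paired with that annotator's metadata."""
    return [
        {"text": text, "annotator": metadata[annotator_id], "label": label}
        for annotator_id, label in labels.items()
    ]


aggregated = aggregate_majority(labels)
per_annotator = perspectivist_examples("some post", labels, metadata)
```

In this sketch the aggregated view keeps one label and drops the dissenting judgment, while the perspectivist view keeps all three examples together with who produced each one.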

Paper Structure (7 sections, 1 figure, 1 table)

This paper contains 7 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Building a worldview-annotated dataset. The annotator pool must be diverse and suited to the task. Metadata should be collected about each annotator: demographics and values. Each annotator should label items according to their worldview, while being mindful of potential "noise" (errors caused by factors such as inattention, which are unrelated to their perspective). To mitigate noise and preserve valid interpretations, annotators should provide justifications for their labels. They then self-review their labels against their own explanations.
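
The pipeline described in the Figure 1 caption could be represented with records along the following lines. This is a minimal sketch under stated assumptions, not the authors' schema: the class names `Annotator` and `Annotation`, their fields, and the helpers `self_review` and `labels_by_item` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Annotator:
    """Metadata collected once per annotator (field names are assumptions)."""
    annotator_id: str
    demographics: Dict[str, str]
    values: Dict[str, str]


@dataclass(frozen=True)
class Annotation:
    """One annotator's label for one item, with their written justification."""
    item_id: str
    annotator_id: str
    label: str
    justification: str
    self_reviewed: bool = False


def self_review(annotation: Annotation,
                revised_label: Optional[str] = None) -> Annotation:
    """Return a copy marked as reviewed, with the label changed if revised."""
    new_label = revised_label if revised_label is not None else annotation.label
    return replace(annotation, label=new_label, self_reviewed=True)


def labels_by_item(annotations: List[Annotation]) -> Dict[str, Dict[str, str]]:
    """Group labels per item and per annotator, without aggregating them."""
    grouped: Dict[str, Dict[str, str]] = {}
    for ann in annotations:
        grouped.setdefault(ann.item_id, {})[ann.annotator_id] = ann.label
    return grouped
```

Keeping the justification and the self-review flag on each record is one way to separate noise (a label the annotator later corrects) from genuine disagreement (a label the annotator confirms after re-reading their own explanation).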