Table of Contents
Fetching ...

Not All Subjectivity Is the Same! Defining Desiderata for the Evaluation of Subjectivity in NLP

Urja Khurana, Michiel van der Meer, Enrico Liscio, Antske Fokkens, Pradeep K. Murukannaiah

Abstract

Subjective judgments are part of several NLP datasets and recent work is increasingly prioritizing models whose outputs reflect this diversity of perspectives. Such responses allow us to shed light on minority voices, which are frequently marginalized or obscured by dominant perspectives. It remains a question whether our evaluation practices align with these models' objectives. This position paper proposes seven evaluation desiderata for subjectivity-sensitive models, rooted in how subjectivity is represented in NLP data and models. The desiderata are constructed in a top-down approach, keeping in mind the user-centric impact of such models. We scan the experimental setup of 60 papers and show that various aspects of subjectivity are still understudied: the distinction between ambiguous and polyphonic input, whether subjectivity is effectively expressed to the user, and a lack of interplay between different desiderata, amongst other gaps.

Not All Subjectivity Is the Same! Defining Desiderata for the Evaluation of Subjectivity in NLP

Abstract

Subjective judgments are part of several NLP datasets and recent work is increasingly prioritizing models whose outputs reflect this diversity of perspectives. Such responses allow us to shed light on minority voices, which are frequently marginalized or obscured by dominant perspectives. It remains a question whether our evaluation practices align with these models' objectives. This position paper proposes seven evaluation desiderata for subjectivity-sensitive models, rooted in how subjectivity is represented in NLP data and models. The desiderata are constructed in a top-down approach, keeping in mind the user-centric impact of such models. We scan the experimental setup of 60 papers and show that various aspects of subjectivity are still understudied: the distinction between ambiguous and polyphonic input, whether subjectivity is effectively expressed to the user, and a lack of interplay between different desiderata, amongst other gaps.

Paper Structure

This paper contains 52 sections, 3 figures.

Figures (3)

  • Figure 1: The sequential nature of our proposed desiderata for evaluating a subjectivity-sensitive model.
  • Figure 2: Dimensions of sources of subjectivity at a sample-level: ambiguity on the x-axis and polyphony on the y-axis. For each quadrant, we show the desired model output (as icons) with an example situated in hate speech detection.
  • Figure 3: Statistics from mapping subjective evaluation practices to our desiderata.