Can Third-parties Read Our Emotions?
Jiayi Li, Yingfan Zhou, Pranav Narayanan Venkit, Halima Binte Islam, Sneha Arya, Shomir Wilson, Sarah Rajtmajer
TL;DR
The paper investigates whether third-party annotations (human or LLM-based) faithfully capture authors' private states in emotion recognition. Through a two-stage study collecting first-party self-reports and third-party labels (humans split into in-group/out-group and multiple LLMs), it shows substantial misalignment between third-party labels and first-party emotions, with LLMs generally outperforming human annotators. It further demonstrates that demographic similarity between authors and in-group annotators improves alignment, and that including first-party demographic information in LLM prompts yields modest gains. The work proposes an evaluative framework for assessing third-party annotation limitations and advocates refined annotation practices and ethical considerations for modeling private states in NLP applications.
Abstract
Natural Language Processing tasks that aim to infer an author's private states, e.g., emotions and opinions, from their written text, typically rely on datasets annotated by third-party annotators. However, the assumption that third-party annotators can accurately capture authors' private states remains largely unexamined. In this study, we present human subjects experiments on emotion recognition tasks that directly compare third-party annotations with first-party (author-provided) emotion labels. Our findings reveal significant limitations in third-party annotations, whether provided by human annotators or large language models (LLMs), in faithfully representing authors' private states. However, LLMs outperform human annotators nearly across the board. We further explore methods to improve third-party annotation quality. We find that demographic similarity between first-party authors and third-party human annotators enhances annotation performance, while incorporating first-party demographic information into prompts leads to a marginal but statistically significant improvement in LLMs' performance. We introduce a framework for evaluating the limitations of third-party annotations and call for refined annotation practices to accurately represent and model authors' private states.
