English Prompts are Better for NLI-based Zero-Shot Emotion Classification than Target-Language Prompts

Patrick Bareiß, Roman Klinger, Jeremy Barnes

TL;DR

In which language should one prompt for emotion labels on non-English texts?

Abstract

Emotion classification in text is a challenging task due to the processes involved when interpreting a textual description of a potential emotion stimulus. In addition, the set of emotion categories is highly domain-specific. For instance, literature analysis might require the use of aesthetic emotions (e.g., finding something beautiful), and social media analysis could benefit from more fine-grained sets (e.g., separating anger from annoyance) than only the basic categories proposed by Paul Ekman (anger, disgust, fear, joy, surprise, sadness). This renders the task an interesting field for zero-shot classification, in which the label set is not known at model development time. Unfortunately, most resources for emotion analysis are in English, and therefore most studies on emotion analysis have been performed in English, including those that involve prompting language models for text labels. This leaves us with a research gap that we address in this paper: In which language should we prompt for emotion labels on non-English texts? This is of particular interest when we have access to a multilingual large language model, because we could request labels with English prompts even for non-English data. Our experiments with natural language inference-based language models show that it is consistently better to use English prompts even if the data is in a different language.

Paper Structure

This paper contains 25 sections, 6 figures, 5 tables.

Figures (6)

  • Figure 1: We study the interaction of data and prompt language, while considering the underlying NLI-model and the role of the prompt type.
  • Figure 2: Overview of our experimental setting. We compare models from Huggingface and multiple prompt types for NLI-based emotion classification from Plaza-del-Arco et al. (2022). Across them, we study the relation between the data language and the prompt language for 18 languages. To obtain the prompt in various languages, we apply Google Translate. An example setup would be the German subset of the Universal Joy corpus with an XLM-RoBERTa NLI model and the prompt "This person feels X" translated to German (or left in English).
  • Figure 3: Interaction of prompt types and data languages. Each cell contains the average $F_1$ across NLI models. The prompt is always in English. The color corresponds to the rank and therefore indicates consistency of the results.
  • Figure 4: Universal Joy
  • Figure 5: de/enISEAR
  • ...and 1 more figure
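
The setup described in the Figure 2 caption (a prompt such as "This person feels X" scored by an NLI model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `build_hypotheses`, `classify`, and `toy_entail_score` are hypothetical, and the toy scorer is only a word-overlap stand-in for a real NLI model's entailment probability.

```python
from typing import Callable, Dict, Sequence, Tuple

# Prompt template from the Figure 2 caption, with "X" as the label slot.
# The trailing period is an assumption made for this sketch.
ENGLISH_TEMPLATE = "This person feels {}."


def build_hypotheses(labels: Sequence[str], template: str) -> Dict[str, str]:
    """Turn each emotion label into an NLI hypothesis via the prompt template."""
    return {label: template.format(label) for label in labels}


def classify(
    text: str,
    labels: Sequence[str],
    template: str,
    entail_score: Callable[[str, str], float],
) -> Tuple[str, Dict[str, float]]:
    """Pick the label whose hypothesis is most entailed by the text (premise).

    `entail_score(premise, hypothesis)` is assumed to return a number that is
    higher when the premise entails the hypothesis; in a real setting it would
    wrap a multilingual NLI model.
    """
    hypotheses = build_hypotheses(labels, template)
    scores = {label: entail_score(text, hyp) for label, hyp in hypotheses.items()}
    best = max(scores, key=scores.get)
    return best, scores


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: fraction of hypothesis words
    that also occur in the premise. Not a real entailment measure."""
    premise_words = set(premise.lower().replace(".", "").split())
    hyp_words = hypothesis.lower().replace(".", "").split()
    return sum(w in premise_words for w in hyp_words) / len(hyp_words)


if __name__ == "__main__":
    label, scores = classify(
        "I feel joy today",
        ["anger", "joy", "sadness"],
        ENGLISH_TEMPLATE,
        toy_entail_score,
    )
    print(label, scores)
```

Keeping the template and the scorer as parameters makes it easy to hold the data fixed and vary only the prompt language, which is the comparison the paper studies. In practice, the scorer could be replaced by an NLI model from Hugging Face, for example through the Transformers zero-shot classification pipeline and its `hypothesis_template` argument; which models and settings the authors used is not stated in this excerpt.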