
Breaking Bad: Norms for Valence, Arousal, and Dominance for over 10k English Multiword Expressions

Saif M. Mohammad

TL;DR

This work expands affective lexicons to multiword expressions by collecting crowd-sourced valence, arousal, and dominance ratings for about 10k MWEs and 25k unigrams, producing the MWE-VAD Lexicon that complements the NRC VAD Lexicon v1 and yields NRC VAD Lexicon v2. The authors demonstrate very high reliability of the annotations and analyze how different MWE types convey emotion and how compositionality affects VAD in MWEs. They show that MWEs are a substantial source of emotional meaning and provide insights into when an MWE’s affective meaning is predictable from its components. The resource supports research across NLP, psychology, public health, and digital humanities and is released for research use with caveats about biases and ethical considerations.
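
The reliability claim can be made concrete with split-half reliability, a statistic often reported for crowd-sourced lexicons of this kind. This section does not describe the exact procedure, so the sketch below is an assumption rather than the authors' method. It randomly splits each term's ratings into two halves, correlates the per-term half means across terms (Pearson), applies the Spearman-Brown correction, and averages over trials. The function names and the toy ratings (placeholder terms, made-up numbers) are hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def split_half_reliability(ratings, trials=100, seed=0):
    """Hypothetical helper: mean Spearman-Brown-corrected split-half correlation.

    ratings maps each term to its list of individual annotator ratings;
    every term needs at least two ratings.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        half_a, half_b = [], []
        for term_ratings in ratings.values():
            shuffled = term_ratings[:]
            rng.shuffle(shuffled)
            mid = len(shuffled) // 2
            half_a.append(statistics.fmean(shuffled[:mid]))
            half_b.append(statistics.fmean(shuffled[mid:]))
        rho = pearson(half_a, half_b)
        # Spearman-Brown: estimate reliability at the full number of ratings.
        scores.append(2 * rho / (1 + rho))
    return statistics.fmean(scores)


if __name__ == "__main__":
    # Toy, made-up ratings for placeholder terms (not data from the lexicon).
    toy = {
        "term_a": [2, 3, 2, 3],
        "term_b": [-2, -3, -2, -2],
        "term_c": [-3, -3, -2, -3],
        "term_d": [3, 2, 3, 3],
        "term_e": [-2, -1, -2, -2],
    }
    print(split_half_reliability(toy, trials=50, seed=1))
```

Values close to 1 indicate that two independent halves of the annotators would produce nearly the same ranking of terms.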

Abstract

Factor analysis studies have shown that the primary dimensions of word meaning are Valence (V), Arousal (A), and Dominance (D). Existing lexicons such as the NRC VAD Lexicon, published in 2018, include VAD association ratings for words. Here, we present a complement to it: human ratings of valence, arousal, and dominance for 10k English Multiword Expressions (MWEs) and their constituent words. We also increase the coverage of unigrams, especially words that have become more common since 2018. In all, the NRC VAD Lexicon v2 adds entries for 10k MWEs and 25k words to those already in v1. We show that the associations are highly reliable. We use the lexicon to examine emotional characteristics of MWEs, including: 1. the degree to which MWEs (idioms, noun compounds, and verb particle constructions) exhibit strong emotionality; 2. the degree of emotional compositionality in MWEs. The lexicon enables a wide variety of research in NLP, Psychology, Public Health, Digital Humanities, and Social Sciences. The NRC VAD Lexicon v2 is freely available through the project webpage: http://saifmohammad.com/WebPages/nrc-vad.html
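
One simple way to think about emotional compositionality is to ask how far an MWE's own score lies from the average score of its constituent words. The paper's actual measures (see Figure 2) are not given in this section, so the sketch below is only an illustrative proxy under that assumption: a small gap suggests the MWE's valence is predictable from its parts, and a large gap suggests non-compositional affective meaning. The helper names, the whitespace tokenization, and the toy lexicon (placeholder terms, made-up scores) are all hypothetical.

```python
import statistics


def constituent_mean(mwe, lexicon):
    """Mean score of the MWE's constituent words found in the lexicon.

    Returns None when none of the constituents are covered.
    """
    scores = [lexicon[w] for w in mwe.split() if w in lexicon]
    return statistics.fmean(scores) if scores else None


def compositionality_gap(mwe, lexicon):
    """Hypothetical proxy: |score(MWE) - mean score(constituents)|.

    Returns None if the MWE or all of its constituents are missing.
    """
    parts = constituent_mean(mwe, lexicon)
    if mwe not in lexicon or parts is None:
        return None
    return abs(lexicon[mwe] - parts)


if __name__ == "__main__":
    # Toy, made-up valence scores for placeholder terms (not lexicon data).
    toy = {
        "word_a": 0.5,
        "word_b": 0.25,
        "word_c": -0.5,
        "word_a word_b": 0.375,  # matches its parts: compositional
        "word_c word_b": 0.75,   # far from its parts: non-compositional
    }
    for mwe in ("word_a word_b", "word_c word_b"):
        print(mwe, compositionality_gap(mwe, toy))
```

A fuller analysis would likely weight constituents differently (for example, favoring the head of a noun compound), which this unweighted mean deliberately ignores.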

Paper Structure

This paper contains 15 sections, 15 figures, 2 tables.

Figures (15)

  • Figure 1: Distributions of MWE types.
  • Figure 2: Measures of Valence Compositionality.
  • Figure 3: Valence Questionnaire: Detailed instructions.
  • Figure 4: Valence Questionnaire: Sample question.
  • Figure 5: Valence Questionnaire: Examples.
  • ...and 10 more figures