NRC VAD Lexicon v2: Norms for Valence, Arousal, and Dominance for over 55k English Terms
Saif M. Mohammad
TL;DR
The paper presents the NRC VAD Lexicon v2, a large, freely available resource providing Valence, Arousal, and Dominance ratings for over 55,000 English terms and phrases, including about 25,000 new words and about 10,000 multi-word expressions (MWEs). It describes a crowdsourced annotation pipeline with rigorous quality control (QC), in which responses on a $-3$ to $3$ scale are mapped to final VAD scores in $[-1,1]$, and it reports high reliability, with $\rho$ and $r$ exceeding $0.95$ across dimensions. The methodology combines diverse term sources (including prevalence-based unigrams and MWEs) with IRB-approved data collection. The lexicon's broad coverage and reliability support a wide range of research and applications in NLP, psychology, and digital humanities. The paper also acknowledges limitations related to language variety, domain and word sense, and socio-cultural biases. The lexicon is released to the research community to facilitate further work on affective word representations.
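The mapping from the $-3$ to $3$ response scale to $[-1,1]$ can be illustrated with a minimal sketch. It assumes that a term's score is the mean of its raw ratings, linearly rescaled; the summary above does not spell out the exact aggregation, so the function name `to_unit_interval` and the averaging step are illustrative assumptions, not the paper's stated procedure.

```python
from statistics import mean


def to_unit_interval(ratings, low=-3, high=3):
    """Average raw ratings and linearly rescale the mean to [-1, 1].

    Assumption: simple mean followed by a linear map. This is an
    illustration, not necessarily the aggregation used for the lexicon.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if not low <= r <= high:
            raise ValueError(f"rating {r} is outside [{low}, {high}]")
    midpoint = (high + low) / 2    # 0 for a symmetric scale
    half_range = (high - low) / 2  # 3 for the -3..3 scale
    return (mean(ratings) - midpoint) / half_range


# Hypothetical ratings for one term on one dimension (e.g. valence).
score = to_unit_interval([3, 2, 3, 1])
print(score)
```

Under this assumption, a term rated $3$ by every annotator receives a score of $1$, a term rated $-3$ by every annotator receives $-1$, and mixed ratings fall in between.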
Abstract
Factor analysis studies have shown that the primary dimensions of word meaning are Valence (V), Arousal (A), and Dominance (D) (also referred to in social cognition research as Competence (C)). These dimensions impact various aspects of our lives, from social competence and emotion regulation to success in the workplace and how we view the world. We present here the NRC VAD Lexicon v2, which has human ratings of valence, arousal, and dominance for more than 55,000 English words and phrases. Notably, it adds entries for $\sim$25k words beyond those in v1.0. For the first time, it also includes entries for common multi-word phrases ($\sim$10k). We show that the associations are highly reliable. The lexicon enables a wide variety of research in psychology, NLP, public health, digital humanities, and social sciences. The NRC VAD Lexicon v2 is made freely available for research through our project webpage.
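One common way to quantify the reliability of such ratings is split-half correlation: each term's annotations are randomly divided into two halves, each half is averaged, and the two resulting score lists are compared with Pearson's $r$ and Spearman's $\rho$. The sketch below assumes this procedure purely for illustration; the abstract does not state how reliability was computed, and the helper names and toy data are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def split_half(ratings_by_term, seed=0):
    """Return (r, rho) between the mean ratings of two random halves."""
    rng = random.Random(seed)
    first, second = [], []
    for term, ratings in ratings_by_term.items():
        if len(ratings) < 2:
            raise ValueError(f"term {term!r} needs at least two ratings")
        shuffled = list(ratings)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        first.append(mean(shuffled[:mid]))
        second.append(mean(shuffled[mid:]))
    return pearson(first, second), spearman(first, second)


# Hypothetical raw ratings on the -3..3 scale (not taken from the lexicon).
toy = {
    "joyful": [3, 3, 2, 3, 2, 3],
    "table": [0, 0, 1, 0, -1, 0],
    "dreadful": [-3, -2, -3, -3, -2, -3],
}
r, rho = split_half(toy)
print(f"r={r:.3f}, rho={rho:.3f}")
```

In practice the split is usually repeated over many random seeds and the correlations averaged, so that the estimate does not depend on one particular partition.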
