Affect, Body, Cognition, Demographics, and Emotion: The ABCDE of Text Features for Computational Affective Science

Jan Philip Wahle, Krishnapriya Vishnubhotla, Bela Gipp, Saif M. Mohammad

TL;DR

This paper introduces ABCDE, a large-scale, open-access dataset of over 400 million text utterances annotated with 136 lexical features spanning Affect, Body, Cognition, Demographics, and Emotion. It aggregates data from social media, blogs, books, and AI-generated sources, and provides lexicon- and regex-derived features to enable cross-domain affective and cognitive research while facilitating reproducibility. The authors discuss the trade-offs between lexical and ML features, propose nine research questions to demonstrate interdisciplinary utility, and illustrate longitudinal and cross-domain analyses, including body–cognition dynamics and demographic patterns. By releasing both data and code, ABCDE aims to lower technical barriers for researchers outside computer science and to support scalable, interpretable analyses across the humanities, social sciences, and NLP.

Abstract

Work in Computational Affective Science and Computational Social Science explores a wide variety of research questions about people, emotions, behavior, and health. Such work often relies on language data that is first labeled with relevant information, such as the use of emotion words or the age of the speaker. Although many resources and algorithms exist to enable this type of labeling, discovering, accessing, and using them remains a substantial impediment, particularly for practitioners outside of computer science. Here, we present the ABCDE dataset (Affect, Body, Cognition, Demographics, and Emotion), a large-scale collection of over 400 million text utterances drawn from social media, blogs, books, and AI-generated sources. The dataset is annotated with a wide range of features relevant to computational affective and social science. ABCDE facilitates interdisciplinary research across numerous fields, including affective science, cognitive science, the digital humanities, sociology, political science, and computational linguistics.

Paper Structure

This paper contains 9 sections and 13 figures.

Figures (13)

  • Figure 1: Overview of the ABCDE dataset.
  • Figure 2: Q1. Emotion: Percent of words per instance from the six Ekman emotion categories.
  • Figure 3: Q1. Affect: Distributions of average Valence, Arousal, and Dominance (VAD) scores of all words per instance, by source.
  • Figure 4: Q2. Body: Percent of instances with a positive flag for possessive body part mentions (e.g., my head/heart/etc.) by source.
  • Figure 5: Q2. Body: Percent of possessive body part mentions (BPMs) for the top ten most common BPMs, across sources.
  • ...and 8 more figures
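
As a rough illustration of the two kinds of features named in the figure captions above (percent of words per instance from an emotion lexicon, and a regex flag for possessive body part mentions), the sketch below computes both for a single utterance. It is not the authors' released code: the lexicon entries, the body part list, the set of possessive pronouns, the tokenizer, and the function names are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy lexicon (word -> emotion category); NOT the lexicon used in the paper.
TOY_EMOTION_LEXICON = {
    "happy": "joy",
    "glad": "joy",
    "afraid": "fear",
    "furious": "anger",
    "sad": "sadness",
}

# Hypothetical body part list and possessive pronouns; the paper's actual
# pattern may differ.
BODY_PARTS = ("head", "heart", "hand", "hands", "eyes", "back")
POSSESSIVE_BPM = re.compile(
    r"\b(?:my|your|his|her|our|their)\s+(?:" + "|".join(BODY_PARTS) + r")\b",
    re.IGNORECASE,
)


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def emotion_percentages(text):
    """Return {category: percent of tokens in this instance found in the lexicon}."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = {}
    for token in tokens:
        category = TOY_EMOTION_LEXICON.get(token)
        if category is not None:
            counts[category] = counts.get(category, 0) + 1
    return {cat: 100.0 * n / len(tokens) for cat, n in counts.items()}


def has_possessive_bpm(text):
    """Flag an instance that contains a possessive body part mention."""
    return POSSESSIVE_BPM.search(text) is not None


if __name__ == "__main__":
    utterance = "My heart is racing and I am happy"
    print(emotion_percentages(utterance))
    print(has_possessive_bpm(utterance))
```

Because both features are plain dictionary lookups and pattern matches, every score can be traced back to the words that produced it, which is the interpretability argument usually made for lexical features over ML-derived ones.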