
The Language of Interoception: Examining Embodiment and Emotion Through a Corpus of Body Part Mentions

Sophie Wu, Jan Philip Wahle, Saif M. Mohammad

TL;DR

This work investigates whether everyday language encodes bodily experience by analyzing body part mentions (BPMs) in large online corpora. It introduces two BPM-focused corpora and an emotion-annotated subset, combining lexicon-based affect analysis with human annotations to map BPM usage to affect. The results show that BPMs are pervasive, that text containing them is more emotionally charged than non-BPM text, and that their prevalence correlates with regional health indicators, suggesting that BPMs could serve as scalable signals of wellbeing. By releasing data and outlining clear research questions, the study provides a foundation for future NLP research at the intersection of embodiment, emotion, and health.
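To make the notion of a body part mention concrete, here is a minimal Python sketch that labels a sentence as myBPM, yourBPM, 3pBPM, or noBPM (the category names used in the paper's figures) and then computes the percentage of posts with at least one "my <BPM>" per group, such as a month, weekday, or city. The tiny body-part list, the possessive-determiner rule, the precedence between categories, and the helper names `bpm_category` and `pct_with_my_bpm` are all hypothetical assumptions for illustration, not the authors' extraction pipeline.

```python
import re
from collections import defaultdict

# Hypothetical, tiny body-part list (assumption; the paper's actual BPM list is not given here).
BODY_PARTS = {"head", "heart", "stomach", "hand", "hands", "eyes", "chest", "back"}

# Assumed possessive determiners that decide the BPM category.
FIRST_PERSON = {"my"}
SECOND_PERSON = {"your"}
THIRD_PERSON = {"his", "her", "their"}

TOKEN_RE = re.compile(r"[a-z']+")


def bpm_category(sentence: str) -> str:
    """Return 'myBPM', 'yourBPM', '3pBPM', or 'noBPM' for one sentence.

    Looks for "<possessive> <body part>" bigrams. If several kinds occur, the
    assumed precedence is myBPM > yourBPM > 3pBPM.
    """
    tokens = TOKEN_RE.findall(sentence.lower())
    found = set()
    for det, noun in zip(tokens, tokens[1:]):
        if noun not in BODY_PARTS:
            continue
        if det in FIRST_PERSON:
            found.add("myBPM")
        elif det in SECOND_PERSON:
            found.add("yourBPM")
        elif det in THIRD_PERSON:
            found.add("3pBPM")
    for label in ("myBPM", "yourBPM", "3pBPM"):
        if label in found:
            return label
    return "noBPM"


def pct_with_my_bpm(posts):
    """posts: iterable of (group_key, text).

    Returns {group_key: % of posts with at least one "my <BPM>"}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for key, text in posts:
        totals[key] += 1
        if bpm_category(text) == "myBPM":
            hits[key] += 1
    return {k: 100.0 * hits[k] / totals[k] for k in totals}


posts = [
    ("2020-01", "My head hurts today"),
    ("2020-01", "Great game last night"),
    ("2020-02", "Your hands are cold"),
    ("2020-02", "my heart is racing"),
]
print(pct_with_my_bpm(posts))  # {'2020-01': 50.0, '2020-02': 50.0}
```

A real pipeline would need a far fuller body-part list and would have to decide how to treat figurative uses (e.g. "my heart goes out to you"), which this bigram rule counts like any other mention.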

Abstract

This paper is the first investigation of the connection between emotion, embodiment, and everyday language in a large sample of natural language data. We created corpora of body part mentions (BPMs) in online English text (blog posts and tweets). This includes a subset featuring human annotations for the emotions of the person whose body part is mentioned in the text. We show that BPMs are common in personal narratives and tweets (~5% to 10% of posts include BPMs) and that their usage patterns vary markedly by time and geographic location. Using word-emotion association lexicons and our annotated data, we show that text containing BPMs tends to be more emotionally charged, even when the BPM is not explicitly used to describe a physical reaction to the emotion in the text. Finally, we discover a strong and statistically significant correlation between body-related language and a variety of poorer health outcomes. In sum, we argue that investigating the role of body-part-related words in language can open up valuable avenues of future research at the intersection of NLP, the affective sciences, and the study of human wellbeing.

Paper Structure

This paper contains 25 sections, 17 figures, 9 tables.

Figures (17)

  • Figure 1: B4 - TUSC_ctry - % of tweets with at least one "my <BPM>" by month. Colored by season in the USA.
  • Figure 2: B4 - TUSC_ctry - % of tweets with at least one "my <BPM>" for different weekdays.
  • Figure 3: B5 - TUSC_city - % of tweets with at least one "my <BPM>" for different cities.
  • Figure 4: B5 - TUSC_ctry - % of tweets with at least one "my <BPM>" for Canada and the USA from 2015 to 2021.
  • Figure 5: BA1 - Percentage of sentences with at least one high- or low-valence, arousal, or dominance word (according to the NRC VAD lexicon) in each corpus, split into myBPM, yourBPM, 3pBPM, and noBPM categories.
  • ...and 12 more figures
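Figure 5 compares categories by the share of sentences containing at least one high- or low-scoring valence, arousal, or dominance word. A minimal sketch of that measurement for valence alone might look like the following, assuming sentences are already tokenized and labeled with a BPM category. The lexicon entries and the 0.67/0.33 cut-offs are made-up placeholders, not values from the NRC VAD lexicon or thresholds from the paper, and `pct_high_low` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up word -> valence scores on a 0-1 scale (placeholders, NOT NRC VAD entries).
TOY_VALENCE = {"love": 0.95, "happy": 0.90, "pain": 0.10, "hurts": 0.15, "cold": 0.35, "game": 0.60}

# Hypothetical cut-offs for calling a word "high" or "low" valence.
HIGH_CUTOFF, LOW_CUTOFF = 0.67, 0.33


def pct_high_low(labeled_sentences, lexicon=TOY_VALENCE):
    """labeled_sentences: iterable of (category, tokens).

    Returns {category: (% of sentences with >= 1 high-valence word,
                        % of sentences with >= 1 low-valence word)}.
    """
    n_total = defaultdict(int)
    n_high = defaultdict(int)
    n_low = defaultdict(int)
    for category, tokens in labeled_sentences:
        n_total[category] += 1
        # Only words present in the lexicon contribute a score.
        scores = [lexicon[tok] for tok in tokens if tok in lexicon]
        if any(s >= HIGH_CUTOFF for s in scores):
            n_high[category] += 1
        if any(s <= LOW_CUTOFF for s in scores):
            n_low[category] += 1
    return {
        c: (100.0 * n_high[c] / n_total[c], 100.0 * n_low[c] / n_total[c])
        for c in n_total
    }


data = [
    ("myBPM", ["my", "head", "hurts"]),
    ("myBPM", ["love", "my", "hands"]),
    ("noBPM", ["great", "game"]),
    ("noBPM", ["so", "happy"]),
]
print(pct_high_low(data))  # {'myBPM': (50.0, 50.0), 'noBPM': (50.0, 0.0)}
```

The same loop applies unchanged to arousal or dominance scores; a sentence can count toward both the high and the low percentage if it contains words at both ends of the scale.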