The Language of Interoception: Examining Embodiment and Emotion Through a Corpus of Body Part Mentions
Sophie Wu, Jan Philip Wahle, Saif M. Mohammad
TL;DR
This work investigates whether everyday language encodes bodily experience by analyzing body part mentions (BPMs) in large online corpora. It introduces two BPM-focused corpora and an emotion-annotated subset, combining lexicon-based affect analysis with human annotations to map BPM usage to affect. The results show that BPMs are pervasive, that text containing BPMs is more emotionally charged than text without them, and that BPM prevalence correlates with regional health indicators, suggesting that BPMs could serve as scalable signals of wellbeing. By releasing data and outlining clear research questions, the study provides a foundation for future NLP research at the intersection of embodiment, emotion, and health.
Abstract
This paper is the first investigation of the connection between emotion, embodiment, and everyday language in a large sample of natural language data. We created corpora of body part mentions (BPMs) in online English text (blog posts and tweets). This includes a subset featuring human annotations for the emotions of the person whose body part is mentioned in the text. We show that BPMs are common in personal narratives and tweets (~5% to 10% of posts include BPMs) and that their usage patterns vary markedly by time and geographic location. Using word-emotion association lexicons and our annotated data, we show that text containing BPMs tends to be more emotionally charged, even when the BPM is not explicitly used to describe a physical reaction to the emotion in the text. Finally, we discover a strong and statistically significant correlation between body-related language and a variety of poorer health outcomes. In sum, we argue that investigating the role of body-part-related words in language can open up valuable avenues of future research at the intersection of NLP, the affective sciences, and the study of human wellbeing.
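To make the "share of posts that include BPMs" statistic concrete, the following is a minimal sketch of how such a rate could be computed. It is not the authors' pipeline: the word list `BODY_PARTS`, the possessive-pronoun pattern, and the function names `has_bpm` and `bpm_rate` are all assumptions introduced here for illustration.

```python
import re

# Hypothetical, deliberately tiny body-part list; NOT the lexicon used in the paper.
BODY_PARTS = ["head", "heart", "hand", "hands", "eye", "eyes", "stomach", "chest"]

# Assumption: a BPM is a possessive pronoun followed by a body-part word,
# e.g. "my heart" or "her hands". The paper's actual matching rules may differ.
BPM_PATTERN = re.compile(
    r"\b(my|your|his|her|their|our)\s+(" + "|".join(BODY_PARTS) + r")\b",
    re.IGNORECASE,
)


def has_bpm(post: str) -> bool:
    """Return True if the post contains at least one pattern match."""
    return BPM_PATTERN.search(post) is not None


def bpm_rate(posts: list[str]) -> float:
    """Fraction of posts containing at least one BPM (0.0 for an empty list)."""
    if not posts:
        return 0.0
    return sum(has_bpm(p) for p in posts) / len(posts)


if __name__ == "__main__":
    sample = [
        "My heart is racing",
        "Nice weather today",
        "She held her hands up",
        "heads up everyone",
    ]
    print(bpm_rate(sample))
```

Requiring a possessive pronoun keeps idioms such as "heads up" from counting as mentions, at the cost of missing body-part words that appear without one.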
