Why Do We Laugh? Annotation and Taxonomy Generation for Laughable Contexts in Spontaneous Text Conversation
Koji Inoue, Mikey Elmers, Divesh Lala, Tatsuya Kawahara
TL;DR
The paper addresses the challenge of identifying laughable contexts in spontaneous dialogue by combining human binary annotations with LLM-driven taxonomy generation. Using the Japanese RealPersonaChat dataset, five annotators labeled utterances as laughable or not. Their judgments were highly subjective, but majority voting still yielded a subset of 3,739 laughable contexts. An iterative GPT-4o-based process then produced a ten-category taxonomy of laugh-inducing reasons, which was used to annotate samples with reasons and to analyze relationships among the labels. Finally, GPT-4o was evaluated on zero-shot recognition of laughable contexts and achieved an F1 score of 43.14%, with performance varying by label, which indicates both the potential and the current limits of LLMs in nuanced humor understanding. The work lays a foundation for more natural, context-sensitive conversational AI and points to future multilingual, multimodal, and spoken-dialogue extensions for modeling human laughter.
Abstract
Laughter serves as a multifaceted communicative signal in human interaction, yet its identification within dialogue presents a significant challenge for conversational AI systems. This study addresses this challenge by annotating laughable contexts in Japanese spontaneous text conversation data and developing a taxonomy to classify the underlying reasons for such contexts. Initially, multiple annotators manually labeled laughable contexts using a binary decision (laughable or non-laughable). Subsequently, an LLM was used to generate explanations for the binary annotations of laughable contexts, which were then categorized into a taxonomy comprising ten categories, including "Empathy and Affinity" and "Humor and Surprise," highlighting the diverse range of laughter-inducing scenarios. The study also evaluated GPT-4o's performance in recognizing the majority labels of laughable contexts, achieving an F1 score of 43.14%. These findings contribute to the advancement of conversational AI by establishing a foundation for more nuanced recognition and generation of laughter, ultimately fostering more natural and engaging human-AI interactions.
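
The majority labeling and F1 evaluation described above can be illustrated with a minimal sketch. It assumes that a context counts as laughable when more than half of the five annotators mark it so, and that F1 is computed for the laughable class; the exact voting threshold is an assumption, and the function names and toy votes are hypothetical, not data from the study.

```python
from typing import List


def majority_label(votes: List[int]) -> int:
    """Return 1 if more than half of the binary votes are 1, else 0.

    Assumption: a strict majority (e.g. 3 of 5) defines the gold label.
    """
    return int(sum(votes) * 2 > len(votes))


def f1_score(gold: List[int], pred: List[int]) -> float:
    """F1 for the positive (laughable) class from binary label lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy votes from five hypothetical annotators (1 = laughable, 0 = not).
annotations = [
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
]
gold = [majority_label(v) for v in annotations]

# Hypothetical model predictions for the same four contexts.
predictions = [1, 1, 0, 0]
score = f1_score(gold, predictions)
print(gold, round(score, 4))
```

In this toy case the model finds one of the two majority-laughable contexts and raises one false alarm, so precision and recall are both 0.5. The reported 43.14% would come from the same kind of computation over the full annotated set.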
