SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
Lee Hyun, Kim Sung-Bin, Seungju Han, Youngjae Yu, Tae-Hyun Oh
TL;DR
This work introduces Video Laugh Reasoning, a task aimed at explaining why laughter occurs in a video. It presents SMILE, a new multimodal dataset of 887 clips (from TED talks and sitcoms) paired with human-written explanations for laughter, focusing on audience laughter to reduce subjectivity. A baseline that feeds large language models a textual representation of the multimodal input (visual, acoustic, and semantic cues) demonstrates that LLMs can generate plausible, though not yet human-level, reasons for laughter and can scale to other video understanding tasks and in-the-wild content. The study shows the importance of multimodal information and model scale, evaluates with both standard text-generation metrics and human judgments, and offers insights into how each modality contributes across video types and tasks. Overall, SMILE and the proposed approach advance socially intelligent AI capable of interpreting nonverbal signals, with implications for dialogue systems, affective computing, and human-robot interaction.
Abstract
Despite recent advances in artificial intelligence, building social intelligence remains a challenge. Among social signals, laughter is one of the distinctive expressions that occur during social interactions between humans. In this work, we tackle a new challenge for machines: understanding the rationale behind laughter in video, which we call Video Laugh Reasoning. We introduce this task, which is to explain why people laugh in a particular video, together with a dataset for it. Our proposed dataset, SMILE, comprises video clips and language descriptions of why people laugh. We propose a baseline that leverages the reasoning capacity of large language models (LLMs) with a textual video representation. Experiments show that our baseline can generate plausible explanations for laughter. We further investigate the scalability of our baseline by probing other video understanding tasks and in-the-wild videos. We release our dataset, code, and model checkpoints at https://github.com/postech-ami/SMILE-Dataset.
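To make the idea of a textual video representation concrete, the following is a minimal sketch of how per-segment visual, acoustic, and transcript cues might be flattened into a single text prompt for an LLM. The `Segment` schema, the `build_prompt` function, and the prompt wording are all hypothetical and chosen for illustration; they are not taken from the released code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One utterance-level slice of a clip (hypothetical schema)."""
    transcript: str  # semantic cue: what was said
    visual: str      # visual cue, e.g. a caption describing the scene
    acoustic: str    # acoustic cue, e.g. a description of tone or pitch


def build_prompt(segments: List[Segment]) -> str:
    """Flatten multimodal cues into plain text that an LLM can read.

    The layout and the final instruction are illustrative assumptions.
    """
    lines = []
    for i, seg in enumerate(segments, start=1):
        lines.append(f"[{i}] Visual: {seg.visual}")
        lines.append(f"[{i}] Acoustic: {seg.acoustic}")
        lines.append(f"[{i}] Transcript: {seg.transcript}")
    lines.append("The audience laughs after the last segment.")
    lines.append("Explain why the audience laughed:")
    return "\n".join(lines)


if __name__ == "__main__":
    clip = [
        Segment("I asked my boss for a raise.", "a man on a stage", "calm voice"),
        Segment("He gave me a ladder.", "the man shrugs", "flat, deadpan tone"),
    ]
    print(build_prompt(clip))
```

The resulting string would then be passed to whichever LLM is being used. Keeping every modality as text means the same prompt format can be reused across models without any architecture changes.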
