
Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

Lukas Christ, Shahin Amiriparian, Alexander Kathan, Niklas Müller, Andreas König, Björn W. Schuller

TL;DR

This work introduces the Passau-Spontaneous Football Coach Humor (Passau-SFCH) dataset and proposes a novel multimodal architecture that yields the best overall results for the automatic analysis of humor and its sentiment.

Abstract

Humor is a substantial element of human social behavior, affect, and cognition. Its automatic understanding can facilitate a more naturalistic human-AI interaction. Current methods of humor detection have been exclusively based on staged data, making them inadequate for "real-world" applications. We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor (Passau-SFCH) dataset, comprising about 11 hours of recordings. The Passau-SFCH dataset is annotated for the presence of humor and its dimensions (sentiment and direction) as proposed in Martin's Humor Style Questionnaire. We conduct a series of experiments employing pretrained Transformers, convolutional neural networks, and expert-designed features. The performance of each modality (text, audio, video) for spontaneous humor recognition is analyzed and their complementarity is investigated. Our findings suggest that for the automatic analysis of humor and its sentiment, facial expressions are most promising, while humor direction can be best modeled via text-based features. Further, we experiment with different multimodal approaches to humor recognition, including decision-level fusion and MulT, a multimodal Transformer approach. In this context, we propose a novel multimodal architecture that yields the best overall results. Finally, we make our code publicly available at https://www.github.com/lc0197/passau-sfch. The Passau-SFCH dataset is available upon request.
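Decision-level (late) fusion combines the outputs of separately trained unimodal models instead of their features. The following is a minimal sketch, assuming each unimodal model (text, audio, video) outputs a probability that a segment is humorous and that fusion is an optionally weighted average of those probabilities followed by a 0.5 threshold. The function name `late_fusion`, the weighting scheme and the threshold are hypothetical illustration choices, not the authors' implementation.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def late_fusion(
    prob_by_modality: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
    threshold: float = 0.5,
) -> Tuple[List[float], List[int]]:
    """Fuse per-modality humor probabilities at the decision level.

    Hypothetical sketch: weighted mean of unimodal probabilities, then a
    fixed threshold. Equal weights are used when `weights` is None.
    """
    modalities = sorted(prob_by_modality)
    n = len(prob_by_modality[modalities[0]])
    # Every unimodal model must have scored the same segments.
    if any(len(prob_by_modality[m]) != n for m in modalities):
        raise ValueError("all modalities must score the same segments")
    w = {m: 1.0 if weights is None else float(weights[m]) for m in modalities}
    total = sum(w.values())
    # Weighted average over modalities, computed per segment.
    fused = [
        sum(w[m] * prob_by_modality[m][i] for m in modalities) / total
        for i in range(n)
    ]
    # Binary humor decision per segment.
    labels = [int(p >= threshold) for p in fused]
    return fused, labels
```

For example, `late_fusion({"text": [0.2, 0.7, 0.4], "audio": [0.3, 0.6, 0.5], "video": [0.1, 0.9, 0.8]})` averages the three scores per segment and labels the second and third segments as humorous. Averaging probabilities keeps each unimodal model independent, so a weak modality can be down-weighted without retraining the others.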

Paper Structure

This paper contains 31 sections, 12 equations, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: Illustration of the humor styles proposed by Martin et al. (2003). The x-axis represents the direction dimension (self- vs. other-directed) and the y-axis the sentiment dimension (negative vs. positive sentiment). Each resulting quadrant corresponds to one humor style and is illustrated with items from the HSQ that are associated with that style (cf. Martin et al. 2003, p. 58f.).
  • Figure 2: Mean humor agreement per coach and rater. Each cell represents the mean $\alpha$ value on the binary humor labels between an annotator and all other annotators for the respective coach.
  • Figure 3: Percentage of humorous segments per coach in the gold standard, partitioned by humor style.
  • Figure 4: Overview of the setup of the unimodal and corresponding late fusion experiments. For all segments, three feature sets are extracted for each of the three modalities and fed into unimodal systems for Sentiment Prediction, Direction Prediction, and Humor Recognition, respectively. In addition, late fusion (LF) experiments are conducted. *The sentence roughly translates to "After the game, I know, one knows better".
  • Figure 5: The proposed VFMM architecture.
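Figure 1 places each humorous segment in one of four quadrants spanned by sentiment and direction, and Figure 3 breaks each coach's humorous segments down by those quadrants. The sketch below maps the two dimension labels to Martin's four humor styles and computes one plausible version of the Figure 3 breakdown for a single coach. The string label values (`"positive"`/`"negative"`, `"self"`/`"other"`), the `(is_humorous, sentiment, direction)` record layout, and the function names are hypothetical assumptions, not the dataset's actual format.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, Tuple


class HumorStyle(Enum):
    AFFILIATIVE = "affiliative"        # positive sentiment, other-directed
    SELF_ENHANCING = "self-enhancing"  # positive sentiment, self-directed
    AGGRESSIVE = "aggressive"          # negative sentiment, other-directed
    SELF_DEFEATING = "self-defeating"  # negative sentiment, self-directed


# Quadrants of Figure 1: (sentiment, direction) -> humor style.
# The label strings are hypothetical; the dataset may encode them differently.
_QUADRANTS = {
    ("positive", "other"): HumorStyle.AFFILIATIVE,
    ("positive", "self"): HumorStyle.SELF_ENHANCING,
    ("negative", "other"): HumorStyle.AGGRESSIVE,
    ("negative", "self"): HumorStyle.SELF_DEFEATING,
}


def humor_style(sentiment: str, direction: str) -> HumorStyle:
    """Map a (sentiment, direction) label pair to one of the four styles."""
    key = (sentiment.lower(), direction.lower())
    if key not in _QUADRANTS:
        raise ValueError(f"unknown sentiment/direction pair: {key}")
    return _QUADRANTS[key]


def style_shares(
    segments: Iterable[Tuple[bool, str, str]],
) -> Dict[HumorStyle, float]:
    """Percentage of all of a coach's segments that are humorous, per style.

    Each segment is a hypothetical (is_humorous, sentiment, direction) tuple;
    sentiment and direction are ignored for non-humorous segments.
    """
    segments = list(segments)
    if not segments:
        return {}
    counts = Counter(humor_style(s, d) for h, s, d in segments if h)
    # Counter returns 0 for styles that never occur.
    return {style: 100.0 * counts[style] / len(segments) for style in HumorStyle}
```

The per-style percentages sum to the coach's overall percentage of humorous segments, so they can be drawn as stacked bars partitioned by humor style.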