The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition

Shahin Amiriparian, Lukas Christ, Alexander Kathan, Maurice Gerczuk, Niklas Müller, Steffen Klug, Lukas Stappen, Andreas König, Erik Cambria, Björn Schuller, Simone Eulitz

TL;DR

MuSe 2024 introduces two multimodal affective computing sub-challenges: MuSe-Perception for predicting 16 social attributes from CEO interview videos and MuSe-Humor for cross-cultural humor detection using a German training set and English test data. The paper details datasets, feature extraction across audio, video, and text modalities, and a GRU-based baseline with simple late fusion, reporting unseen-test performance of $\rho = 0.3573$ for MuSe-Perception and $\mathrm{AUC} = 0.8682$ for MuSe-Humor. It emphasizes public availability of code and data to promote reproducibility and broad participation, and analyzes unimodal versus multimodal configurations, highlighting the effectiveness of transformer-based and self-supervised representations. Overall, the work demonstrates the promise of multimodal and cross-lingual cues for social perception and humor, providing a solid baseline for future improvements and cross-domain research.

Abstract

The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems. In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of individuals, such as assertiveness, dominance, likability, and sincerity, based on the provided audio-visual data. The Cross-Cultural Humor Detection Sub-Challenge (MuSe-Humor) dataset expands upon the Passau Spontaneous Football Coach Humor (Passau-SFCH) dataset, focusing on the detection of spontaneous humor in a cross-lingual and cross-cultural setting. The main objective of MuSe 2024 is to unite a broad audience from various research domains, including multimodal sentiment analysis, audio-visual affective computing, continuous signal processing, and natural language processing. By fostering collaboration and exchange among experts in these fields, MuSe 2024 endeavors to advance the understanding and application of sentiment analysis and affective computing across multiple modalities. This baseline paper describes each sub-challenge and its corresponding dataset, the features extracted from each data modality, and the challenge baselines. For our baseline system, we make use of a range of Transformers and expert-designed features and train Gated Recurrent Unit (GRU)-Recurrent Neural Network (RNN) models on them, resulting in a competitive baseline system. On the unseen test datasets of the respective sub-challenges, it achieves a mean Pearson's Correlation Coefficient ($\rho$) of 0.3573 for MuSe-Perception and an Area Under the Curve (AUC) value of 0.8682 for MuSe-Humor.
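
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. It assumes that the mean Pearson's correlation is averaged over the predicted attributes and that AUC is the area under the ROC curve for binary humor labels. The function names, attribute keys, and toy values are hypothetical and are not taken from the challenge code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_pearson(preds, golds):
    """Average Pearson's rho over attributes (assumed aggregation).

    Both arguments map an attribute name to a list of per-sample values.
    """
    return sum(pearson(preds[a], golds[a]) for a in golds) / len(golds)


def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data only: two of the 16 attributes, three samples each.
toy_preds = {"assertiveness": [0.1, 0.4, 0.8], "dominance": [0.8, 0.5, 0.2]}
toy_golds = {"assertiveness": [0.2, 0.5, 0.9], "dominance": [0.2, 0.5, 0.8]}
toy_mean_rho = mean_pearson(toy_preds, toy_golds)

# Toy humor scores: label 1 marks a humorous segment, 0 a non-humorous one.
toy_auc = auc([0.9, 0.3, 0.4, 0.6], [1, 1, 0, 0])
```

In the toy data, one attribute is perfectly correlated and the other perfectly anti-correlated, so the mean is close to zero. A single strongly negative attribute can therefore offset good predictions on the others.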

Paper Structure (26 sections, 4 tables)