MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition

Zheng Lian; Licai Sun; Yong Ren; Hao Gu; Haiyang Sun; Lan Chen; Bin Liu; Jianhua Tao

TL;DR

MERBench tackles the lack of fair comparison in multimodal emotion recognition by introducing a unified benchmark and the Chinese MER2023 dataset. It systematically evaluates unimodal and multimodal features, fusion strategies, cross-corpus transfer, and robustness to punctuation and noise, under a standardized pipeline. Key contributions include a rigorous baseline suite, extensive cross-dataset analyses, and recommendations favoring pre-training and language-aware encoders, plus a public codebase. The work provides a practical framework for reproducible research and highlights directions for robust, scalable emotion recognition in real-world, multilingual settings.

Abstract

Multimodal emotion recognition plays a crucial role in enhancing user experience in human-computer interaction. Over the past few decades, researchers have proposed a series of algorithms and achieved impressive progress. Although each method reports superior performance, the methods cannot be compared fairly because of inconsistencies in feature extractors, evaluation protocols, and experimental settings. These inconsistencies severely hinder the development of this field. Therefore, we build MERBench, a unified evaluation benchmark for multimodal emotion recognition. We aim to reveal the contribution of important techniques employed in previous works, such as feature selection, multimodal fusion, robustness analysis, fine-tuning, and pre-training. We hope this benchmark can provide clear and comprehensive guidance for follow-up researchers. Based on the evaluation results of MERBench, we further point out some promising research directions. Additionally, we introduce a new emotion dataset, MER2023, focusing on the Chinese language environment. This dataset can serve as a benchmark for research on multi-label learning, noise robustness, and semi-supervised learning. We encourage follow-up researchers to evaluate their algorithms under the same experimental setup as MERBench for fair comparisons. Our code is available at: https://github.com/zeroQiaoba/MERTools.

Paper Structure (24 sections, 8 equations, 9 figures, 20 tables, 1 algorithm)

This paper contains 24 sections, 8 equations, 9 figures, 20 tables, 1 algorithm.

Introduction
Related Works
Emotional Corpus
Unimodal Features
Multimodal Fusion
MER2023 Dataset
Data Collection
Data Annotation
Data Splitting
MER2023 Baselines
Problem Definition and Notations
Data Preprocessing
Model Structure
Implementation Details
Results and Discussion
...and 9 more sections

Figures (9)

Figure 1: Pipeline of data annotation.
Figure 2: Empirical PDFs and estimated Gaussian models on sample lengths for different subsets.
Figure 3: Distribution of discrete emotions for different subsets (neutral, anger, happiness, sadness, worry, surprise).
Figure 4: Empirical PDF on the valence for different discrete emotions. We calculate statistics using all valence-labeled samples.
Figure 5: Impact of language matching for acoustic encoders. This figure reveals the relationship between the primary training language of the acoustic encoder and the input language.
...and 4 more figures
