AffectGPT: Dataset and Framework for Explainable Multimodal Emotion Recognition
Zheng Lian, Haiyang Sun, Licai Sun, Jiangyan Yi, Bin Liu, Jianhua Tao
TL;DR
This work tackles data scarcity in Explainable Multimodal Emotion Recognition (EMER) by constructing EMER-Coarse, a large-scale, coarsely labeled dataset derived from MER2024-SEMI, and by introducing AffectGPT, a two-stage training framework. Stage 1 trains on EMER-Coarse to learn a coarse mapping from audio–video–text inputs to emotion-related descriptions; Stage 2 fine-tunes on the smaller, manually checked EMER-Fine to align outputs with high-quality labels. In ablations, the full two-stage pipeline consistently outperforms the baselines, showing the value of large-scale coarse supervision for multimodal emotion understanding; the paper also analyzes the choice of LLM and the initialization. The approach makes explainable EMER research more scalable and provides code and data to support future work on open-vocabulary emotion understanding and multimodal reasoning.
Abstract
Explainable Multimodal Emotion Recognition (EMER) is an emerging task that aims to achieve reliable and accurate emotion recognition. However, due to the high annotation cost, the existing dataset (denoted as EMER-Fine) is small, which makes supervised training difficult. To reduce the annotation cost and expand the dataset, we first review the previous dataset construction process. We then simplify the annotation pipeline, remove the manual checks, and replace the closed-source models with open-source ones. The result is EMER-Coarse, a large-scale, coarsely labeled dataset. Alongside the dataset, we propose AffectGPT, a two-stage training framework. The first stage exploits EMER-Coarse to learn a coarse mapping between multimodal inputs and emotion-related descriptions; the second stage uses EMER-Fine to better align with manually checked results. Experimental results demonstrate the effectiveness of the proposed method on the challenging EMER task. To facilitate further research, we will make the code and dataset available at: https://github.com/zeroQiaoba/AffectGPT.
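The coarse-then-fine schedule described above can be illustrated with a deliberately tiny sketch. This is not the AffectGPT implementation: the scalar model, the synthetic data, and the names `sgd_fit` and `two_stage_fit` are all hypothetical stand-ins, chosen only to show the second stage continuing from the first stage's weights on a smaller, cleaner set.

```python
import random


def sgd_fit(w, data, lr, epochs):
    """Fit y ~ w * x by plain SGD on squared error, starting from w."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x
            w -= lr * grad
    return w


def two_stage_fit(coarse, fine, w0=0.0):
    # Stage 1: many noisy ("coarse") labels give a rough mapping.
    w_stage1 = sgd_fit(w0, coarse, lr=0.05, epochs=5)
    # Stage 2: continue from the stage-1 weights on a few clean ("fine")
    # labels, with a smaller learning rate (an assumption of this sketch).
    w_stage2 = sgd_fit(w_stage1, fine, lr=0.01, epochs=20)
    return w_stage1, w_stage2


# Toy stand-ins for the two datasets (synthetic, for illustration only).
rng = random.Random(0)
TRUE_W = 2.0
coarse = [(x, TRUE_W * x + rng.gauss(0.0, 0.5))
          for x in (rng.uniform(-1.0, 1.0) for _ in range(500))]
fine = [(x, TRUE_W * x)
        for x in (rng.uniform(-1.0, 1.0) for _ in range(20))]

w1, w2 = two_stage_fit(coarse, fine)
print(f"after stage 1: {w1:.3f}, after stage 2: {w2:.3f}")
```

In this toy setting the noisy first stage lands near the target and the clean second stage can only move the weight closer to it; whether the same holds for a multimodal LLM is an empirical question that the paper's ablations address.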
