Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation
Xinran Li, Xiujuan Xu, Jiaqi Qiao
TL;DR
This work tackles Emotion Recognition in Conversation (ERC) by addressing context redundancy and data imbalance. It introduces LSDGNN, a DAG-based model that fuses long- and short-distance context through a Differential Regularizer and a BiAffine interaction mechanism, coupled with an Improved Curriculum Learning (ICL) strategy that uses Weighted Emotional Shifts to guide training from easy to hard samples. Empirical results on IEMOCAP and MELD show state-of-the-art performance and robust improvements, with ablations confirming the contribution of each component. The approach is modular and transferable, offering a practical framework for advancing multimodal ERC research and applications.
Abstract
Emotion Recognition in Conversation (ERC) is a practical and challenging task. This paper proposes a novel multimodal approach, the Long-Short Distance Graph Neural Network (LSDGNN). Based on the Directed Acyclic Graph (DAG), it constructs a long-distance graph neural network and a short-distance graph neural network to obtain multimodal features of distant and nearby utterances, respectively. To keep the long- and short-distance features as distinct as possible in representation while still allowing the two modules to influence each other, we employ a Differential Regularizer and incorporate a BiAffine Module to facilitate feature interaction. In addition, we propose an Improved Curriculum Learning (ICL) strategy to address the challenge of data imbalance. By computing the similarity between different emotions and emphasizing shifts between similar emotions, we design a "weighted emotional shift" metric and develop a difficulty measurer, enabling a training process that learns easy samples before harder ones. Experimental results on the IEMOCAP and MELD datasets demonstrate that our model outperforms existing baselines.
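To make the idea of a difficulty measurer built on a "weighted emotional shift" more concrete, the following is a minimal sketch and not the authors' implementation. The abstract gives no formula, so everything here is an assumption made for illustration: the similarity values, the weighting rule (a shift between two different emotions counts as 1 plus their similarity, so that shifts between similar emotions weigh more), the normalization by the number of consecutive utterance pairs, and the function names shift_weight, difficulty, and curriculum_order.

```python
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical pairwise emotion similarities in [0, 1].
# Illustrative values only; they are not taken from the paper.
SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset({"happy", "excited"}): 0.8,
    frozenset({"sad", "frustrated"}): 0.6,
    frozenset({"happy", "sad"}): 0.1,
}
DEFAULT_SIMILARITY = 0.0  # assumed value for pairs missing from the table


def shift_weight(prev: str, curr: str) -> float:
    """Weight of one emotional shift (assumed rule).

    No shift costs 0. A shift costs 1 plus the similarity of the two
    emotions, so shifts between similar emotions count as harder.
    """
    if prev == curr:
        return 0.0
    return 1.0 + SIMILARITY.get(frozenset({prev, curr}), DEFAULT_SIMILARITY)


def difficulty(labels: Sequence[str]) -> float:
    """Average weighted shift over consecutive utterance pairs."""
    if len(labels) < 2:
        return 0.0
    total = sum(shift_weight(a, b) for a, b in zip(labels, labels[1:]))
    return total / (len(labels) - 1)


def curriculum_order(conversations: List[Sequence[str]]) -> List[int]:
    """Indices of conversations sorted from easiest to hardest."""
    return sorted(range(len(conversations)),
                  key=lambda i: difficulty(conversations[i]))


if __name__ == "__main__":
    conversations = [
        ["happy", "excited", "happy"],  # two shifts between similar emotions
        ["sad", "sad", "sad"],          # no shifts
        ["happy", "sad", "sad"],        # one shift between dissimilar emotions
    ]
    for i in curriculum_order(conversations):
        print(i, round(difficulty(conversations[i]), 3))
```

Under these assumptions, a conversation with no emotional shifts is scheduled first and one that alternates between similar emotions is scheduled last. A curriculum scheduler could then feed conversations to the model in this order, for example in progressively larger buckets.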
