Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation

Xinran Li, Xiujuan Xu, Jiaqi Qiao

TL;DR

This work tackles Emotion Recognition in Conversation (ERC) by addressing context redundancy and data imbalance. It introduces LSDGNN, a DAG-based model that fuses long- and short-distance context through a Differential Regularizer and a BiAffine interaction mechanism, coupled with an Improved Curriculum Learning (ICL) strategy that uses Weighted Emotional Shifts to guide training from easy to hard samples. Empirical results on IEMOCAP and MELD show state-of-the-art performance and robust improvements, with ablations confirming the contribution of each component. The approach is modular and transferable, offering a practical framework for advancing multimodal ERC research and applications.

Abstract

Emotion Recognition in Conversation (ERC) is a practical and challenging task. This paper proposes a novel multimodal approach, the Long-Short Distance Graph Neural Network (LSDGNN). Built on a Directed Acyclic Graph (DAG), it constructs a long-distance graph neural network and a short-distance graph neural network to obtain multimodal features of distant and nearby utterances, respectively. To keep long- and short-distance features as distinct as possible in representation while still allowing the two modules to influence each other, we employ a Differential Regularizer and incorporate a BiAffine Module to facilitate feature interaction. In addition, we propose an Improved Curriculum Learning (ICL) strategy to address the challenge of data imbalance. By computing the similarity between different emotions to emphasize shifts between similar emotions, we design a "weighted emotional shift" metric and develop a difficulty measurer, enabling a training process that learns easy samples before harder ones. Experimental results on the IEMOCAP and MELD datasets demonstrate that our model outperforms existing baselines.
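
To make the DAG idea concrete, the following is a minimal sketch of how a conversation might be turned into a directed acyclic graph with a window hyperparameter. It assumes a DAG-ERC-style rule: each utterance receives edges from earlier utterances, walking backwards until `omega` utterances by the same speaker have been included. The function name `build_dag_edges`, the relation labels, and the rule itself are assumptions for illustration; the abstract does not spell out the exact construction, nor how the long- and short-distance graphs differ.

```python
from typing import List, Tuple

Edge = Tuple[int, int, str]  # (source index, target index, relation)


def build_dag_edges(speakers: List[str], omega: int = 2) -> List[Edge]:
    """Build edges from earlier utterances to later ones (assumed rule).

    For utterance i, walk backwards and connect every earlier utterance
    until `omega` utterances by the same speaker have been included.
    """
    edges: List[Edge] = []
    for i, speaker in enumerate(speakers):
        same_speaker_seen = 0
        for j in range(i - 1, -1, -1):
            same = speakers[j] == speaker
            edges.append((j, i, "same" if same else "different"))
            if same:
                same_speaker_seen += 1
                if same_speaker_seen == omega:
                    break  # window is closed for this utterance
    return edges


if __name__ == "__main__":
    # Three-party conversation; speaker of each utterance in order.
    turns = ["A", "B", "A", "C", "A"]
    for omega in (1, 2):
        print(omega, build_dag_edges(turns, omega=omega))
```

Because every edge points from an earlier index to a later one, the graph is acyclic by construction. A larger `omega` lets an utterance reach further back in the conversation, which is one way to picture the contrast between nearby and distant context.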

Paper Structure

This paper contains 25 sections, 20 equations, 3 figures, 6 tables, 1 algorithm.

Figures (3)

  • Figure 1: A directed acyclic graph (DAG) constructed from a three-party conversation with the hyperparameter $\omega=2$. The utterances are ordered from left to right according to the speaking sequence. Dashed lines represent dependencies between the same speaker, while solid lines represent dependencies between different speakers. The speakers are represented in blue, green, and red.
  • Figure 2: The architecture diagram of LSDGNN. The left channel processes long-distance features, and the right channel processes short-distance features. The original inputs $H_L^0$ and $H_S^0$ are identical. At each layer of LSDGNN, long-distance and short-distance features are processed using the Differential Regularizer and BiAffine Module. Here, $i$ denotes the $i$-th utterance, and $j$ denotes the features from the $j$-th layer.
  • Figure 3: Earlier psychological research suggested that emotions consist of two dimensions, Valence and Arousal, and described emotions using a wheel-like two-dimensional coordinate system. Inspired by Jing et al., we construct this diagram, which includes all the emotions from the standard ERC datasets. Each emotion label can be mapped to a point on the unit circle.
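
The unit-circle mapping in Figure 3 suggests one way a "weighted emotional shift" difficulty score could be computed. The sketch below is illustrative only: the angles in `EMOTION_ANGLE_DEG`, the cosine-based `similarity`, the `(1 + similarity) / 2` weighting, and the normalisation by conversation length are all assumptions, not the paper's formulas. It only follows the stated idea that shifts between similar emotions count as harder, and that training proceeds from easy to hard samples.

```python
import math
from typing import Dict, List

# Hypothetical angles (degrees) on a valence-arousal unit circle.
# These values are illustrative and are not taken from the paper.
EMOTION_ANGLE_DEG: Dict[str, float] = {
    "happy": 20.0,
    "excited": 60.0,
    "angry": 120.0,
    "frustrated": 150.0,
    "sad": 220.0,
}


def similarity(a: str, b: str) -> float:
    """Cosine of the angle between two emotions' points on the unit circle."""
    delta = math.radians(EMOTION_ANGLE_DEG[a] - EMOTION_ANGLE_DEG[b])
    return math.cos(delta)


def shift_weight(a: str, b: str) -> float:
    """Assumed weight: 0 if no shift, else in [0, 1], larger for similar emotions."""
    if a == b:
        return 0.0
    return (1.0 + similarity(a, b)) / 2.0


def difficulty(labels: List[str]) -> float:
    """Assumed difficulty: summed shift weights over consecutive utterances,
    normalised by the number of utterances."""
    if not labels:
        return 0.0
    total = sum(shift_weight(p, c) for p, c in zip(labels, labels[1:]))
    return total / len(labels)


def curriculum_order(conversations: List[List[str]]) -> List[List[str]]:
    """Sort conversations from easiest to hardest."""
    return sorted(conversations, key=difficulty)


if __name__ == "__main__":
    convs = [
        ["happy", "excited", "happy"],  # shifts between similar emotions
        ["happy", "happy", "happy"],    # no shifts
        ["happy", "sad", "happy"],      # shifts between dissimilar emotions
    ]
    for conv in curriculum_order(convs):
        print(round(difficulty(conv), 3), conv)
```

Under these assumptions, a conversation with no emotional shifts is scheduled first, and one that alternates between neighbouring emotions on the circle is scheduled last. A real difficulty measurer would likely also account for speaker identity when counting shifts; that is omitted here for brevity.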