ERNetCL: A novel emotion recognition network in textual conversation based on curriculum learning strategy
Jiang Li, Xiaoping Wang, Yingjian Liu, Zhigang Zeng
TL;DR
ERNetCL addresses emotion recognition in conversation (ERC) by jointly modeling temporal and spatial context through a GRU-based temporal encoder (TE) and a multi-head-attention spatial encoder (SE), while mitigating the effect of emotion shift through a curriculum learning (CL) loss. Sample difficulty is quantified by the frequency of emotion shifts within a conversation, and this score drives an epoch-dependent weighting scheme that progressively exposes the model to harder cases. Experiments on MELD, IEMOCAP, EmoryNLP, and DailyDialog show that ERNetCL achieves superior or competitive performance, and ablations confirm the contribution of TE, SE, and CL. The approach offers a lightweight yet effective alternative to complex ERC architectures and points to multimodal and contrastive learning as promising directions for conversation understanding.
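The difficulty measure can be illustrated with a small sketch. The exact formula used by ERNetCL is not given here, so the code below assumes a hypothetical definition: an emotion shift occurs when a speaker's emotion label differs from that same speaker's previous utterance, and a conversation's difficulty is the fraction of such comparisons that are shifts. The function name `emotion_shift_difficulty` is illustrative only.

```python
from typing import Sequence


def emotion_shift_difficulty(speakers: Sequence[str], emotions: Sequence[str]) -> float:
    """Hypothetical difficulty score in [0, 1] for one conversation.

    Assumption (not the paper's confirmed formula): a shift is counted when an
    utterance's emotion differs from the same speaker's previous utterance.
    The score is shifts / comparisons, or 0.0 if no speaker talks twice.
    """
    if len(speakers) != len(emotions):
        raise ValueError("speakers and emotions must have equal length")
    last_emotion: dict[str, str] = {}  # most recent emotion seen per speaker
    shifts = 0
    comparisons = 0
    for speaker, emotion in zip(speakers, emotions):
        if speaker in last_emotion:
            comparisons += 1
            if last_emotion[speaker] != emotion:
                shifts += 1
        last_emotion[speaker] = emotion
    return shifts / comparisons if comparisons else 0.0


if __name__ == "__main__":
    # Speaker A shifts once (neutral -> joy); speaker B never shifts.
    score = emotion_shift_difficulty(
        ["A", "B", "A", "B", "A"],
        ["neutral", "neutral", "joy", "neutral", "joy"],
    )
    print(score)  # one shift out of three comparisons
```

Conversations could then be ranked by this score so that low-shift dialogues are treated as "easy" early in training.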
Abstract
Emotion recognition in conversation (ERC) has emerged as a research hotspot in domains such as conversational robots and question-answering systems. Efficiently and adequately retrieving contextual emotional cues remains one of the key challenges in the ERC task. Existing methods do not fully model the context and rely on complex network structures, which yields limited performance gains. In this paper, we propose a novel emotion recognition network based on a curriculum learning strategy (ERNetCL). ERNetCL consists primarily of a temporal encoder (TE), a spatial encoder (SE), and a curriculum learning (CL) loss. TE and SE combine the strengths of previous methods in a simple manner to efficiently capture temporal and spatial contextual information in the conversation. To ease the harmful influence of emotion shift and to mimic the way humans learn a curriculum from easy to hard, we apply the idea of CL to the ERC task to progressively optimize the network parameters. At the beginning of training, difficult samples receive lower learning weights; as the epoch count increases, the weights of these samples are gradually raised. Extensive experiments on four datasets show that the proposed method is effective and outperforms the baseline models.
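The epoch-dependent weighting described above can be sketched as a weighted cross-entropy. The abstract does not specify the schedule, so this sketch assumes a hypothetical linear ramp: a sample with difficulty `d` in [0, 1] starts at weight `1 - (1 - min_weight) * d` and reaches 1.0 once `epoch >= ramp_epochs`. The names `curriculum_weight`, `curriculum_loss`, `ramp_epochs`, and `min_weight` are illustrative, not the paper's API.

```python
import math
from typing import Sequence


def curriculum_weight(difficulty: float, epoch: int, ramp_epochs: int,
                      min_weight: float = 0.1) -> float:
    """Hypothetical linear schedule: hard samples start near min_weight and
    are raised to 1.0 by the time epoch reaches ramp_epochs."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")
    if ramp_epochs <= 0:
        raise ValueError("ramp_epochs must be positive")
    progress = min(epoch / ramp_epochs, 1.0)  # 0 at the start, 1 after the ramp
    penalty = (1.0 - min_weight) * difficulty * (1.0 - progress)
    return 1.0 - penalty


def curriculum_loss(probs_of_true_class: Sequence[float],
                    difficulties: Sequence[float],
                    epoch: int, ramp_epochs: int) -> float:
    """Weighted mean of per-sample negative log-likelihoods."""
    weights = [curriculum_weight(d, epoch, ramp_epochs) for d in difficulties]
    nll = [-math.log(p) for p in probs_of_true_class]
    return sum(w * l for w, l in zip(weights, nll)) / sum(weights)


if __name__ == "__main__":
    probs = [0.9, 0.5]        # model confidence on the gold label
    difficulties = [0.0, 1.0]  # an easy and a hard sample
    for epoch in (0, 5, 10):
        print(epoch, curriculum_loss(probs, difficulties, epoch, ramp_epochs=10))
```

Under this assumed schedule the hard sample barely contributes at epoch 0 and counts fully from epoch 10 onward, at which point the loss reduces to ordinary mean cross-entropy.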
