LaERC-S: Improving LLM-based Emotion Recognition in Conversation with Speaker Characteristics
Yumeng Fu, Junjie Wu, Zhongjie Wang, Meishan Zhang, Lili Shan, Yulin Wu, Bingquan Li
TL;DR
LaERC-S addresses emotion recognition in conversation (ERC) by using large language models to extract dynamic speaker characteristics (mental state, behavior, persona) and a two-stage learning process to inject this knowledge for robust emotion prediction. The framework uses purpose-built prompts and templates to elicit targeted speaker cues, then trains with an instruction-tuning objective that culminates in emotion classification guided by an explicit $oReact$ cue. The objective is formalized as $L_k = -\sum_{i'}^{j} \log P(\mu_{(k,i')} \mid x_k, \theta_k)$. Evaluations on IEMOCAP, MELD, and EmoryNLP show state-of-the-art weighted-F1 scores, and ablations support the value of both the speaker characteristics and the two-stage approach; results hold across datasets and model variations. The work shows how integrating dynamic speaker information drawn from LLM world knowledge can make emotion understanding in conversations more accurate and more generalizable, while remaining trainable on a single GPU. Future directions include exploring richer expressions of speaker characteristics and extending the approach to other NLP tasks that benefit from nuanced speaker modeling.
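The objective quoted in the TL;DR is a summed negative log-likelihood over target tokens. The sketch below illustrates only that arithmetic, and it rests on an assumed reading of the symbols, which this summary does not define: $\mu_{(k,i')}$ is taken to be the $i'$-th target token of sample $k$, $x_k$ the input prompt, and $\theta_k$ the model parameters. The function name `sequence_nll` and the example probabilities are hypothetical and not taken from the paper.

```python
import math


def sequence_nll(token_probs):
    """Sum of -log p over the probabilities assigned to each target token.

    token_probs: iterable of P(mu_(k,i') | x_k, theta_k), one value per
    target token (assumed interpretation of the symbols in the summary).
    """
    probs = list(token_probs)
    # A probability must lie in (0, 1]; log(0) is undefined.
    if any(p <= 0.0 or p > 1.0 for p in probs):
        raise ValueError("each probability must be in (0, 1]")
    return sum(-math.log(p) for p in probs)


# Hypothetical per-token probabilities for one training sample.
example_probs = [0.9, 0.5, 0.8]
loss = sequence_nll(example_probs)
print(f"L_k = {loss:.4f}")
```

Under this reading, the loss is zero only when the model assigns probability 1 to every target token, and it grows as any single token becomes less likely. In practice such a loss would typically be computed from model logits by a deep learning framework rather than from explicit probabilities.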
Abstract
Emotion recognition in conversation (ERC), the task of discerning human emotions for each utterance within a conversation, has garnered significant attention in human-computer interaction systems. Previous ERC studies focus on speaker-specific information that predominantly stems from relationships among utterances and therefore lacks sufficient information about the conversation as a whole. Recent research in ERC has sought to exploit pre-trained large language models (LLMs) with speaker modelling to comprehend emotional states. Although these methods have achieved encouraging results, the extracted speaker-specific information struggles to indicate emotional dynamics. In this paper, motivated by the fact that speaker characteristics play a crucial role and that LLMs have rich world knowledge, we present LaERC-S, a novel framework that stimulates LLMs to explore speaker characteristics involving the mental state and behavior of interlocutors, for accurate emotion predictions. To endow LLMs with this knowledge, we adopt two-stage learning so that the models reason about speaker characteristics and track the emotion of the speaker in complex conversation scenarios. Extensive experiments on three benchmark datasets demonstrate the superiority of LaERC-S, which reaches a new state of the art.
