Context-Aware Siamese Networks for Efficient Emotion Recognition in Conversation

Barbara Gendron, Gaël Guibon

TL;DR

This paper tackles Emotion Recognition in Conversation (ERC) by modeling conversational context within a metric-learning framework. It proposes SentEmoContext, a lightweight Siamese-network-based approach that combines contextual representations from pre-trained sentence transformers with a cross-entropy objective and a triplet loss, enabling robust emotion classification across label granularities. On DailyDialog, it achieves a macro-F1 of $57.71$ and a micro-F1 of $57.75$, outperforming several state-of-the-art methods and even open-source LLM prompts in the macro metric, while maintaining efficiency. The work highlights effective handling of label imbalance and demonstrates the practicality of context-aware metric learning for adaptable ERC systems.

Abstract

Deep learning models have contributed considerably to progress in Emotion Recognition in Conversation (ERC). However, the task remains challenging because human emotions are plural and subjective. Previous work on ERC mostly provides predictive models built on graph-based conversation representations. In this work, we propose a way to model the conversational context and incorporate it into a metric learning training strategy with a two-step process. This allows us to perform ERC in a flexible classification scenario and to obtain a lightweight yet efficient model. Using metric learning through a Siamese Network architecture, we achieve a macro F1 score of 57.71 for emotion classification in conversation on the DailyDialog dataset, which outperforms the related work. This state-of-the-art result is promising regarding the use of metric learning for emotion recognition, yet there remains room for improvement with respect to the micro F1 score obtained.
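
To make the distinction between the macro and micro F1 scores reported above concrete, here is a minimal sketch in plain Python. The toy labels are hypothetical and are not taken from DailyDialog or from the paper; the helper name `f1_scores` is likewise an illustrative assumption. It only shows how the two averages can diverge when one label dominates.

```python
def f1_scores(y_true, y_pred, labels):
    """Return (macro_f1, micro_f1) for single-label predictions."""
    per_class_f1 = []
    total_tp = total_fp = total_fn = 0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # Convention assumed here: a class with no true or predicted samples scores 0.
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    # Macro: unweighted mean of per-class F1, so rare classes count as much as frequent ones.
    macro = sum(per_class_f1) / len(labels)
    # Micro: F1 computed from counts pooled over all classes, dominated by frequent classes.
    micro_denom = 2 * total_tp + total_fp + total_fn
    micro = 2 * total_tp / micro_denom if micro_denom else 0.0
    return macro, micro


# Hypothetical imbalanced toy data: a classifier that always predicts the majority label.
y_true = ["neutral"] * 8 + ["joy"] * 2
y_pred = ["neutral"] * 10
macro, micro = f1_scores(y_true, y_pred, labels=["neutral", "joy"])
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

In this toy case the majority-only classifier keeps a high micro F1 while its macro F1 drops sharply, which is why macro F1 is the more demanding metric under label imbalance.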

Paper Structure (24 sections, 2 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Illustration of the triplet loss principle. Given a triplet $(A, P, N)$ corresponding respectively to the anchor, positive, and negative samples, the positive sample should be closer to the anchor than the negative sample in order to minimize the triplet loss.
  • Figure 2: Illustration of the three main steps of the training procedure in the case of conversation-aware emotion predictions. Both losses (CE and triplet) backpropagate in order to gradually improve the encoder.
  • Figure 3: Prompts for Llama and Falcon.
  • Figure 4: Histograms of only the emotion label distribution in DailyDialog subsets.
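
The triplet loss principle described in the Figure 1 caption can be sketched as follows. This is a minimal illustration in plain Python, not the paper's implementation: the Euclidean distance, the `margin` value of 1.0, and the two-dimensional toy vectors are all assumptions chosen for readability.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(d(A, P) - d(A, N) + margin, 0).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` (an assumed hyperparameter).
    """
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


# Hypothetical embeddings: the positive sits near the anchor, the negative far away.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # d(A, P) = 1
negative = [3.0, 4.0]   # d(A, N) = 5

well_separated = triplet_loss(anchor, positive, negative)
# Swapping the roles puts the "positive" farther away than the "negative".
violating = triplet_loss(anchor, negative, positive)
print(well_separated, violating)
```

The well-separated triplet incurs no loss, while the swapped triplet is penalized in proportion to how far the ordering is violated; minimizing this quantity pulls same-emotion samples toward the anchor and pushes other samples away.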