Context-Aware Siamese Networks for Efficient Emotion Recognition in Conversation

Barbara Gendron; Gaël Guibon

TL;DR

This paper tackles Emotion Recognition in Conversation (ERC) by modeling conversational context within a metric-learning framework. It proposes SentEmoContext, a lightweight Siamese-network-based approach that combines contextual representations from pre-trained sentence transformers with a cross-entropy objective and a triplet loss, enabling robust emotion classification across label granularities. On DailyDialog, it achieves a macro-F1 of $57.71$ and a micro-F1 of $57.75$, outperforming several state-of-the-art methods and even prompted open-source LLMs on the macro metric, while remaining efficient. The work highlights effective handling of label imbalance and demonstrates the practicality of context-aware metric learning for adaptable ERC systems.

Abstract

The advent of deep learning models has contributed considerably to progress in Emotion Recognition in Conversation (ERC). However, this task remains an important challenge due to the plurality and subjectivity of human emotions. Previous work on ERC provides predictive models that mostly rely on graph-based conversation representations. In this work, we propose a way to model the conversational context, which we incorporate into a metric learning training strategy through a two-step process. This allows us to perform ERC in a flexible classification scenario and to end up with a lightweight yet efficient model. Using metric learning through a Siamese Network architecture, we achieve a macro F1 score of 57.71 for emotion classification in conversation on the DailyDialog dataset, which outperforms the related work. This state-of-the-art result is promising regarding the use of metric learning for emotion recognition, although the micro F1 score obtained leaves room for improvement.

Paper Structure (24 sections, 2 equations, 4 figures, 5 tables)

Introduction
Related Work
ERC.
Metric learning.
Methodology
Isolated representations.
Contextual representations.
Experimental Protocol
Data.
Model specificities.
Training specificities.
Evaluation.
Comparison with LLMs.
Results
Comparison with Emotion Classifiers on Utterance Level
...and 9 more sections

Figures (4)

Figure 1: Illustration of the triplet loss principle. Given a triplet $(A, P, N)$ corresponding respectively to the anchor, positive, and negative samples, the positive sample should be closer to the anchor than the negative sample in order to minimize the triplet loss.
Figure 2: Illustration of the three main steps of the training procedure in the case of conversation-aware emotion predictions. Both losses (CE and triplet) backpropagate in order to gradually improve the encoder.
Figure 3: Prompts for Llama and Falcon
Figure 4: Histograms of only the emotion label distribution in DailyDialog subsets.
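
The triplet loss principle described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean distance, the margin value, and the function names `euclidean` and `triplet_loss` are assumptions chosen for illustration, using the common hinge formulation max(0, d(A, P) - d(A, N) + margin).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(A, P) - d(A, N) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (a hypothetical default here).
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


# Negative already far from the anchor: nothing left to minimize.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])

# Negative as close to the anchor as the positive: the loss equals the margin.
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

In this toy setting, the vectors stand in for utterance embeddings produced by the encoder; minimizing the loss pulls same-emotion samples toward the anchor and pushes different-emotion samples away.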
