IITK at SemEval-2024 Task 10: Who is the speaker? Improving Emotion Recognition and Flip Reasoning in Conversations via Speaker Embeddings

Shubham Patel; Divyaksh Shukla; Ashutosh Modi

TL;DR

The paper tackles Emotion Recognition in Conversations (ERC) and Emotion Flip Reasoning (EFR) by integrating speaker embeddings and a Probable Trigger Zone (PTZ) to mitigate data skew. It combines a masked memory network with speaker-aware features for ERC and a transformer-based, emotion-aware EFR model, leveraging HingBERT for code-mixed data and voyage embeddings for English inputs. Sub-task 3 achieves a notable 5.9 F1-point improvement over the baseline, while sub-task 2 shows mixed gains, and sub-task 1 remains challenged by label skew. Limitations include dependence on known speakers and training time, with future work proposing learnable speaker embeddings and better skew handling to enhance robustness in real-world settings.

Abstract

This paper presents our approach for SemEval-2024 Task 10: Emotion Discovery and Reasoning its Flip in Conversations. For the Emotion Recognition in Conversations (ERC) task, we use a masked-memory network along with speaker participation. For the Emotion Flip Reasoning (EFR) task, we propose a transformer-based speaker-centric model. We also introduce the Probable Trigger Zone, a region of the conversation that is more likely to contain the utterances causing the emotion to flip. For sub-task 3, the proposed approach improves the F1 score by 5.9 points over the task baseline. The ablation study results highlight the significance of various design choices in the proposed method.

Paper Structure (23 sections, 7 equations, 6 figures, 9 tables)

Introduction
Related Work
ERC
EFR
Embeddings
Task
System overview
Utterance Embeddings
ERC
EFR
Baseline
Speaker-Aware Embeddings
Probable Trigger Zone (PTZ)
Emotion-Aware Embeddings
Model Functioning
...and 8 more sections

Figures (6)

Figure 1: Distribution of the distance between the target utterance and the causal utterance for emotion flip.
Figure 2: Masked Memory Network with speaker embeddings concatenated with utterance embeddings. Speaker embeddings are 6-dimensional one-hot vectors that store 1 at the index of the corresponding top-6 speaker and 0 elsewhere.
Figure 3: Probable Trigger Zone.
Figure 4: Architecture of the model proposed for the task of Emotion Flip Reasoning.
Figure 5: Confusion Matrix for Sub-Task 1.
...and 1 more figure
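
As a rough illustration of the speaker-embedding scheme described in the Figure 2 caption, the following Python sketch builds a 6-dimensional one-hot speaker vector and concatenates it with an utterance embedding. The speaker names, the toy 2-dimensional utterance embedding, and the helper names `speaker_one_hot` and `speaker_aware_embedding` are hypothetical. Mapping speakers outside the top 6 to an all-zero vector is an assumption, not something the caption states explicitly.

```python
from typing import List, Sequence

# Hypothetical placeholder names standing in for the six most frequent speakers.
TOP_SPEAKERS = ["spk_a", "spk_b", "spk_c", "spk_d", "spk_e", "spk_f"]


def speaker_one_hot(speaker: str, top_speakers: Sequence[str]) -> List[float]:
    """Return a vector with 1.0 at the speaker's index among the top speakers.

    Assumption: a speaker who is not in the list gets an all-zero vector.
    """
    return [1.0 if speaker == name else 0.0 for name in top_speakers]


def speaker_aware_embedding(
    utterance_embedding: Sequence[float],
    speaker: str,
    top_speakers: Sequence[str] = TOP_SPEAKERS,
) -> List[float]:
    """Concatenate the utterance embedding with the one-hot speaker vector."""
    return list(utterance_embedding) + speaker_one_hot(speaker, top_speakers)


# Toy 2-dimensional utterance embedding; real sentence embeddings are much larger.
known = speaker_aware_embedding([0.1, 0.2], "spk_c")
unknown = speaker_aware_embedding([0.1, 0.2], "someone_else")
```

Here `known` carries a single 1.0 in the third speaker slot, while `unknown` ends with six zeros, so the downstream model sees the same input width for every utterance.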
