
IITK at SemEval-2024 Task 10: Who is the speaker? Improving Emotion Recognition and Flip Reasoning in Conversations via Speaker Embeddings

Shubham Patel, Divyaksh Shukla, Ashutosh Modi

TL;DR

The paper tackles Emotion Recognition in Conversations (ERC) and Emotion Flip Reasoning (EFR) by integrating speaker embeddings and a Probable Trigger Zone (PTZ) that mitigates data skew. It combines a masked memory network with speaker-aware features for ERC and a transformer-based, emotion-aware model for EFR, using HingBERT for code-mixed data and Voyage embeddings for English inputs. Sub-task 3 achieves a 5.9 F1-point improvement over the baseline, sub-task 2 shows mixed gains, and sub-task 1 remains hampered by label skew. Limitations include reliance on a known set of speakers and training time; future work proposes learnable speaker embeddings and better skew handling to improve robustness in real-world settings.
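Both EFR sub-tasks start from an emotion flip, i.e., a point where a speaker's emotion differs from that same speaker's previous utterance. The sketch below locates such flips, assuming a conversation is represented as parallel lists of speaker names and emotion labels; the function name `find_emotion_flips` and this data layout are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple


def find_emotion_flips(speakers: List[str], emotions: List[str]) -> List[Tuple[int, int]]:
    """Return (previous_index, target_index) pairs where a speaker's emotion
    differs from the emotion of that speaker's previous utterance.

    Hypothetical helper: assumes speakers[i] and emotions[i] describe utterance i.
    """
    last_seen: Dict[str, int] = {}  # speaker -> index of their latest utterance
    flips: List[Tuple[int, int]] = []
    for i, (speaker, emotion) in enumerate(zip(speakers, emotions)):
        if speaker in last_seen:
            prev = last_seen[speaker]
            if emotions[prev] != emotion:
                flips.append((prev, i))  # emotion flipped for this speaker
        last_seen[speaker] = i
    return flips


speakers = ["A", "B", "A", "B", "A"]
emotions = ["neutral", "joy", "anger", "joy", "anger"]
print(find_emotion_flips(speakers, emotions))  # [(0, 2)]
```

Each returned target index is an utterance for which an EFR model would then have to identify the trigger utterances.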

Abstract

This paper presents our approach for SemEval-2024 Task 10: Emotion Discovery and Reasoning its Flip in Conversations. For the Emotion Recognition in Conversations (ERC) task, we use a masked-memory network together with speaker-participation information. For the Emotion Flip Reasoning (EFR) task, we propose a transformer-based, speaker-centric model. We also introduce the Probable Trigger Zone (PTZ), a region of the conversation that is more likely to contain the utterances that cause an emotion to flip. For sub-task 3, the proposed approach improves on the task baseline by 5.9 F1 points. The ablation study results highlight the significance of the various design choices in the proposed method.
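The summary does not state the exact boundaries of the Probable Trigger Zone, so the following is only a minimal sketch under an assumed definition: the zone for a target utterance runs from the target speaker's previous utterance (inclusive) through the target itself, falling back to the whole prefix when the speaker has not spoken before. `probable_trigger_zone` and `ptz_mask` are hypothetical helpers, not the paper's code.

```python
from typing import List


def probable_trigger_zone(speakers: List[str], target: int) -> List[int]:
    """Hypothetical PTZ: indices from the target speaker's previous utterance
    (inclusive) up to and including the target utterance. If the speaker has
    no earlier utterance, the zone covers the whole prefix 0..target."""
    speaker = speakers[target]
    start = 0
    for j in range(target - 1, -1, -1):  # scan backwards for the same speaker
        if speakers[j] == speaker:
            start = j
            break
    return list(range(start, target + 1))


def ptz_mask(speakers: List[str], target: int) -> List[bool]:
    """Boolean mask over utterances 0..target: True inside the assumed zone."""
    zone = set(probable_trigger_zone(speakers, target))
    return [i in zone for i in range(target + 1)]


speakers = ["A", "B", "A", "C", "B", "A"]
print(probable_trigger_zone(speakers, 5))  # [2, 3, 4, 5]
print(ptz_mask(speakers, 5))  # [False, False, True, True, True, True]
```

Under this assumption, such a mask could restrict trigger prediction to in-zone utterances, which shrinks the pool of non-trigger candidates that dominates the labels.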

Paper Structure

This paper contains 23 sections, 7 equations, 6 figures, and 9 tables.

Figures (6)

  • Figure 1: Distribution of the distance between the target utterance and the causal utterance for emotion flip.
  • Figure 2: Masked Memory Network with speaker embeddings concatenated with utterance embeddings. The speaker embeddings are 6-dimensional one-hot vectors with a 1 at the speaker's index if the speaker is one of the top-6 speakers, and 0 otherwise.
  • Figure 3: Probable Trigger Zone.
  • Figure 4: Architecture of the model proposed for the task of Emotion Flip Reasoning.
  • Figure 5: Confusion Matrix for Sub-Task 1.
  • ...and 1 more figure
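Figure 2 describes the speaker embeddings as 6-dimensional one-hot vectors for the top-6 speakers, concatenated with the utterance embeddings. A minimal pure-Python sketch, assuming "top-6" means the six most frequent speakers in the training data and that any other speaker gets an all-zero vector; the helper names are hypothetical.

```python
from collections import Counter
from typing import List


def top_k_speakers(train_speakers: List[str], k: int = 6) -> List[str]:
    # Rank speakers by utterance count; ties keep first-encountered order.
    return [name for name, _ in Counter(train_speakers).most_common(k)]


def speaker_one_hot(speaker: str, top_speakers: List[str]) -> List[float]:
    # 1.0 at the speaker's index if they are a top-k speaker, otherwise all zeros.
    vec = [0.0] * len(top_speakers)
    if speaker in top_speakers:
        vec[top_speakers.index(speaker)] = 1.0
    return vec


def speaker_aware_features(
    utterance_embs: List[List[float]], speakers: List[str], top_speakers: List[str]
) -> List[List[float]]:
    # Concatenate each utterance embedding with its speaker's one-hot vector.
    return [emb + speaker_one_hot(spk, top_speakers) for emb, spk in zip(utterance_embs, speakers)]


train = ["A"] * 7 + ["B"] * 6 + ["C"] * 5 + ["D"] * 4 + ["E"] * 3 + ["F"] * 2 + ["G"]
top = top_k_speakers(train)
feats = speaker_aware_features([[0.5, 0.5], [1.0, 1.0]], ["B", "G"], top)
print(feats[0])  # [0.5, 0.5, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

In a real pipeline the utterance embeddings would come from an encoder such as HingBERT or Voyage embeddings. Because the one-hot vector has a fixed size, every speaker outside the top six shares the zero vector, which is consistent with the dependence on known speakers noted in the TL;DR.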