IITK at SemEval-2024 Task 10: Who is the speaker? Improving Emotion Recognition and Flip Reasoning in Conversations via Speaker Embeddings
Shubham Patel, Divyaksh Shukla, Ashutosh Modi
TL;DR
The paper tackles Emotion Recognition in Conversations (ERC) and Emotion Flip Reasoning (EFR) by integrating speaker embeddings and a Probable Trigger Zone (PTZ) to mitigate data skew. For ERC, it combines a masked memory network with speaker-aware features; for EFR, it uses a transformer-based, emotion-aware model. HingBERT handles the code-mixed data, and Voyage embeddings handle the English inputs. Sub-task 3 improves on the baseline by 5.9 F1 points, sub-task 2 shows mixed gains, and sub-task 1 remains challenged by label skew. Limitations include dependence on known speakers and training time; proposed future work includes learnable speaker embeddings and better skew handling to improve robustness in real-world settings.
Abstract
This paper presents our approach for SemEval-2024 Task 10: Emotion Discovery and Reasoning its Flip in Conversations. For the Emotion Recognition in Conversations (ERC) task, we utilize a masked-memory network along with speaker participation. For the Emotion Flip Reasoning (EFR) task, we propose a transformer-based speaker-centric model. We also introduce the Probable Trigger Zone, a region of the conversation that is more likely to contain the utterances causing the emotion to flip. For sub-task 3, the proposed approach achieves a 5.9-point improvement in F1 score over the task baseline. The ablation study results highlight the significance of various design choices in the proposed method.
