Transformer based neural networks for emotion recognition in conversations
Claudiu Creanga, Liviu P. Dinu
TL;DR
The paper tackles emotion recognition in multilingual conversations (English and Hindi) within SemEval-2024 Task 10, comparing Masked Language Modelling (MLM) using multilingual BERT-like encoders against Causal Language Modelling (CLM) with Mistral 7B Instruct. MLM-based fine-tuning achieved stronger sentence-level emotion classification than CLM, with the top Subtask 1 result around a macro F1 of 0.43, placing 12th. Key findings include the impact of input length, representation from the final transformer layer, and a carefully staged fine-tuning schedule to mitigate overfitting; error analysis highlights class imbalance as a major challenge. The work suggests future directions, such as hybrid transformer-LSTM architectures and multi-turn context, and points to newer causal models that might close the gap in this task, with code released for reproducibility.
Abstract
This paper outlines the approach of the ISDS-NLP team in the SemEval 2024 Task 10: Emotion Discovery and Reasoning its Flip in Conversation (EDiReF). For Subtask 1 we obtained a weighted F1 score of 0.43 and placed 12 in the leaderboard. We investigate two distinct approaches: Masked Language Modeling (MLM) and Causal Language Modeling (CLM). For MLM, we employ pre-trained BERT-like models in a multilingual setting, fine-tuning them with a classifier to predict emotions. Experiments with varying input lengths, classifier architectures, and fine-tuning strategies demonstrate the effectiveness of this approach. Additionally, we utilize Mistral 7B Instruct V0.2, a state-of-the-art model, applying zero-shot and few-shot prompting techniques. Our findings indicate that while Mistral shows promise, MLMs currently outperform them in sentence-level emotion classification.
