AIMA at SemEval-2024 Task 10: History-Based Emotion Recognition in Hindi-English Code-Mixed Conversations
Mohammad Mahdi Abootorabi, Nona Ghazizadeh, Seyed Arshan Dalili, Alireza Ghahramani Kure, Mahshid Dehghani, Ehsaneddin Asgari
TL;DR
This work tackles emotion recognition in conversation (ERC) for code-mixed Hindi-English dialogues (MaSaC dataset) using a RoBERTa-based encoder fine-tuned on GoEmotions and both preceding and following conversational context. A two-step Hinglish-to-English translation pipeline (indic-trans to Hindi, then SeamlessM4T to English) enables processing of the code-mixed input. Three architectures plus a data-augmented variant yield four base models, which are ensembled by majority voting. The Context-Aware GRU-Based model, which uses both preceding and following utterances and omits previous emotion labels to avoid error propagation, is the strongest single model; the ensemble achieves a Weighted F1 of $0.4080$ on the test set, outperforming GPT-3.5 Turbo and LaBSE-based baselines. The study demonstrates the value of past and future context and, through the translation pipeline, provides a practical approach to code-mixed ERC, with potential for extension to end-to-end and multimodal setups. Overall, the combination of robust encoders, context modeling, and ensembling offers a foundation for more scalable emotion-aware dialogue systems in multilingual settings.
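As a rough illustration of the majority-voting step, the sketch below combines per-utterance labels from several models. The function name `majority_vote` and the tie-breaking rule (prefer the earliest-listed model among the tied labels) are assumptions made for this example; the summary does not say how ties are resolved.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine label predictions from several models by majority vote.

    predictions[m][i] is the label model m assigns to utterance i.
    Tie-breaking is an assumption of this sketch: among the labels with
    the highest vote count, the one proposed by the earliest model wins.
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_utterances = len(predictions[0])
    if any(len(p) != n_utterances for p in predictions):
        raise ValueError("all models must label the same number of utterances")

    ensembled = []
    for votes in zip(*predictions):  # votes: one label per model
        counts = Counter(votes)
        top = max(counts.values())
        # Scan votes in model order so ties resolve deterministically.
        winner = next(label for label in votes if counts[label] == top)
        ensembled.append(winner)
    return ensembled


# Hypothetical predictions from four base models on three utterances.
example = [
    ["joy", "anger", "neutral"],
    ["joy", "sadness", "fear"],
    ["neutral", "sadness", "joy"],
    ["joy", "anger", "contempt"],
]
print(majority_vote(example))
```

With four voters, ties are possible, so any real implementation needs some explicit rule; ordering the models by validation score before voting would be one way to make this rule meaningful.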
Abstract
In this study, we introduce a solution to subtask 1 of SemEval-2024 Task 10, dedicated to Emotion Recognition in Conversation (ERC) in code-mixed Hindi-English conversations. ERC in code-mixed conversations presents unique challenges, as existing models are typically trained on monolingual datasets and may not perform well on code-mixed data. To address this, we propose a series of models that incorporate both the previous and future context of the current utterance, as well as the sequential information of the conversation. To facilitate the processing of code-mixed data, we developed a Hinglish-to-English translation pipeline to translate the code-mixed conversations into English. We designed four different base models, each utilizing powerful pre-trained encoders to extract features from the input but with varying architectures. By ensembling all of these models, we developed a final model that outperforms all baselines.
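To make the idea of using previous and future context concrete, the following sketch pairs each utterance with a window of neighbouring utterances from the same conversation. The helper name `context_windows` and the window sizes `past` and `future` are hypothetical; the abstract does not state how many neighbouring utterances the models actually use.

```python
from typing import List, Tuple

# (preceding utterances, current utterance, following utterances)
Window = Tuple[List[str], str, List[str]]


def context_windows(
    utterances: List[str], past: int = 2, future: int = 2
) -> List[Window]:
    """Attach up to `past` preceding and `future` following utterances
    to every utterance in a conversation (window sizes are assumptions)."""
    windows = []
    for i, current in enumerate(utterances):
        # Slices are clipped at the conversation boundaries.
        previous = utterances[max(0, i - past):i]
        following = utterances[i + 1:i + 1 + future]
        windows.append((previous, current, following))
    return windows


# Placeholder conversation, assumed to be already translated to English.
conversation = ["u1", "u2", "u3", "u4"]
for previous, current, following in context_windows(conversation, past=1, future=2):
    print(previous, "|", current, "|", following)
```

Each window could then be encoded utterance by utterance and passed to a sequence model; utterances at the start or end of a conversation simply receive shorter context.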
