EmoBERTa: Speaker-Aware Emotion Recognition in Conversation with RoBERTa
Taewoon Kim, Piek Vossen
TL;DR
EmoBERTa tackles emotion recognition in conversation (ERC) using text alone, with speaker-aware input. It prepends speaker names to utterances and feeds RoBERTa a three-segment input (past, current, and future utterances), so that a single end-to-end model captures both intra- and inter-speaker context. The approach achieves state-of-the-art performance on MELD and IEMOCAP. Ablations show the value of the speaker names and of the context segments, and qualitative attention analyses support interpretability. The method is simple and effective, its code and models are publicly released for replication and extension, and multimodal integration is a possible direction for future work.
Abstract
We present EmoBERTa: Speaker-Aware Emotion Recognition in Conversation with RoBERTa, a simple yet expressive scheme for solving the ERC (emotion recognition in conversation) task. By simply prepending speaker names to utterances and inserting separation tokens between the utterances in a dialogue, EmoBERTa can learn intra- and inter-speaker states and context to predict the emotion of the current speaker in an end-to-end manner. Our experiments show that this basic and straightforward approach reaches a new state of the art on two popular ERC datasets, MELD and IEMOCAP. We have open-sourced our code and models at https://github.com/tae898/erc.
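The input construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' released code. The `Name: text` speaker prefix, the use of `</s></s>` between the three segments (borrowed from RoBERTa's sentence-pair format), the whitespace word count standing in for the RoBERTa tokenizer, and the trimming policy are all assumptions, and `Utterance`, `render`, `n_words`, and `build_input` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Assumed RoBERTa-style special tokens: <s> opens, </s></s> separates segments, </s> closes.
BOS, SEP, EOS = "<s>", "</s></s>", "</s>"


@dataclass
class Utterance:
    speaker: str
    text: str


def render(u: Utterance, speaker_names: bool) -> str:
    # Prepend the speaker name (e.g. "Joey: ...") so the model can tell speakers apart.
    return f"{u.speaker}: {u.text}" if speaker_names else u.text


def n_words(items: List[str]) -> int:
    # Whitespace word count: a hypothetical stand-in for RoBERTa's subword token count.
    return sum(len(s.split()) for s in items)


def build_input(dialogue: List[Utterance], idx: int, max_words: int = 512,
                use_future: bool = True, speaker_names: bool = True) -> str:
    """Build a past / current / future input string for the utterance at `idx`."""
    current = render(dialogue[idx], speaker_names)
    past = [render(u, speaker_names) for u in dialogue[:idx]]
    future = [render(u, speaker_names) for u in dialogue[idx + 1:]] if use_future else []

    # Trim context (never the current utterance) until it fits the budget,
    # dropping the utterance furthest from the current one on the longer side.
    budget = max_words - len(current.split())
    while (past or future) and n_words(past) + n_words(future) > budget:
        if past and n_words(past) >= n_words(future):
            past.pop(0)   # drop the oldest past utterance
        else:
            future.pop()  # drop the latest future utterance

    segments = [" ".join(past), current] + ([" ".join(future)] if use_future else [])
    parts = [BOS]
    for i, seg in enumerate(segments):
        if i > 0:
            parts.append(SEP)
        if seg:  # an empty segment keeps its separator but contributes no text
            parts.append(seg)
    parts.append(EOS)
    return " ".join(parts)


if __name__ == "__main__":
    dialogue = [Utterance("Joey", "Hey!"), Utterance("Monica", "Hi Joey."),
                Utterance("Joey", "How are you?")]
    print(build_input(dialogue, 1))
    # → <s> Joey: Hey! </s></s> Monica: Hi Joey. </s></s> Joey: How are you? </s>
```

The `use_future` and `speaker_names` flags are illustrative switches that make it easy to build past-only or speaker-free variants of the input, in the spirit of the ablations mentioned in the TL;DR; a real pipeline would count tokens with the RoBERTa tokenizer rather than whitespace-separated words.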
