EmoBERTa: Speaker-Aware Emotion Recognition in Conversation with RoBERTa

Taewoon Kim, Piek Vossen

TL;DR

EmoBERTa addresses emotion recognition in conversation by leveraging a text-only modality with speaker-aware input. By prepending speaker names and using a three-segment RoBERTa input (past, current, future), it captures intra- and inter-speaker context in an end-to-end manner. The approach achieves state-of-the-art performance on MELD and IEMOCAP, with ablations showing the value of speaker cues and contextual segments, complemented by qualitative attention analyses for interpretability. The method is simple, effective, and publicly released for replication and extension, with potential for multimodal integration in future work.

Abstract

We present EmoBERTa: Speaker-Aware Emotion Recognition in Conversation with RoBERTa, a simple yet expressive scheme for solving the ERC (emotion recognition in conversation) task. By simply prepending speaker names to utterances and inserting separation tokens between the utterances in a dialogue, EmoBERTa can learn intra- and inter-speaker states and context to predict the emotion of a current speaker, in an end-to-end manner. Our experiments show that we reach a new state of the art on the two popular ERC datasets using a basic and straightforward approach. We've open-sourced our code and models at https://github.com/tae898/erc.
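
The input scheme described above (speaker names prepended to utterances, with separation tokens between the past, current, and future segments) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `NAME: utterance` format, the upper-casing of names, and the helper names `format_utterance` and `build_input` are assumptions made here. The `<s>` and `</s></s>` strings follow the tokens named in the figure captions; in practice a RoBERTa tokenizer would insert them.

```python
from typing import List, Tuple

# (speaker name, utterance text); a hypothetical container for this sketch
Utterance = Tuple[str, str]

CLS = "<s>"        # RoBERTa's start token, used as [CLS]
SEP = "</s></s>"   # RoBERTa's segment separator, used as [SEP]
EOS = "</s>"       # end-of-sequence token


def format_utterance(speaker: str, text: str) -> str:
    """Prepend the speaker name to an utterance (assumed 'NAME: text' format)."""
    return f"{speaker.upper()}: {text}"


def build_input(
    past: List[Utterance], current: Utterance, future: List[Utterance]
) -> str:
    """Join past, current, and future utterances into one three-segment string."""
    past_seg = " ".join(format_utterance(s, t) for s, t in past)
    current_seg = format_utterance(*current)
    future_seg = " ".join(format_utterance(s, t) for s, t in future)
    return f"{CLS}{past_seg}{SEP}{current_seg}{SEP}{future_seg}{EOS}"


if __name__ == "__main__":
    example = build_input(
        past=[("Joey", "Hi")],
        current=("Rachel", "Hello"),
        future=[("Joey", "Bye")],
    )
    print(example)
```

Only the current segment carries the utterance whose emotion is predicted; the past and future segments serve as context. A real pipeline would also have to truncate the context so the sequence fits the model's maximum length, which this sketch does not attempt.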

Paper Structure

This paper contains 17 sections, 1 equation, 2 figures, 2 tables, 1 algorithm.

Figures (2)

  • Figure 1: Two examples from the 20 randomly selected test samples are shown. The current speaker's utterance, whose emotion the model has to predict, is in bold. The green highlighted tokens are the top 10 most attended tokens to the current speaker (i.e., JOEY) in the first layer of the model. The yellow highlighted tokens are the top 10 most attended tokens to the [CLS] token (i.e., <s>) in the last layer. Best viewed when zoomed in.
  • Figure 2: Two examples from the 20 randomly selected test samples are shown. The current speaker's utterance, whose emotion the model has to predict, is in bold. The green highlighted tokens are the top 10 most attended tokens to the current speaker (i.e., WILLIAM and ELIZABETH for the first and second example, respectively) in the first layer of the model. The yellow highlighted tokens are the top 10 most attended tokens to the [CLS] token (i.e., <s>) in the last layer. Unlike Figure 1, there is only one [SEP] token (i.e., </s></s>), since this model only has two segments, past and current. Best viewed when zoomed in.
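
The highlighting described in the captions above amounts to a top-k selection over a vector of attention weights. The sketch below assumes that one weight per input token has already been extracted for the token of interest (the speaker name or `<s>`); how the weights are obtained and aggregated across attention heads is not specified here, and the function name `top_k_attended` is hypothetical.

```python
from typing import List, Sequence, Tuple


def top_k_attended(
    tokens: Sequence[str], weights: Sequence[float], k: int = 10
) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest attention weights, highest first."""
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    # Sort token positions by weight, descending; ties keep their original order.
    ranked = sorted(range(len(tokens)), key=lambda i: weights[i], reverse=True)
    return [(tokens[i], weights[i]) for i in ranked[:k]]


if __name__ == "__main__":
    # Toy weights, made up for illustration only.
    toks = ["<s>", "JOEY", ":", "Hey"]
    attn = [0.10, 0.50, 0.05, 0.35]
    print(top_k_attended(toks, attn, k=2))
```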