TED: Turn Emphasis with Dialogue Feature Attention for Emotion Recognition in Conversation

Junya Ono, Hiromi Wakaki

TL;DR

This paper addresses emotion recognition in conversations by modeling multi-turn context with explicit turn emphasis. It introduces Turn Emphasis with Dialogue (TED), a framework composed of Turn-Based Encoding (TBE), Turn-Based Multi-Head Self-Attention (TBM), and a dialogue layer that uses turn priority and speaker information to adjust attention, enabling stronger emphasis on the current turn. TED is evaluated on four ERC datasets, achieving strong overall performance and state-of-the-art results on IEMOCAP with many turns, while ablation studies highlight the effectiveness of past-context encoding (TBE) and turn-based attention (TBM) with dialogue features. The approach offers a principled way to fuse dialogue cues into attention mechanisms, improving robustness across datasets with varying turn distributions and speaker dynamics, and suggesting that explicit turn-distinction is beneficial for ERC in real-world conversations.

Abstract

Emotion recognition in conversation (ERC) has attracted attention through methods that model multi-turn contexts. Feeding multi-turn input to a pretraining model implicitly assumes that the model learns to distinguish the current turn from the other turns during training, with special tokens inserted into the input sequence as the only cue. This paper proposes a priority-based attention method, called Turn Emphasis with Dialogue (TED), that distinguishes each turn explicitly by adding dialogue features to the attention mechanism. TED assigns a priority to each turn according to turn position and speaker information, which serve as the dialogue features. It applies multi-head self-attention between turn-based vectors for the multi-turn input and adjusts the attention scores with the dialogue features. We evaluate TED on four typical benchmarks. The experimental results demonstrate that TED has high overall performance on all datasets and achieves state-of-the-art performance on IEMOCAP, which has numerous turns.
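The score adjustment described in the abstract can be illustrated with a minimal single-head sketch. This is not the authors' formulation: the exact equation is not given here, so the code assumes a hypothetical per-turn factor `beta[t]` that is added to the pre-softmax score of every turn spoken by the current speaker. The names `adjusted_attention`, `beta`, and `speakers` are illustrative only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def adjusted_attention(query, keys, speakers, current_speaker, beta):
    """Attention weights of the current turn over all turns.

    query: turn-based vector of the current turn
    keys: one turn-based vector per turn
    speakers: speaker ID of each turn
    beta: hypothetical per-turn factor (an assumption, not the paper's definition)
    """
    d = len(query)
    scores = []
    for t, key in enumerate(keys):
        # Scaled dot-product score between the current turn and turn t.
        score = sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        # Assumed additive emphasis for turns by the same speaker.
        if speakers[t] == current_speaker:
            score += beta[t]
        scores.append(score)
    return softmax(scores)


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # last entry is the current turn
speakers = ["A", "B", "A"]
plain = adjusted_attention(keys[2], keys, speakers, "A", [0.0, 0.0, 0.0])
emphasized = adjusted_attention(keys[2], keys, speakers, "A", [0.5, 0.5, 1.0])
```

With a larger factor on the current turn, `emphasized` shifts weight toward the same-speaker turns and away from the listener's turn, compared with `plain`. An additive bias is only one possible design; a multiplicative factor on the scores would be an equally plausible reading of "adjusts attention scores".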

Paper Structure (27 sections, 11 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: Concept of our method. This shows the difference in the multi-turn input in related methods. Our method prioritizes each turn to distinguish the turns.
  • Figure 2: Example of ERC and dialogue features. TED uses current (C), past (P) and future (F) turns to obtain more contexts. Dialogue features indicate turn priority, same speaker (S) and listener (L). In this case, TED adjusts attention scores for the same speakers (S) in the current turn by using an attention factor $\beta^{t}$ with the turn priority; $t$ indicates the turn number; SID indicates the speaker identification.
  • Figure 3: Turn-based encoding (TBE) model. CUST outputs a multi-turn sequence from utterances in past and future turns with "TURN" and "SEP" tokens as separators. TBE uses a current turn-based vector $\widetilde{H}^{c}$ created by averaging token-based vectors.
  • Figure 4: Turn-based MHSA (TBM) model. TBM establishes MHSA between turn-based vectors to obtain more contexts based on TBE.
  • Figure 5: The proposed model, TED, has a dialogue layer to adjust attention scores by using turn priority and speaker IDs at the last ($N$th) layer on the basis of TBM.
  • ...and 1 more figures
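
The Figure 3 caption states that a turn-based vector is created by averaging token-based vectors. A minimal sketch of that pooling step follows, assuming each turn is already available as a list of equal-length token vectors. The function names `turn_based_vector` and `encode_turns` are hypothetical, and the encoder that produces the token vectors is out of scope.

```python
def turn_based_vector(token_vectors):
    """Average the token-based vectors of one turn into a single vector."""
    if not token_vectors:
        raise ValueError("a turn needs at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def encode_turns(turns):
    """Produce one turn-based vector per turn (hypothetical helper)."""
    return [turn_based_vector(tokens) for tokens in turns]


# Toy input: two turns with two and three token vectors of dimension 2.
turns = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]],
]
turn_vectors = encode_turns(turns)
```

Each turn collapses to one vector regardless of its token count, which is what lets the turn-based attention in Figure 4 operate between turns rather than between tokens.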