TED: Turn Emphasis with Dialogue Feature Attention for Emotion Recognition in Conversation
Junya Ono, Hiromi Wakaki
TL;DR
This paper addresses emotion recognition in conversation (ERC) by modeling multi-turn context with explicit turn emphasis. It introduces Turn Emphasis with Dialogue (TED), a framework composed of Turn-Based Encoding (TBE), Turn-Based Multi-Head Self-Attention (TBM), and a dialogue layer that uses turn priority and speaker information to adjust attention scores, so that the current turn receives stronger emphasis than the surrounding turns. TED is evaluated on four ERC datasets. It achieves strong overall performance on all of them and state-of-the-art results on IEMOCAP, whose dialogues contain many turns. Ablation studies highlight the contributions of past-context encoding (TBE) and of turn-based attention (TBM) with dialogue features. The approach offers a principled way to fuse dialogue cues into the attention mechanism, improves robustness across datasets with varying turn distributions and speaker dynamics, and suggests that explicitly distinguishing turns benefits ERC in real-world conversations.
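Turn-based attention operates between turns, which presupposes one vector per turn. One simple way to obtain such vectors is to mean-pool the token embeddings of each turn in the concatenated multi-turn input. This method is assumed here purely for illustration, and the paper's TBE may work differently. The function name `pool_turn_vectors` and the per-token `turn_ids` input are hypothetical. In practice the token vectors would come from a pretrained encoder rather than being written by hand.

```python
from typing import Dict, List, Sequence


def pool_turn_vectors(token_vecs: Sequence[Sequence[float]],
                      turn_ids: Sequence[int]) -> List[List[float]]:
    # HYPOTHETICAL turn encoding (not necessarily TED's TBE): average the
    # token vectors that share a turn id, returning one vector per turn,
    # ordered by turn id.
    if len(token_vecs) != len(turn_ids):
        raise ValueError("token_vecs and turn_ids must have the same length")
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for vec, tid in zip(token_vecs, turn_ids):
        acc = sums.setdefault(tid, [0.0] * len(vec))  # running sum per turn
        for k, x in enumerate(vec):
            acc[k] += x
        counts[tid] = counts.get(tid, 0) + 1          # tokens seen per turn
    return [[x / counts[tid] for x in sums[tid]] for tid in sorted(sums)]


# Example: three tokens, the first two from turn 0 and the last from turn 1.
print(pool_turn_vectors([[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]], [0, 0, 1]))
# → [[2.0, 3.0], [10.0, 0.0]]
```

Mean pooling is chosen only because it is the shortest correct reduction. A [CLS]-style token per turn would be an equally valid assumption.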
Abstract
Emotion recognition in conversation (ERC) has been attracting attention, driven by methods that model multi-turn contexts. Feeding multi-turn input to a pretrained model implicitly assumes that the current turn and the other turns are distinguished during training by inserting special tokens into the input sequence. This paper proposes a priority-based attention method, called Turn Emphasis with Dialogue (TED), that distinguishes each turn explicitly by adding dialogue features to the attention mechanism. TED assigns each turn a priority based on two dialogue features: turn position and speaker information. It applies multi-head self-attention between turn-based vectors for the multi-turn input and adjusts the attention scores with the dialogue features. We evaluate TED on four standard benchmarks. The experimental results demonstrate that TED achieves high overall performance on all datasets and state-of-the-art performance on IEMOCAP, which contains numerous turns.
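The idea of adjusting multi-head self-attention scores between turn vectors with dialogue features can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The priority is modeled as a hypothetical additive bias: a distance penalty `alpha` per turn away from the current turn, plus a bonus `beta` when a turn shares the current speaker. The helper names `dialogue_bias`, `head_weights`, and `turn_self_attention` are also hypothetical. Query, key, and value are the raw head slices (no learned projections), which is a simplification; the paper may combine the features with the scores differently.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dialogue_bias(speakers: Sequence[str], current: int,
                  alpha: float = 0.5, beta: float = 1.0) -> Vector:
    # HYPOTHETICAL priority per turn j: penalize distance from the current
    # turn and reward turns spoken by the current speaker.
    cur_speaker = speakers[current]
    return [-alpha * abs(current - j) + (beta if s == cur_speaker else 0.0)
            for j, s in enumerate(speakers)]


def head_weights(query: Sequence[float], keys: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> Vector:
    # Scaled dot-product scores plus the additive dialogue bias, then softmax.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale + b
              for key, b in zip(keys, bias)]
    return softmax(scores)


def turn_self_attention(turns: Sequence[Sequence[float]], speakers: Sequence[str],
                        current: int, num_heads: int = 2,
                        alpha: float = 0.5, beta: float = 1.0) -> List[Vector]:
    # Multi-head self-attention between turn vectors. Each head attends over
    # its own slice of the dimensions, with Q = K = V = that slice
    # (no learned projections, a simplification for this sketch).
    n, d = len(turns), len(turns[0])
    if d % num_heads != 0:
        raise ValueError("vector size must be divisible by num_heads")
    hd = d // num_heads
    bias = dialogue_bias(speakers, current, alpha, beta)
    outputs: List[Vector] = [[] for _ in range(n)]
    for h in range(num_heads):
        heads = [list(t[h * hd:(h + 1) * hd]) for t in turns]
        for i in range(n):
            w = head_weights(heads[i], heads, bias)
            # Weighted sum of value slices; head outputs are concatenated.
            outputs[i].extend(sum(w[j] * heads[j][k] for j in range(n))
                              for k in range(hd))
    return outputs


if __name__ == "__main__":
    turns = [[0.2, 0.1, 0.0, 0.3], [0.5, -0.1, 0.4, 0.0], [0.1, 0.3, 0.2, 0.1]]
    speakers = ["A", "B", "A"]
    print(dialogue_bias(speakers, current=2))
    print(turn_self_attention(turns, speakers, current=2)[2])
```

An additive bias leaves the softmax normalization intact while shifting probability mass toward high-priority turns. A larger `alpha` concentrates attention near the current turn, and a larger `beta` favors turns by the same speaker.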
