EmotionIC: emotional inertia and contagion-driven dependency modeling for emotion recognition in conversation

Yingjian Liu; Jiang Li; Xiaoping Wang; Zhigang Zeng

TL;DR

Compared to previous ERC models, EmotionIC can model a conversation more thoroughly at both the feature-extraction and classification levels, and can significantly outperform the state-of-the-art models on four benchmark datasets.

Abstract

Emotion Recognition in Conversation (ERC) has attracted growing attention in recent years as a result of the advancement and implementation of human-computer interface technologies. In this paper, we propose an emotional inertia and contagion-driven dependency modeling approach (EmotionIC) for the ERC task. EmotionIC consists of three main components: Identity Masked Multi-Head Attention (IMMHA), Dialogue-based Gated Recurrent Unit (DiaGRU), and Skip-chain Conditional Random Field (SkipCRF). Compared to previous ERC models, EmotionIC can model a conversation more thoroughly at both the feature-extraction and classification levels. At the feature-extraction level, the proposed model attempts to integrate the advantages of attention- and recurrence-based methods. Specifically, IMMHA is applied to capture identity-based global contextual dependencies, while DiaGRU is utilized to extract speaker- and temporal-aware local contextual information. At the classification level, SkipCRF can explicitly mine complex emotional flows from higher-order neighboring utterances in the conversation. Experimental results show that our method can significantly outperform the state-of-the-art models on four benchmark datasets. The ablation studies confirm that our modules can effectively model emotional inertia and contagion.

Paper Structure (21 sections, 12 equations, 10 figures, 7 tables)

Introduction
Related work
Emotion recognition in conversation
Conditional random field
Our approach
Preliminaries
Identity masked multi-head attention
Dialogue-based gated recurrent unit
Skip-chain conditional random field
Experimental settings
Evaluation metrics and datasets
Implementation details
Results and analysis
Comparison with baseline methods
Analysis for confusion matrices
...and 6 more sections

Figures (10)

Figure 1: Example of contextual dependency modeling at both the feature-extraction and classification levels. The dashed and solid lines represent the intra- and inter-speaker information transmissions, respectively.
Figure 2: Architecture of our EmotionIC. First, the global and local context dependencies are extracted through IMMHA and DiaGRU, respectively. Then, we concatenate the global context features, local context features, and utterance features. Note that the utterance features not processed by IMMHA and DiaGRU are included to prevent the original semantics of utterances with weak context dependencies from being obscured. Finally, the emotional flows in the conversation are captured at the classification level through SkipCRF to obtain the optimal emotion sequence.
Figure 3: Illustration of the utterance moment function. $u_{t-2}^{p_i}$ is the nearest previous utterance uttered by the speaker $p_i$, so it can also be denoted by $u_{\bm{s}(t)}^{p_i}$. Similarly, $u_{t-1}^{p_j}$ can also be denoted by $u_{\bm{o}(t)}^{p_j}$.
Figure 4: Network structure of IMMHA. $M_s$ and $M_o$ are two mask matrices that mask contextual dependencies from other participants and the current participant, respectively, as well as future information. $\mathbf{MatMul}$ indicates the matrix multiplication operation, and $\mathbf{Mask}$ is the masking operation.
Figure 5: Illustration of DiaGRU. (a) A single DiaGRU cell. Here, $x^{p_i}_{t}$, $h^{p_i}_{t}$, $h^{p_i}_{\bm{s}(t)}$, and $h^{p_j}_{\bm{o}(t)}$ are the vector representation of the utterance $u_t^{p_i}$, the hidden state of the current moment, the emotional hidden state of the speaker $p_i$, and the hidden state of the corresponding interlocutor $p_j$, respectively. (b) An example of DiaGRU. The dotted and solid lines represent self- and other-dependency, respectively, and the line thickness indicates the strength of the dependency once the decay factor is applied.
...and 5 more figures
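
The captions of Figures 3 and 4 describe two index-level constructions: the utterance moment function ($\bm{s}(t)$, $\bm{o}(t)$) and the identity masks $M_s$ and $M_o$. The following is a minimal Python sketch of both, not the authors' implementation. It rests on several assumptions that the captions do not confirm: utterances are indexed from 0, $\bm{o}(t)$ is taken to be the nearest previous utterance by any other speaker, a mask entry of `True` means attention is allowed, and the current position $t$ is allowed in $M_s$. The function names `utterance_moments` and `identity_masks` are hypothetical.

```python
from typing import List, Optional, Tuple


def utterance_moments(speakers: List[str], t: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (s(t), o(t)) for utterance t.

    s(t): index of the nearest previous utterance by the same speaker.
    o(t): index of the nearest previous utterance by a different speaker
          (assumption: any other speaker counts as the interlocutor).
    None is returned when no such utterance exists.
    """
    s_t: Optional[int] = None
    o_t: Optional[int] = None
    # Scan backwards from t-1 so the first hit is the nearest one.
    for k in range(t - 1, -1, -1):
        if speakers[k] == speakers[t]:
            if s_t is None:
                s_t = k
        elif o_t is None:
            o_t = k
        if s_t is not None and o_t is not None:
            break
    return s_t, o_t


def identity_masks(speakers: List[str]) -> Tuple[List[List[bool]], List[List[bool]]]:
    """Build boolean masks M_s and M_o (True = attention allowed).

    M_s[t][k] allows positions k <= t spoken by the same speaker as t.
    M_o[t][k] allows positions k <= t spoken by a different speaker.
    Future positions (k > t) are blocked in both masks.
    Assumption: position t itself is visible in M_s.
    """
    n = len(speakers)
    m_s = [[False] * n for _ in range(n)]
    m_o = [[False] * n for _ in range(n)]
    for t in range(n):
        for k in range(t + 1):  # causal: never look at future utterances
            if speakers[k] == speakers[t]:
                m_s[t][k] = True
            else:
                m_o[t][k] = True
    return m_s, m_o


# Two speakers alternating, mirroring the situation in Figure 3.
speakers = ["A", "B", "A"]
moments = utterance_moments(speakers, 2)
m_s, m_o = identity_masks(speakers)
```

With this layout, a row of $M_o$ can be entirely `False` (for example, the first utterance has no other-speaker context), so an attention layer that consumes such a mask would need to handle fully masked rows explicitly.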
