Distribution-based Emotion Recognition in Conversation
Wen Wu, Chao Zhang, Philip C. Woodland
TL;DR
Distribution-based ERC represents each utterance as a probability distribution over emotion classes and models a dialogue as a sequence of such distributions. It combines a dialogue-level Transformer with utterance-specific Dirichlet priors via a Dirichlet Prior Network (DPN) and uses SSL-based audio-text representations for robust multi-modal cues. The approach allows all utterances to be used, improves uncertainty estimation (measured by AUPR), and achieves higher accuracy than single-utterance baselines on IEMOCAP. This work advances emotion-aware conversational AI by explicitly modelling uncertainty and cross-utterance dynamics.
Abstract
Automatic emotion recognition in conversation (ERC) is crucial for emotion-aware conversational artificial intelligence. This paper proposes a distribution-based framework that formulates ERC as a sequence-to-sequence problem for emotion distribution estimation. The inherent ambiguity of emotions and the subjectivity of human perception lead to disagreements in emotion labels, which are handled naturally in our framework from the perspective of uncertainty estimation in emotion distributions. A Bayesian training loss is introduced to improve the uncertainty estimation by conditioning each emotional state on an utterance-specific Dirichlet prior distribution. Experimental results on the IEMOCAP dataset show that ERC outperforms the single-utterance-based system, and that the proposed distribution-based ERC methods achieve not only better classification accuracy but also improved uncertainty estimation.
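To make the idea of an utterance-specific Dirichlet prior over emotion distributions concrete, the following is a minimal sketch and not the loss used in the paper, whose exact form is not given in this section. It assumes a hypothetical four-class emotion set, turns annotator votes into a smoothed label distribution, and scores that distribution under a Dirichlet whose concentration parameters stand in for a model's output. The names `EMOTIONS`, `label_distribution`, `dirichlet_mean`, `entropy` and `dirichlet_nll` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical class set, chosen for illustration only.
EMOTIONS = ["happy", "sad", "neutral", "angry"]


def label_distribution(votes, smoothing=0.01):
    """Turn annotator votes into a smoothed distribution over EMOTIONS.

    Smoothing keeps every probability strictly positive so that
    log-probabilities are finite.
    """
    counts = Counter(votes)
    raw = [counts.get(e, 0) + smoothing for e in EMOTIONS]
    total = sum(raw)
    return [r / total for r in raw]


def dirichlet_mean(alpha):
    """Expected categorical distribution under Dir(alpha)."""
    a0 = sum(alpha)
    return [a / a0 for a in alpha]


def entropy(p):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def dirichlet_nll(alpha, p):
    """Negative log-density of the distribution p under Dir(alpha)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    log_kernel = sum((a - 1.0) * math.log(x) for a, x in zip(alpha, p))
    return -(log_norm + log_kernel)


# Three annotators disagree: two say "happy", one says "neutral".
target = label_distribution(["happy", "happy", "neutral"])

# Assumed model outputs: one matches the votes, one does not.
alpha_good = [6.0, 1.0, 3.0, 1.0]
alpha_bad = [1.0, 6.0, 1.0, 3.0]

print("target distribution:", [round(x, 3) for x in target])
print("predicted mean:", dirichlet_mean(alpha_good))
print("entropy of predicted mean:", round(entropy(dirichlet_mean(alpha_good)), 3))
print("NLL (matching alpha):", round(dirichlet_nll(alpha_good, target), 3))
print("NLL (mismatched alpha):", round(dirichlet_nll(alpha_bad, target), 3))
```

In this sketch, annotator disagreement is kept as a soft target rather than collapsed to a majority label, and the Dirichlet density rewards concentration parameters whose mass sits on the classes the annotators actually chose. The entropy of the predicted mean is one simple uncertainty measure; others could be substituted.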
