Learning Annotation Consensus for Continuous Emotion Recognition

Ibrahim Shoer, Engin Erzin

TL;DR

The paper addresses the subjectivity in emotion annotations by proposing a multi-annotator learning framework that preserves annotator diversity through an Annotators Consensus Network (ACN). A wav2vec 2.0-based CER network (W2V-CER) is augmented with ACN to produce a learned consensus $\bar{y}$, supervised via a dual CCC-based loss $L_{\text{CER-ACN}} = \alpha L_{CCC}(y, \bar{y}) + \beta L_{CCC}(\bar{y}, \hat{y})$, guiding predictions toward collective input. Empirical results on RECOLA and COGNIMUSE show significant improvements in CCC for valence and arousal, particularly for valence with joint training, demonstrating that preserving annotator variability yields more robust affective models. This consensus-aware approach offers broader applicability to any domain with abundant, inconsistent annotations and supports real-time, multimodal emotion-aware systems.
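The dual loss above can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this summary: that $L_{CCC}$ is the common form $1 - \mathrm{CCC}$, that CCC is the standard concordance correlation coefficient computed with population statistics, and that each signal is a plain sequence of floats. The function names `ccc`, `ccc_loss`, and `cer_acn_loss` are hypothetical, and the three arguments are passed in the same order as the symbols $y$, $\bar{y}$, $\hat{y}$ in the formula without asserting anything further about their roles.

```python
from typing import Sequence


def ccc(a: Sequence[float], b: Sequence[float]) -> float:
    """Concordance correlation coefficient of two equal-length sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Population (biased) variance and covariance.
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((x - mean_b) ** 2 for x in b) / n
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b)) / n
    denom = var_a + var_b + (mean_a - mean_b) ** 2
    if denom == 0.0:
        # Both sequences are constant and identical; treating this as perfect
        # agreement is a convention chosen for this sketch.
        return 1.0
    return 2.0 * cov / denom


def ccc_loss(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed loss form: 1 - CCC (0 at perfect agreement, 2 at worst)."""
    return 1.0 - ccc(a, b)


def cer_acn_loss(
    y: Sequence[float],
    y_bar: Sequence[float],
    y_hat: Sequence[float],
    alpha: float = 1.0,  # weights are placeholders, not values from the paper
    beta: float = 1.0,
) -> float:
    """alpha * L_CCC(y, y_bar) + beta * L_CCC(y_bar, y_hat)."""
    return alpha * ccc_loss(y, y_bar) + beta * ccc_loss(y_bar, y_hat)
```

With all three signals identical, both terms vanish and the loss is zero; a perfectly anti-correlated pair such as `[1, 2, 3]` and `[3, 2, 1]` gives a CCC of -1 and therefore a single-term loss of 2. In a training setting the same arithmetic would be written with differentiable tensor operations rather than Python lists.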

Abstract

In affective computing, datasets often contain multiple annotations from different annotators, which may lack full agreement. Typically, these annotations are merged into a single gold standard label, potentially losing valuable inter-rater variability. We propose a multi-annotator training approach for continuous emotion recognition (CER) that seeks a consensus across all annotators rather than relying on a single reference label. Our method employs a consensus network to aggregate annotations into a unified representation, guiding the main arousal-valence predictor to better reflect collective inputs. Tested on the RECOLA and COGNIMUSE datasets, our approach outperforms traditional methods that unify annotations into a single label. This underscores the benefits of fully leveraging multi-annotator data in emotion recognition and highlights its applicability across various fields where annotations are abundant yet inconsistent.

Paper Structure

This paper contains 10 sections, 2 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: The architecture of the continuous emotion recognition network via learning annotation consensus consists of two integrated components: A) the baseline W2V-CER network, and B) the annotation consensus network (ACN), which incorporates multi-annotator learning.