Leveraging Label Potential for Enhanced Multimodal Emotion Recognition

Xuechun Shao, Yinfeng Yu, Liejun Wang

TL;DR

This paper introduces Label Signal-Guided Multimodal Emotion Recognition (LSGMER), a model that uses emotion label information to improve the classification accuracy and stability of multimodal emotion recognition (MER).

Abstract

Multimodal emotion recognition (MER) seeks to integrate various modalities to predict emotional states accurately. However, most current research focuses solely on the fusion of audio and text features, overlooking the information carried by the emotion labels themselves. This oversight may limit the performance of existing methods, because emotion labels carry rich semantic information that could significantly aid MER. To overcome this limitation, we introduce a novel model called Label Signal-Guided Multimodal Emotion Recognition (LSGMER), which aims to exploit emotion label information to improve the classification accuracy and stability of MER. Specifically, LSGMER employs a Label Signal Enhancement module that refines the representation of modality features by letting audio and text features interact with label embeddings, enabling it to capture the nuances of emotions precisely. Furthermore, we propose a Joint Objective Optimization (JOO) approach that enhances classification accuracy by introducing the Attribution-Prediction Consistency Constraint (APC), which strengthens the alignment between fused features and emotion categories. Extensive experiments on the IEMOCAP and MELD datasets demonstrate the effectiveness of the proposed LSGMER model.
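
The interaction between modality features and label embeddings described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulation, so the sketch assumes a single-head scaled dot-product attention in which each audio or text frame attends to the label embeddings, a residual connection, mean pooling, and fixed fusion weights `g1` and `g2` (Figure 2 describes these as learned). All function names, shapes, and the random inputs are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def label_guided_attention(features, label_emb):
    """Let each frame attend to the emotion label embeddings (assumed design).

    features:  (T, d) audio or text frame features
    label_emb: (C, d) one embedding per emotion class
    Returns the enhanced features (T, d) and the attention weights (T, C).
    """
    d = features.shape[-1]
    scores = features @ label_emb.T / np.sqrt(d)   # (T, C) frame-to-label similarity
    weights = softmax(scores, axis=-1)             # each row sums to 1
    enhanced = features + weights @ label_emb      # residual label-aware update
    return enhanced, weights


def fuse(audio, text, g1=0.5, g2=0.5):
    # Weighted sum of mean-pooled modalities; g1 and g2 are fixed here,
    # whereas the paper describes them as learned weights.
    return g1 * audio.mean(axis=0) + g2 * text.mean(axis=0)


rng = np.random.default_rng(0)
num_classes, dim = 4, 8                            # hypothetical sizes
label_emb = rng.normal(size=(num_classes, dim))
audio = rng.normal(size=(10, dim))                 # 10 audio frames
text = rng.normal(size=(6, dim))                   # 6 text tokens

audio_enh, audio_w = label_guided_attention(audio, label_emb)
text_enh, text_w = label_guided_attention(text, label_emb)
fused = fuse(audio_enh, text_enh)                  # (d,) utterance representation
logits = label_emb @ fused                         # (C,) similarity to each emotion label
```

Scoring the fused vector against the same label embeddings is one simple way to tie fused features to emotion categories; the paper's APC constraint addresses that alignment with its own objective, which this sketch does not reproduce.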

Paper Structure

This paper contains 18 sections, 19 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: A sketched comparison between the previous mainstream method (a) and the proposed LSGMER (b).
  • Figure 2: The overall architecture of the LSGMER. MHA refers to Multi-Head Attention, where $g_{1}$ and $g_{2}$ represent the learning weights for the audio and text modalities, respectively.
  • Figure 3: The Attribution-Prediction Consistency Constraint.
  • Figure 4: w/o JOO & LSMA
  • Figure 5: LSGMER
  • ...and 2 more figures