Progressive Representation Learning for Multimodal Sentiment Analysis with Incomplete Modalities

Jindi Bao, Jianjun Qian, Mengkai Yan, Jian Yang

TL;DR

PRLF, a Progressive Representation Learning Framework designed for MSA under uncertain missing-modality conditions, introduces an Adaptive Modality Reliability Estimator (AMRE), which dynamically quantifies the reliability of each modality using recognition confidence and Fisher information to determine the dominant modality.

Abstract

Multimodal Sentiment Analysis (MSA) seeks to infer human emotions by integrating textual, acoustic, and visual cues. However, existing approaches often assume that all modalities are complete, whereas real-world applications frequently encounter noise, hardware failures, or privacy restrictions that result in missing modalities. There is a significant feature misalignment between incomplete and complete modalities, and directly fusing them may even distort the well-learned representations of the intact modalities. To this end, we propose PRLF, a Progressive Representation Learning Framework designed for MSA under uncertain missing-modality conditions. PRLF introduces an Adaptive Modality Reliability Estimator (AMRE), which dynamically quantifies the reliability of each modality using recognition confidence and Fisher information to determine the dominant modality. In addition, the Progressive Interaction (ProgInteract) module iteratively aligns the other modalities with the dominant one, thereby enhancing cross-modal consistency while suppressing noise. Extensive experiments on CMU-MOSI, CMU-MOSEI, and SIMS verify that PRLF outperforms state-of-the-art methods across both inter- and intra-modality missing scenarios, demonstrating its robustness and generalization capability.
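
To make the idea of choosing a dominant modality from confidence and Fisher information concrete, here is a minimal toy sketch. It is not the authors' AMRE: the abstract does not say how the two quantities are defined or combined, so everything below is an assumption. Confidence is taken to be the maximum softmax probability, Fisher information is taken to be the trace of the Fisher matrix of a categorical distribution with respect to its logits, and the function names, the weights `w_conf` and `w_fisher`, and the example logits are all hypothetical. In this particular formulation the Fisher trace grows as the prediction flattens, so it is given a negative weight; PRLF's actual rule may differ.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def confidence(logits):
    """Assumed confidence measure: the maximum softmax probability."""
    return float(np.max(softmax(logits)))


def fisher_trace(logits):
    """Trace of the Fisher matrix of a categorical distribution w.r.t. its logits.

    The Fisher matrix is diag(p) - p p^T, so its trace is 1 - sum(p^2).
    """
    p = softmax(logits)
    return float(1.0 - np.sum(p ** 2))


def pick_dominant(logits_by_modality, w_conf, w_fisher):
    """Score each modality with a hypothetical weighted sum and return the best one."""
    scores = {
        name: w_conf * confidence(z) + w_fisher * fisher_trace(z)
        for name, z in logits_by_modality.items()
    }
    dominant = max(scores, key=scores.get)
    return dominant, scores


# Hypothetical per-modality classifier logits for one sample.
# The all-zero vision logits mimic a modality whose input is missing.
example_logits = {
    "text": np.array([4.0, 0.0, -1.0]),
    "audio": np.array([0.5, 0.2, 0.1]),
    "vision": np.array([0.0, 0.0, 0.0]),
}

# Illustrative weights only; the sign and scale are arbitrary choices.
dominant, scores = pick_dominant(example_logits, w_conf=1.0, w_fisher=-1.0)
print(dominant, scores)
```

With these made-up logits, the sharply peaked text prediction receives the highest score, while the uniform vision prediction receives the lowest. In a full pipeline, the selected modality would then serve as the alignment target for the remaining modalities.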

Paper Structure

This paper contains 15 sections, 25 equations, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: Feature information loss and feature phase shift caused by missing modalities.
  • Figure 2: The structure of PRLF, which consists of two key components: the Adaptive Modality Reliability Estimator (AMRE) and the Progressive Interaction (ProgInteract) module.
  • Figure 3: Analysis of confidence and Fisher information variations caused by missing key frames in the visual modality.
  • Figure 4: Comparison results of intra-modality missingness. We report the F1 score.
  • Figure 5: Ablation for the intra-modality missingness on MOSI.
  • ...and 2 more figures