Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

Mingcheng Li, Dingkang Yang, Xiao Zhao, Shuaibing Wang, Yan Wang, Kun Yang, Mingyang Sun, Dongliang Kou, Ziyun Qian, Lihua Zhang

TL;DR

This work tackles multimodal sentiment analysis under uncertain modality missingness by introducing CorrKD, a correlation-decoupled knowledge distillation framework. It comprises three targeted mechanisms: Sample-level Contrastive Distillation (SCD) to propagate cross-sample knowledge, Category-guided Prototype Distillation (CPD) to align intra- and inter-category feature variations, and Response-disentangled Consistency Distillation (RCD) to maximize mutual information between teacher and student responses while decoupling target and non-target signals. The teacher–student paradigm uses a Modality Random Missing strategy to simulate incomplete inputs during training, enabling robust joint representations when modalities are partially observed. Empirical results on MOSI, MOSEI, and IEMOCAP demonstrate that CorrKD improves performance under both intra- and inter-modality missingness, while maintaining strong results for complete-modality cases, underscoring its practical impact for real-world MSA systems with incomplete data.
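The idea of decoupling target and non-target signals in the responses can be illustrated with a small sketch. It assumes a classification setting (as on IEMOCAP) and uses a KL divergence between teacher and student as a simple stand-in for the consistency objective; the function names, the `alpha`/`beta` weights, and the KL choice are illustrative assumptions, not the paper's formulation.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def decouple(probs, target):
    """Split a class distribution into two parts:
    - a binary distribution [p(target), p(not target)]
    - a renormalized distribution over the non-target classes only
    """
    p_t = probs[target]
    rest_mass = max(1.0 - p_t, 1e-12)  # guard against division by zero
    binary = [p_t, 1.0 - p_t]
    rest = [p / rest_mass for i, p in enumerate(probs) if i != target]
    return binary, rest

def kl(p, q, eps=1e-12):
    """KL(p || q) with a small epsilon for numerical safety."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def decoupled_response_loss(teacher_logits, student_logits, target,
                            alpha=1.0, beta=1.0, temperature=1.0):
    """Hypothetical loss: weighted sum of a target term and a non-target term."""
    t_bin, t_rest = decouple(softmax(teacher_logits, temperature), target)
    s_bin, s_rest = decouple(softmax(student_logits, temperature), target)
    return alpha * kl(t_bin, s_bin) + beta * kl(t_rest, s_rest)
```

Separating the two terms lets the target and non-target parts be weighted independently, which a single KL over the full distribution does not allow.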

Abstract

Multimodal sentiment analysis (MSA) aims to understand human sentiment through multimodal data. Most MSA efforts are based on the assumption of modality completeness. However, in real-world applications, some practical factors cause uncertain modality missingness, which drastically degrades the model's performance. To this end, we propose a Correlation-decoupled Knowledge Distillation (CorrKD) framework for the MSA task under uncertain missing modalities. Specifically, we present a sample-level contrastive distillation mechanism that transfers comprehensive knowledge containing cross-sample correlations to reconstruct missing semantics. Moreover, a category-guided prototype distillation mechanism is introduced to capture cross-category correlations using category prototypes to align feature distributions and generate favorable joint representations. Finally, we design a response-disentangled consistency distillation strategy to optimize the sentiment decision boundaries of the student network through response disentanglement and mutual information maximization. Comprehensive experiments on three datasets indicate that our framework can achieve favorable improvements compared with several baselines.
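As a rough illustration of how category prototypes could be used to align feature distributions, the sketch below takes each prototype to be the mean feature vector of a category and penalizes differences between teacher and student sample-to-prototype similarities. The mean-based prototype, the cosine similarity, and the squared-error penalty are assumptions made for this example; the abstract does not specify these details.

```python
import math
from collections import defaultdict

def category_prototypes(features, labels):
    """Assumed definition: a prototype is the mean feature vector of its category."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def prototype_distillation_loss(teacher_feats, student_feats, labels):
    """Hypothetical loss: mean squared gap between the teacher's and the
    student's similarity of every sample to every category prototype."""
    t_protos = category_prototypes(teacher_feats, labels)
    s_protos = category_prototypes(student_feats, labels)
    total, count = 0.0, 0
    for t_vec, s_vec in zip(teacher_feats, student_feats):
        for label in t_protos:
            gap = cosine(t_vec, t_protos[label]) - cosine(s_vec, s_protos[label])
            total += gap * gap
            count += 1
    return total / count if count else 0.0
```

Because every sample is compared against all prototypes, the loss reflects both how close a sample sits to its own category and how far it sits from the others.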

Paper Structure

This paper contains 19 sections, 7 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: A traditional model outputs the correct prediction when given a sample with complete modalities, but predicts incorrectly when the sample has missing modalities. We define two missing-modality cases: (i) intra-modality missingness (i.e., the pink areas) and (ii) inter-modality missingness (i.e., the yellow area).
  • Figure 2: The structure of our CorrKD, which consists of three core components: Sample-level Contrastive Distillation (SCD) mechanism, Category-guided Prototype Distillation (CPD) mechanism, and Response-disentangled Consistency Distillation (RCD) strategy.
  • Figure 3: Comparison results of intra-modality missingness on IEMOCAP. We comprehensively report the F1 score for the happy, sad, angry, and neutral categories at various missing ratios.
  • Figure 4: Comparison results of intra-modality missingness on (a) MOSI and (b) MOSEI. We report the F1 score at various ratios.
  • Figure 5: Ablation results of intra-modality missingness using various missing ratios on MOSI.
  • ...and 1 more figure
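The two missingness cases named in Figure 1 can be simulated with a simple masking routine. The sketch below assumes that intra-modality missingness means individual frames within a modality are lost and that inter-modality missingness means an entire modality is absent; zero-filling, the 50/50 choice between cases, and all function and modality names are illustrative assumptions rather than details taken from the paper.

```python
import random

def mask_intra(sequence, missing_ratio, rng):
    """Zero out each frame independently with probability missing_ratio."""
    return [[0.0] * len(frame) if rng.random() < missing_ratio else list(frame)
            for frame in sequence]

def mask_inter(sample, dropped):
    """Zero out every frame of the modalities named in `dropped`."""
    return {name: [[0.0] * len(f) for f in seq] if name in dropped
            else [list(f) for f in seq]
            for name, seq in sample.items()}

def random_missing(sample, missing_ratio, rng):
    """Randomly apply one of the two cases to a sample.

    `sample` maps modality name -> list of frames (each a list of floats)
    and must contain at least two modalities, so that the inter-modality
    case always leaves at least one modality intact.
    """
    if rng.random() < 0.5:
        return {name: mask_intra(seq, missing_ratio, rng)
                for name, seq in sample.items()}
    names = sorted(sample)
    num_dropped = rng.randint(1, len(names) - 1)
    return mask_inter(sample, set(rng.sample(names, num_dropped)))
```

A usage example with hypothetical modality names: `random_missing({"text": [[1.0]], "audio": [[2.0]], "video": [[3.0]]}, 0.3, random.Random(0))` returns a sample with the same keys and shapes, with some frames or whole modalities replaced by zeros.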