MCE: Towards a General Framework for Handling Missing Modalities under Imbalanced Missing Rates

Binyu Zhao, Wei Zhang, Zhaonian Zou

TL;DR

This work tackles the challenge of imbalanced missing rates in multi-modal learning, where underrepresented modalities are both under-sampled and under-learned. It introduces Modality Capability Enhancement (MCE), a dual-component framework combining Learning Capability Enhancement (LCE), which balances learning dynamics with dataset-level and batch-level incentives, and Representation Capability Enhancement (RCE), which improves feature semantics via subset prediction and cross-modal completion. A Shapley-value–based mechanism drives the adaptive incentives, complemented by a Transformer-based cross-modal reconstruction module, yielding robust representations across arbitrary modality subsets. Across four benchmarks, MCE consistently outperforms state-of-the-art baselines and demonstrates strong resilience to severe missingness, offering a principled, generalizable approach to real-world incomplete multi-modal data.

Abstract

Multi-modal learning has made significant advances across diverse pattern recognition applications. However, handling missing modalities, especially under imbalanced missing rates, remains a major challenge. This imbalance triggers a vicious cycle: modalities with higher missing rates receive fewer updates, leading to inconsistent learning progress and representational degradation that further diminishes their contribution. Existing methods typically focus on global dataset-level balancing, often overlooking critical sample-level variations in modality utility and the underlying issue of degraded feature quality. We propose Modality Capability Enhancement (MCE) to tackle these limitations. MCE includes two synergistic components: i) Learning Capability Enhancement (LCE), which introduces multi-level factors to dynamically balance modality-specific learning progress, and ii) Representation Capability Enhancement (RCE), which improves feature semantics and robustness through subset prediction and cross-modal completion tasks. Comprehensive evaluations on four multi-modal benchmarks show that MCE consistently outperforms state-of-the-art methods under various missing configurations. The final published version is now available at https://doi.org/10.1016/j.patcog.2025.112591. Our code is available at https://github.com/byzhaoAI/MCE.

Paper Structure

This paper contains 22 sections, 15 equations, 9 figures, 8 tables, 1 algorithm.

Figures (9)

  • Figure 1: Workflow of a multi-modal model and overview of the proposed MCE framework. During training on data with imbalanced missing rates, MCE incorporates Learning Capability Enhancement (LCE, in green) to balance modality-specific learning progress and encourage each modality to reach its performance potential, and Representation Capability Enhancement (RCE, in blue) to enrich representation semantics by exposing the model to a wider variety of multi-modal combinations.
  • Figure 2: A 4-modality, 10-sample dataset example that explains the rationale for introducing dataset-level modality rating. When the ratings of the modalities are identical, the number of updates (backward passes) received by the different modality networks in the multi-modal model may differ significantly (10, 8, 5, 2 in the example).
  • Figure 3: An example with batch size 1 and a 4-modality sample (missing the 2nd modality), showing how the factor $\mathcal{B}$ for encouraging learning states is generated.
  • Figure 4: Performance across different hyperparameter configurations on three datasets.
  • Figure 5: Hyperparameter interaction analysis on IEMOCAP.
  • ...and 4 more figures
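
The update disparity described in the Figure 2 caption can be illustrated with a small counting sketch. It assumes that a modality network receives one backward update for every sample in which that modality is present; the presence mask below is a hypothetical layout chosen only to reproduce the counts (10, 8, 5, 2) quoted in the caption, and is not taken from the paper.

```python
# Hypothetical 10-sample, 4-modality dataset.
# mask[i][m] is True when modality m is present in sample i.
n_samples = 10
present_counts = [10, 8, 5, 2]  # per-modality counts quoted in the Figure 2 caption

# Assumed layout: modality m is present in the first present_counts[m] samples.
mask = [[i < count for count in present_counts] for i in range(n_samples)]

# Assumption: one backward update per sample that contains the modality.
updates = [sum(row[m] for row in mask) for m in range(len(present_counts))]

# Missing rate of each modality implied by the mask.
missing_rates = [1 - u / n_samples for u in updates]

print("updates per modality:", updates)
print("missing rates:", [round(r, 2) for r in missing_rates])
```

Under this assumption, the modality that is missing most often receives a fifth of the updates that the always-present modality receives, which is the imbalance a dataset-level factor is meant to counteract.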

Theorems & Definitions (2)

  • Definition 1: Shapley Value
  • Definition 2: Capability gap
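
For readers unfamiliar with Definition 1, the following is a minimal sketch of the standard game-theoretic Shapley value, computed exactly by enumerating subsets. It is not the paper's formulation: the modality names, the toy scores, and the helper `shapley_values` are all hypothetical and serve only to show how a per-modality contribution can be derived from the performance of modality subsets.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small set of players.

    `value` maps a frozenset of players to a real-valued score.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n! for |S| = k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical scores for each modality subset (illustrative only).
toy_scores = {
    frozenset(): 0.0,
    frozenset({"audio"}): 0.4,
    frozenset({"text"}): 0.6,
    frozenset({"audio", "text"}): 0.8,
}

phi = shapley_values(["audio", "text"], toy_scores.__getitem__)
print(phi)
```

With these toy scores the contributions come out to roughly 0.3 for audio and 0.5 for text, and they sum to the score of the full modality set (the efficiency property). Exact enumeration is exponential in the number of modalities, so it is practical only for small modality sets such as the 4-modality example above.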