REMIND: Rethinking Medical High-Modality Learning under Missingness--A Long-Tailed Distribution Perspective

Chenwei Wu; Zitao Shuai; Liyue Shen

TL;DR

REMIND is a novel group-specialized Mixture-of-Experts architecture that scalably learns group-specific multi-modal fusion functions for arbitrary modality combinations, while simultaneously leveraging a group distributionally robust optimization strategy to upweight underrepresented modality combinations.
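The fusion side of this idea can be illustrated with a minimal sketch. It assumes a softmax router that reads the binary modality-availability mask and a bank of linear experts. The names `soft_moe_fuse`, `modality_mask_to_group`, `router_w`, and `experts` are hypothetical, and this is not REMIND's actual architecture, since the router and expert designs are not specified here. Because the gate depends on which modalities are present, patients in different modality-combination groups receive different expert mixtures. This lets shared experts realize group-specific fusion functions.

```python
import numpy as np

def modality_mask_to_group(mask):
    """Encode a binary availability mask (1 = observed) as an integer group id."""
    return int(sum(int(b) << i for i, b in enumerate(mask)))

def soft_moe_fuse(features, mask, router_w, experts, temperature=1.0):
    """Fuse per-modality features with a mask-conditioned soft mixture of experts.

    features: (M, d) array, one row per modality; rows of missing modalities are ignored.
    mask: length-M binary sequence marking observed modalities.
    router_w: (M, E) hypothetical router weights mapping the mask to expert logits.
    experts: list of E (d, d) matrices, each a hypothetical linear fusion expert.
    """
    mask = np.asarray(mask, dtype=float)
    # Mean-pool only the observed modalities.
    pooled = (features * mask[:, None]).sum(axis=0) / max(mask.sum(), 1.0)
    # The router sees the modality combination, so each group gets its own gate.
    logits = mask @ router_w / temperature
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()
    # Soft routing: every expert contributes, weighted by its gate value.
    expert_outs = np.stack([pooled @ W for W in experts])  # shape (E, d)
    return gate @ expert_outs, gate

# Usage with random (hypothetical) parameters: 4 modalities, 8-dim features, 3 experts.
rng = np.random.default_rng(0)
M, d, E = 4, 8, 3
features = rng.normal(size=(M, d))
router_w = rng.normal(size=(M, E))
experts = [rng.normal(size=(d, d)) for _ in range(E)]
out1, g1 = soft_moe_fuse(features, [1, 0, 1, 1], router_w, experts)
out2, g2 = soft_moe_fuse(features, [0, 1, 0, 0], router_w, experts)
print(modality_mask_to_group([1, 0, 1, 1]))  # 13
```

A hard top-k router would be a drop-in alternative. The soft gate is used here only because it keeps every expert differentiable for every group.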

Abstract

Medical multi-modal learning is critical for integrating information from a large set of diverse modalities. However, when a high number of modalities is used in real clinical applications, it is often impractical to obtain full-modality observations for every patient because of data collection constraints, a problem we refer to as 'High-Modality Learning under Missingness'. In this study, we identify that such missingness inherently induces exponential growth in the number of possible modality combinations (with M modalities, up to 2^M - 1 non-empty subsets), and that varying modality availability makes the distribution over these combinations long-tailed. While prior work has overlooked this phenomenon, we find that the long-tailed distribution leads to significant underperformance on tail modality-combination groups. Our empirical analysis attributes this problem to two fundamental issues: 1) gradient inconsistency, where tail groups' gradient updates diverge from the overall optimization direction; and 2) concept shift, where each modality combination requires a distinct fusion function. To address these challenges, we propose REMIND, a unified framework that REthinks MultImodal learNing under high-moDality missingness from a long-tail perspective. At its core is a novel group-specialized Mixture-of-Experts architecture that scalably learns group-specific multi-modal fusion functions for arbitrary modality combinations, combined with a group distributionally robust optimization strategy that upweights underrepresented modality combinations. Extensive experiments on real-world medical datasets show that our framework consistently outperforms state-of-the-art methods and generalizes robustly across various medical multi-modal learning applications under high-modality missingness.
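The combinatorial growth and the upweighting of tail groups can be made concrete with a small sketch. It assumes that groups are defined by the exact tuple of observed modalities. It also uses the standard online group DRO update, which is exponentiated-gradient ascent on per-group weights. REMIND's exact objective may differ. All function names and the toy losses below are hypothetical.

```python
import math
from collections import Counter

def num_combinations(num_modalities):
    """Number of non-empty modality subsets a patient could present with."""
    return 2 ** num_modalities - 1

def group_counts(masks):
    """Count samples per modality-combination group (mask tuple -> count)."""
    return Counter(tuple(m) for m in masks)

def group_dro_step(weights, group_losses, step_size=0.1):
    """One exponentiated-gradient update of the group weights q_g.

    Groups with higher loss, typically rare tail combinations, gain weight.
    """
    new = {g: w * math.exp(step_size * group_losses[g]) for g, w in weights.items()}
    total = sum(new.values())
    return {g: w / total for g, w in new.items()}

def robust_loss(weights, group_losses):
    """Weighted training objective: sum over groups of q_g * L_g."""
    return sum(weights[g] * group_losses[g] for g in weights)

print(num_combinations(10))  # 1023
# Toy long-tailed sample: one head group, one mid group, one tail group.
masks = [(1, 1, 1)] * 6 + [(1, 0, 1)] * 3 + [(0, 1, 0)]
counts = group_counts(masks)
weights = {g: 1 / len(counts) for g in counts}  # start from uniform weights
losses = {(1, 1, 1): 0.2, (1, 0, 1): 0.5, (0, 1, 0): 1.5}  # hypothetical per-group losses
weights = group_dro_step(weights, losses)
print(weights, robust_loss(weights, losses))
```

Unlike plain averaging, where the single tail sample contributes 10% of the loss, the robust objective weights groups rather than samples. It then shifts further weight toward whichever group currently has the highest loss.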

Paper Structure

This paper contains 55 sections, 24 equations, 5 figures, and 17 tables.

Introduction
Related Work
High-Modality Learning with Missing Data
Long-Tailed Modeling
Method
Problem Formulation and Modeling
Overall Distributionally Robust Framework
Tackling Concept Shift with Soft MoE
Experiments
Results
Main Results
Analysis
Conclusion
Appendix
Intuition Behind Gradient Consistency Analysis
...and 40 more sections

Figures (5)

Figure 1: Missing modalities in high-modality multi-modal learning -- A long-tailed distribution view.
Figure 2: Overview of REMIND. We model high-modality learning under missing modalities.
Figure 3: Gradient inconsistencies across training steps.
Figure 4: Visualization of top-expert patterns across modality-combination groups on the FPRM dataset.
Figure 5: Performance on held-out tail groups in the FPRM dataset when unfreezing and finetuning different parts of the model. The x-axis indicates which parts are finetuned, ranging from zero-shot ('Nothing') to full finetuning ('Pred Head + Router + 100% experts').
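Figure 3 tracks gradient inconsistencies across training steps. The following is a minimal sketch of one way to quantify them. It assumes consistency is measured as the cosine similarity between each group's gradient and the sample-size-weighted overall gradient, because the paper's exact metric is not stated here. `gradient_consistency` and the toy gradients are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def gradient_consistency(group_grads, group_sizes):
    """Cosine similarity of each group's gradient with the size-weighted overall gradient."""
    total = sum(group_sizes.values())
    dim = len(next(iter(group_grads.values())))
    overall = [0.0] * dim
    for g, grad in group_grads.items():
        w = group_sizes[g] / total  # average over samples, as plain ERM would
        for i, x in enumerate(grad):
            overall[i] += w * x
    return {g: cosine(grad, overall) for g, grad in group_grads.items()}

# Toy example: a large head group and a small tail group whose gradient points elsewhere.
grads = {"head": [1.0, 0.0], "tail": [-0.5, 1.0]}
sizes = {"head": 90, "tail": 10}
scores = gradient_consistency(grads, sizes)
print(scores)
```

The head group dominates the averaged gradient, so its similarity is close to 1. The tail group's similarity is negative, which means a descent step along the averaged gradient increases the tail group's loss to first order.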