Deep Multimodal Learning with Missing Modality: A Survey
Renjie Wu, Hu Wang, Hsiang-Ting Chen, Gustavo Carneiro
TL;DR
This survey addresses missing modalities in deep multimodal learning, a setting it terms Multimodal Learning with Missing Modality (MLMM). It proposes a two-axis taxonomy that separates data processing (modality imputation versus representation-focused methods) from strategy design (architecture-focused methods versus model combinations). Within this taxonomy it defines twelve fine-grained methodological families, analyzes 315 papers across diverse domains, and discusses applications, datasets, and open issues. Key contributions include the taxonomy itself, an in-depth comparison of recovery and non-recovery approaches, and guidance on benchmarking, efficiency, and future directions. The work gives researchers a structured roadmap for designing robust MLMM systems and for evaluating progress consistently across applications and modalities.
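To make the data-processing axis concrete, the sketch below contrasts one simple instance of each side. It is not a method from the survey. It assumes each modality has already been encoded into an embedding of the same size `dim`, with a missing modality represented as `None`. Zero-filling stands in for modality imputation, and averaging only the available embeddings stands in for a representation-focused strategy. The function names and example values are hypothetical.

```python
import numpy as np

def fuse_with_imputation(embeddings, dim):
    """Imputation-style fusion (illustrative): replace each missing modality
    with a zero vector, then concatenate, so the output size is fixed."""
    filled = [e if e is not None else np.zeros(dim) for e in embeddings]
    return np.concatenate(filled)

def fuse_available(embeddings):
    """Representation-style fusion (illustrative): average only the
    embeddings of modalities that are actually present."""
    present = [e for e in embeddings if e is not None]
    if not present:
        raise ValueError("at least one modality must be available")
    return np.mean(np.stack(present), axis=0)

# Hypothetical example: image and text embeddings are present, audio is missing.
image = np.array([1.0, 2.0])
text = np.array([3.0, 4.0])
inputs = [image, text, None]
print(fuse_with_imputation(inputs, dim=2))  # [1. 2. 3. 4. 0. 0.]
print(fuse_available(inputs))               # [2. 3.]
```

Zero-filling keeps the fused input shape fixed, so downstream layers need no changes, but the filled slot carries no information. Averaging avoids fabricating data, but it discards which modalities contributed to the fused vector.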
Abstract
During multimodal model training and testing, some data modalities may be absent because of sensor limitations, cost constraints, privacy concerns, or data loss, and this absence degrades model performance. Multimodal learning techniques designed for missing modalities mitigate this problem by keeping models robust when some modalities are unavailable. This survey reviews recent progress in Multimodal Learning with Missing Modality (MLMM), focusing on deep learning methods. It is the first comprehensive survey to cover the motivation for MLMM and how MLMM differs from standard multimodal learning setups. It then analyzes current methods, applications, and datasets in detail, and concludes with open challenges and future directions.
