Senti-iFusion: An Integrity-centered Hierarchical Fusion Framework for Multimodal Sentiment Analysis under Uncertain Modality Missingness
Liling Li, Guoyang Xu, Xiongri Shen, Zhifei Xu, Yanbo Zhang, Zhiguo Zhang, Zhenxi Song
TL;DR
Senti-iFusion tackles multimodal sentiment analysis under unknown inter- and intra-modality missingness by introducing an integrity-centered hierarchical fusion framework. It combines Integrity Estimation, Integrity-weighted Cross-modal Completion, and Integrity-guided Adaptive Fusion to recover and exploit sentiment cues from partially observed data. A dual-depth validation with semantic- and feature-level losses, together with a progressive two-stage training regime, yields state-of-the-art performance on MOSI and MOSEI under challenging missing-data patterns. The approach improves robustness and fine-grained sentiment understanding, which matters for real-world multimodal systems that face sensor unreliability and data loss.
Abstract
Multimodal Sentiment Analysis (MSA) is critical for human-computer interaction but struggles when modalities are incomplete or missing. Existing methods often assume that the missing modalities are known in advance or that the missing rate is fixed, which limits their real-world applicability. To address this challenge, we propose Senti-iFusion, an integrity-centered hierarchical fusion framework that handles both inter- and intra-modality missingness simultaneously. It comprises three hierarchical components: Integrity Estimation, Integrity-weighted Cross-modal Completion, and Integrity-guided Adaptive Fusion. First, the Integrity Estimation module predicts the completeness of each modality and mitigates the noise introduced by incomplete data. Second, the Integrity-weighted Cross-modal Completion module employs a novel weighting mechanism to disentangle consistent semantic structures from modality-specific representations, enabling precise recovery of sentiment-related features across the language, acoustic, and visual modalities. A dual-depth validation with feature- and semantic-level losses keeps the reconstruction consistent at both fine-grained (low-level) and semantic (high-level) scales. Finally, the Integrity-guided Adaptive Fusion mechanism dynamically selects a dominant modality for attention-based fusion, so that the most reliable modality, judged by completeness and quality, contributes most to the final prediction. Senti-iFusion is trained with a progressive two-stage scheme to ensure stable convergence. Experimental results on the MOSI and MOSEI datasets show that Senti-iFusion outperforms existing methods, particularly on fine-grained sentiment analysis tasks. The code and the Senti-iFusion model will be made publicly available.
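The abstract describes the pipeline only at a high level. To show how the three stages and the dual-depth loss could fit together, here is a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions:

* All modalities are already projected into a shared (T, d) feature space, with missing time steps zeroed.
* Integrity is approximated by the fraction of observed steps. The paper instead predicts it with a learned module.
* Completion uses a plain integrity-weighted average of the other modalities, not the paper's disentangling mechanism.
* The dominant modality queries the others through scaled dot-product attention.

All function names and the loss weight `alpha` are hypothetical.

```python
import numpy as np

MODALITIES = ("language", "acoustic", "visual")

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def estimate_integrity(mask):
    # Hypothetical proxy for the learned Integrity Estimation module:
    # fraction of observed time steps (0.0 when the modality is absent).
    return float(mask.mean())

def complete(feats, masks, integrity):
    # Illustrative integrity-weighted cross-modal completion: each missing step
    # of modality m is filled with the integrity-weighted mean of the other
    # modalities' features at that step; observed steps are kept unchanged.
    completed = {}
    for m in feats:
        others = [k for k in feats if k != m]
        w = np.array([integrity[k] for k in others])
        if w.sum() == 0:
            completed[m] = feats[m].copy()
            continue
        w = w / w.sum()
        cross = sum(wk * feats[k] for wk, k in zip(w, others))
        completed[m] = np.where(masks[m][:, None], feats[m], cross)
    return completed

def dual_depth_loss(recon, target, mask, alpha=0.5):
    # Feature level: MSE on the time steps that were missing.
    missing = ~mask
    feat = float(((recon - target)[missing] ** 2).mean()) if missing.any() else 0.0
    # Semantic level: MSE between time-pooled (mean) representations.
    sem = float(((recon.mean(axis=0) - target.mean(axis=0)) ** 2).mean())
    return feat + alpha * sem

def integrity_guided_fusion(completed, integrity):
    # The modality with the highest estimated integrity becomes the query;
    # the remaining modalities serve as keys and values.
    dominant = max(integrity, key=integrity.get)
    q = completed[dominant]
    kv = np.concatenate([completed[m] for m in completed if m != dominant], axis=0)
    attn = softmax(q @ kv.T / np.sqrt(q.shape[-1])) @ kv
    fused = q + attn  # residual connection around the attention output
    return dominant, fused.mean(axis=0)  # pool over time into one vector

# Toy example: language complete, acoustic partly missing, visual absent.
rng = np.random.default_rng(0)
T, d = 6, 4
truth = {m: rng.normal(size=(T, d)) for m in MODALITIES}
masks = {
    "language": np.ones(T, dtype=bool),
    "acoustic": np.array([1, 1, 0, 0, 1, 1], dtype=bool),
    "visual": np.zeros(T, dtype=bool),
}
feats = {m: truth[m] * masks[m][:, None] for m in MODALITIES}
integ = {m: estimate_integrity(masks[m]) for m in MODALITIES}
comp = complete(feats, masks, integ)
loss = dual_depth_loss(comp["acoustic"], truth["acoustic"], masks["acoustic"])
dominant, z = integrity_guided_fusion(comp, integ)
```

In this toy case the language stream is complete, so it is selected as the dominant modality. The fully missing visual stream is then filled entirely from the language and acoustic streams, weighted by their integrity.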
