TF-Mamba: Text-enhanced Fusion Mamba with Missing Modalities for Robust Multimodal Sentiment Analysis
Xiang Li, Xianfu Cheng, Dezhuang Miao, Xiaoming Zhang, Zhoujun Li
TL;DR
TF-Mamba addresses robust multimodal sentiment analysis (MSA) under missing modalities by integrating text-dominant strategies into an efficient Mamba framework. It introduces three components: Text-aware Modality Enhancement (TME), which aligns and enhances non-text modalities; Text-based Context Mamba (TC-Mamba), which models intra-modal context; and Text-guided Query Mamba (TQ-Mamba), which performs text-guided cross-modal fusion. On MOSI, MOSEI, and SIMS, TF-Mamba is more robust and more efficient than Transformer-based baselines, with fewer FLOPs and parameters. The work demonstrates that linear-time, text-led fusion is practical for robust MSA. The implementation is public, and handling real-world missing patterns and end-to-end optimization remain directions for future work.
Abstract
Multimodal Sentiment Analysis (MSA) with missing modalities has attracted increasing attention recently. While current Transformer-based methods leverage dense text information to maintain model robustness, their quadratic complexity hinders efficient long-range modeling and multimodal fusion. To this end, we propose a novel and efficient Text-enhanced Fusion Mamba (TF-Mamba) framework for robust MSA with missing modalities. Specifically, a Text-aware Modality Enhancement (TME) module aligns and enriches non-text modalities, while reconstructing the missing text semantics. Moreover, we develop Text-based Context Mamba (TC-Mamba) to capture intra-modal contextual dependencies under text collaboration. Finally, Text-guided Query Mamba (TQ-Mamba) queries text-guided multimodal information and learns joint representations for sentiment prediction. Extensive experiments on three MSA datasets demonstrate the effectiveness and efficiency of the proposed method under missing modality scenarios. Our code is available at https://github.com/codemous/TF-Mamba.
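The efficiency argument in the abstract rests on replacing quadratic pairwise interactions with a linear-time recurrence. The sketch below is not the TF-Mamba implementation. It is a toy scalar state-space model with fixed, hypothetical parameters a, b, and c, whereas Mamba makes such parameters input-dependent. It only illustrates that a single pass over the sequence reproduces the output of an explicit computation over all pairs of time steps.

```python
def ssm_scan(x, a, b, c):
    """Linear-time recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t."""
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t  # constant work per time step, so O(T) overall
        ys.append(c * h)
    return ys


def ssm_pairwise(x, a, b, c):
    """Same output from explicit pairwise weights: y_t = sum_{s<=t} c * a^(t-s) * b * x_s.

    Every output position visits every earlier position, so the cost is O(T^2).
    """
    return [
        sum(c * (a ** (t - s)) * b * x[s] for s in range(t + 1))
        for t in range(len(x))
    ]


# Toy input and parameters (illustrative assumptions, not values from the paper).
x = [1.0, 0.0, 2.0, -1.0]
y_scan = ssm_scan(x, a=0.5, b=1.0, c=2.0)
y_pair = ssm_pairwise(x, a=0.5, b=1.0, c=2.0)
print(y_scan)
print(y_pair)
```

Both functions return the same sequence, but only the scan keeps the cost proportional to the sequence length, which is the property the abstract contrasts with Transformer-based fusion.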
