MoLAN: A Unified Modality-Aware Noise Dynamic Editing Framework for Multimodal Sentiment Analysis
Xingle Xu, Yongkang Liu, Dexian Cai, Shi Feng, Xiaocui Yang, Daling Wang, Yifei Zhang
TL;DR
MoLAN addresses noise in multimodal sentiment analysis with a block-based, modality-aware denoising framework that partitions the features of each modality into blocks and assigns each block its own dynamic denoising strength. It is plug-and-play across MSA models and Multimodal Large Language Models (MLLMs), and MoLAN+ adds noise-suppressed cross-attention and denoising-driven contrastive learning to sharpen cross-modal alignment. Experiments with five baseline models on four datasets show broad improvements, and MoLAN+ achieves state-of-the-art performance, supporting the approach's robustness to modality noise. The work offers a practical path toward robust multimodal representations for sentiment analysis in noisy real-world settings.
Abstract
Multimodal Sentiment Analysis aims to integrate information from various modalities, such as audio, visual, and text, to make complementary predictions. However, it often struggles with irrelevant or misleading visual and auditory information. Most existing approaches treat the entire modality information (e.g., a whole image, audio segment, or text paragraph) as an independent unit for feature enhancement or denoising. They often suppress redundant and noisy information at the risk of losing critical information. To address this challenge, we propose MoLAN, a unified ModaLity-aware noise dynAmic editiNg framework. Specifically, MoLAN performs modality-aware blocking by dividing the features of each modality into multiple blocks. Each block is then dynamically assigned a distinct denoising strength based on its noise level and semantic relevance, enabling fine-grained noise suppression while preserving essential multimodal information. Notably, MoLAN is a unified and flexible framework that can be seamlessly integrated into a wide range of multimodal models. Building upon this framework, we further introduce MoLAN+, a new multimodal sentiment analysis approach. Experiments across five models and four datasets demonstrate the broad effectiveness of the MoLAN framework. Extensive evaluations show that MoLAN+ achieves state-of-the-art performance. The code is publicly available at https://github.com/betterfly123/MoLAN-Framework.
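The block-wise idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a text-derived guide vector, the cosine-similarity scoring, and the linear attenuation rule are all assumptions chosen only to show what "one denoising strength per block" might look like. The actual method is in the linked repository.

```python
import math

def split_into_blocks(features, num_blocks):
    """Split a sequence of feature vectors into contiguous blocks.

    Yields at most num_blocks blocks; the last one may be shorter.
    """
    size = max(1, math.ceil(len(features) / num_blocks))
    return [features[i:i + size] for i in range(0, len(features), size)]

def cosine(u, v):
    """Cosine similarity of two vectors, 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def block_mean(block):
    """Average the vectors in a block into a single summary vector."""
    dim = len(block[0])
    return [sum(vec[d] for vec in block) / len(block) for d in range(dim)]

def denoise_strengths(blocks, guide):
    """Hypothetical scoring rule (an assumption, not the paper's):
    blocks less similar to the guide vector get a stronger denoising strength.
    Maps cosine similarity in [-1, 1] to a strength in [0, 1].
    """
    return [(1.0 - cosine(block_mean(b), guide)) / 2.0 for b in blocks]

def apply_block_denoising(features, guide, num_blocks):
    """Attenuate each block by its own strength instead of the whole modality."""
    blocks = split_into_blocks(features, num_blocks)
    strengths = denoise_strengths(blocks, guide)
    out = []
    for block, s in zip(blocks, strengths):
        # Linear attenuation stands in for whatever editing a real model applies.
        out.extend([[x * (1.0 - s) for x in vec] for vec in block])
    return out, strengths

# Toy example: the first two visual frames align with the guide, the last two do not.
visual = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
text_guide = [1.0, 0.0]
denoised, strengths = apply_block_denoising(visual, text_guide, num_blocks=2)
print(strengths)  # [0.0, 0.5]
```

In this toy case the aligned block is left untouched while the unrelated block is attenuated by half, which is the contrast with whole-modality denoising that the abstract draws.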
