Data Uncertainty-Aware Learning for Multimodal Aspect-based Sentiment Analysis
Hao Yang, Zhenyu Zhang, Yanyan Zhao, Bing Qin
TL;DR
This work addresses data uncertainty in multimodal aspect-based sentiment analysis by introducing UA-MABSA, a framework that weighs each sample’s loss by a composite quality score reflecting image quality and cross-modal relevance. The model combines an image-quality assessment, coarse and fine CLIP-based correlation metrics, and a Caption Transformer-backed backbone to produce uncertainty-aware supervision, L' = avg(W_i^{Image} + W_i^{IT} + W_i^{AI}) * L. Experiments on Twitter-2015 and Twitter-2017 demonstrate competitive results and SOTA Macro-F1 on at least one dataset, with extensive ablations confirming the usefulness of each quality component and illustrating how data uncertainty can mitigate overfitting to noisy samples. The approach advances robustness and provides a practical pathway for handling variable data quality in real-world multimodal sentiment tasks. The work also discusses limitations and future directions, including extending to aspect extraction and joint MABSA tasks, and moving toward adaptive, threshold-free quality assessment.
Abstract
As a fine-grained task, multimodal aspect-based sentiment analysis (MABSA) mainly focuses on identifying aspect-level sentiment information in the text-image pair. However, we observe that it is difficult to recognize the sentiment of aspects in low-quality samples, such as those with low-resolution images that tend to contain noise. And in the real world, the quality of data usually varies for different samples, such noise is called data uncertainty. But previous works for the MABSA task treat different quality samples with the same importance and ignored the influence of data uncertainty. In this paper, we propose a novel data uncertainty-aware multimodal aspect-based sentiment analysis approach, UA-MABSA, which weighted the loss of different samples by the data quality and difficulty. UA-MABSA adopts a novel quality assessment strategy that takes into account both the image quality and the aspect-based cross-modal relevance, thus enabling the model to pay more attention to high-quality and challenging samples. Extensive experiments show that our method achieves state-of-the-art (SOTA) performance on the Twitter-2015 dataset. Further analysis demonstrates the effectiveness of the quality assessment strategy.
