Data Uncertainty-Aware Learning for Multimodal Aspect-based Sentiment Analysis

Hao Yang, Zhenyu Zhang, Yanyan Zhao, Bing Qin

TL;DR

This work addresses data uncertainty in multimodal aspect-based sentiment analysis (MABSA) by introducing UA-MABSA, a framework that weights each sample’s loss by a composite quality score reflecting image quality and cross-modal relevance. The model combines an image-quality assessment, coarse- and fine-grained CLIP-based correlation metrics, and a Caption Transformer-backed backbone to produce uncertainty-aware supervision, L' = avg(W_i^{Image} + W_i^{IT} + W_i^{AI}) * L, where the three weights score image quality, image-text relevance, and aspect-image relevance for sample i. Experiments on Twitter-2015 and Twitter-2017 show competitive results and SOTA Macro-F1 on at least one of the two datasets. Extensive ablations confirm the usefulness of each quality component and illustrate how modeling data uncertainty can mitigate overfitting to noisy samples. The approach improves robustness and offers a practical way to handle variable data quality in real-world multimodal sentiment tasks. The work also discusses limitations and future directions, including extending to aspect extraction and joint MABSA tasks, and moving toward adaptive, threshold-free quality assessment.
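The weighting rule in the TL;DR can be illustrated with a minimal Python sketch. It rests on two assumptions that the summary does not confirm: that avg(...) means the arithmetic mean of the three weights for a sample, and that the weighted per-sample losses are then averaged over the batch. The function name `uncertainty_weighted_loss` and its parameter names are hypothetical and do not come from the paper.

```python
from statistics import fmean


def uncertainty_weighted_loss(losses, w_image, w_it, w_ai):
    """Scale each per-sample loss by the mean of its three quality weights.

    Assumption: avg(...) in L' = avg(W^Image + W^IT + W^AI) * L is read as the
    arithmetic mean of the three weights; batch averaging is also assumed.
    """
    if not (len(losses) == len(w_image) == len(w_it) == len(w_ai)):
        raise ValueError("all inputs must contain one value per sample")

    weighted = []
    for loss, wi, wt, wa in zip(losses, w_image, w_it, w_ai):
        sample_weight = (wi + wt + wa) / 3.0  # composite quality score
        weighted.append(sample_weight * loss)  # down-weight low-quality samples

    # Mean over the batch of the reweighted per-sample losses.
    return fmean(weighted)


# Two samples: the second has lower quality weights, so its loss counts less.
batch_loss = uncertainty_weighted_loss(
    losses=[1.0, 2.0],
    w_image=[1.0, 0.5],
    w_it=[1.0, 0.5],
    w_ai=[1.0, 0.5],
)
```

In this toy batch the second sample has twice the raw loss but half the composite weight, so both samples contribute equally to the final value.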

Abstract

As a fine-grained task, multimodal aspect-based sentiment analysis (MABSA) mainly focuses on identifying aspect-level sentiment information in a text-image pair. However, we observe that it is difficult to recognize the sentiment of aspects in low-quality samples, such as those with low-resolution images that tend to contain noise. In the real world, data quality usually varies across samples; this kind of noise is called data uncertainty. Previous works on the MABSA task treat samples of different quality as equally important and ignore the influence of data uncertainty. In this paper, we propose a novel data uncertainty-aware multimodal aspect-based sentiment analysis approach, UA-MABSA, which weights the loss of different samples by data quality and difficulty. UA-MABSA adopts a novel quality assessment strategy that takes into account both image quality and aspect-based cross-modal relevance, enabling the model to pay more attention to high-quality and challenging samples. Extensive experiments show that our method achieves state-of-the-art (SOTA) performance on the Twitter-2015 dataset. Further analysis demonstrates the effectiveness of the quality assessment strategy.

Paper Structure

This paper contains 17 sections, 6 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Examples of MABSA data with different quality and difficulty. High-quality and low-quality samples are distinguished by image recognizability and image-text correlation. Emotion recognition for hard samples requires cross-modal interaction information, while simple samples rely more on text content.
  • Figure 2: Structure of previous MABSA methods vs. the UA-MABSA method: (a) Previous MABSA methods, with uni-modal encoders and a cross-modal encoder. (b) The UA-MABSA method, in which the loss weight is adjusted based on image quality, image-text relevance, and aspect-image relevance.
  • Figure 3: Overview of the data uncertainty-aware multimodal aspect-based sentiment analysis (UA-MABSA) model architecture.
  • Figure 4: The performance of UA-MABSA with different thresholds of OCR message length and OpenCV image score for multimodal aspect-based sentiment analysis.
  • Figure 5: Comparison of quality weights between the Twitter-2015 dataset and the Twitter-2017 dataset.
  • ...and 1 more figures