
A Dual-Module Denoising Approach with Curriculum Learning for Enhancing Multimodal Aspect-Based Sentiment Analysis

Nguyen Van Doan, Dat Tran Nguyen, Cam-Van Thi Nguyen

TL;DR

DualDe tackles noisy multimodal signals in MABSA by coupling Hybrid Curriculum Denoising (HCD) with Aspect-Enhance Denoising (AED). HCD guides training with a composite difficulty $d^c_i = \alpha d^l_i + (1 - \alpha) d^s_i$ (with $\alpha = 0.8$) and a competence curve $p(t)$ that progressively exposes the model to harder samples. AED uses aspect-guided attention and a GCN over a weighted association matrix to filter non-essential visual regions and text tokens. On Twitter2015 and Twitter2017, DualDe achieves state-of-the-art results across JMASA, MASC, and MATE, with gains in precision, recall, and F1. The work demonstrates robust cross-modal denoising and improved cross-modal alignment, with implications for real-world noisy multimodal sentiment analysis.
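The curriculum idea in HCD can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the composite difficulty follows the formula above, but the two per-sample scores $d^l_i$ and $d^s_i$ are taken as given inputs, and the square-root form of $p(t)$, the initial competence `c0`, and all function names are assumptions rather than details taken from this summary.

```python
import math


def composite_difficulty(d_l, d_s, alpha=0.8):
    """d^c_i = alpha * d^l_i + (1 - alpha) * d^s_i for each sample i."""
    return [alpha * l + (1.0 - alpha) * s for l, s in zip(d_l, d_s)]


def competence(t, total_epochs, c0=0.1):
    """ASSUMED square-root competence schedule; rises from c0 to 1.0.

    The exact form of p(t) is not given in the text above.
    """
    return min(1.0, math.sqrt(t * (1.0 - c0 ** 2) / total_epochs + c0 ** 2))


def select_samples(difficulties, t, total_epochs, c0=0.1):
    """Return indices of the easiest p(t) fraction of samples at epoch t."""
    # Sort sample indices from easiest (lowest difficulty) to hardest.
    order = sorted(range(len(difficulties)), key=lambda i: difficulties[i])
    # Keep at least one sample, and at most the whole dataset.
    k = max(1, math.ceil(competence(t, total_epochs, c0) * len(order)))
    return order[:k]
```

Calling `select_samples(composite_difficulty(d_l, d_s), t, total_epochs)` once per epoch yields a training subset that starts with the easiest samples and grows until it covers the full dataset.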

Abstract

Multimodal Aspect-Based Sentiment Analysis (MABSA) combines text and images to perform sentiment analysis but often struggles with irrelevant or misleading visual information. Existing methodologies typically address either sentence-image denoising or aspect-image denoising but fail to comprehensively tackle both types of noise. To address these limitations, we propose DualDe, a novel approach comprising two distinct components: the Hybrid Curriculum Denoising Module (HCD) and the Aspect-Enhance Denoising Module (AED). The HCD module enhances sentence-image denoising by incorporating a flexible curriculum learning strategy that prioritizes training on clean data. Concurrently, the AED module mitigates aspect-image noise through an aspect-guided attention mechanism that filters out noisy visual regions that are unrelated to the specific aspects of interest. Our approach demonstrates effectiveness in addressing both sentence-image and aspect-image noise, as evidenced by experimental evaluations on benchmark datasets.
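To make the notion of aspect-guided attention more concrete, the sketch below scores each visual region against an aspect vector and pools the regions by those scores, so that regions unrelated to the aspect receive little weight. It uses generic scaled dot-product attention as a stand-in; the AED module's actual formulation is not described in this abstract, and the function names, vector shapes, and scaling are all assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aspect_guided_attention(aspect, regions):
    """Weight region vectors by scaled dot-product relevance to the aspect.

    ASSUMPTION: generic attention, not the paper's exact AED mechanism.
    `aspect` is one feature vector; `regions` is a list of vectors of the
    same dimension, one per visual region.
    """
    dim = len(aspect)
    scale = math.sqrt(dim)
    # Relevance score of each region to the aspect.
    scores = [sum(a * r for a, r in zip(aspect, region)) / scale
              for region in regions]
    weights = softmax(scores)
    # Attention-pooled visual feature: low-relevance regions contribute less.
    pooled = [sum(w * region[j] for w, region in zip(weights, regions))
              for j in range(dim)]
    return weights, pooled
```

For example, with `aspect = [1.0, 0.0]` and two regions `[2.0, 0.0]` and `[0.0, 2.0]`, the first region receives the larger weight because it points in the same direction as the aspect vector.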

Paper Structure

This paper contains 24 sections, 13 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Illustration of Sentence-Image Denoising and Aspect-Image Denoising. Sentence-Image Denoising classifies an image as clean if it is relevant to the overall sentence meaning. Aspect-Image Denoising identifies regions as noise (e.g., blurred areas) when they lack strong relevance to any specific aspect.
  • Figure 2: Model Overview
  • Figure 3: Curve of the competence function $p(t)$ and the corresponding amount of data selected at epoch $t$.
  • Figure 4: Illustration of the contribution ratio coefficient test.
  • Figure 5: The figure illustrates instances where sentence-image noise and aspect-image noise impact the effectiveness of sentiment analysis. The easy sample features a clear alignment between the sentence and image, enhancing sentiment detection, while the hard sample involves a blurry image with minimal relevance to the sentence's aspects, complicating accurate sentiment evaluation.