Explainable Multimodal Sentiment Analysis on Bengali Memes

Kazi Toufique Elahi, Tasnuva Binte Rahman, Shakil Shahriar, Samir Sarker, Sajib Kumar Saha Joy, Faisal Muhammad Shah

TL;DR

This work tackles sentiment analysis of Bengali memes, a low-resource language domain, by evaluating unimodal and multimodal approaches on the MemoSen dataset. It demonstrates that a multimodal pipeline combining BanglishBERT for text with ResNet50 for images achieves a weighted F1 of $0.71$, outperforming prior Bengali meme studies and unimodal baselines. The authors also apply explainable AI (LIME) to reveal how visual and textual cues contribute to predictions, highlighting challenges in detecting neutral memes due to data imbalance and template similarities. The study underscores the viability of multimodal strategies for culturally specific memes and suggests directions for improving data balance and model architectures to further boost performance and interpretability.

Abstract

Memes have become a distinctive and effective form of communication in the digital era, attracting online communities and cutting across cultural barriers. Although memes are frequently associated with humor, they can convey a wide range of emotions, including happiness, sarcasm, and frustration. Understanding and interpreting the sentiment underlying memes has therefore become crucial in the age of information. Previous research has explored text-based, image-based, and multimodal approaches, leading to the development of models like CAPSAN and PromptHate for detecting various meme categories. However, research on memes in low-resource languages like Bengali remains scarce, and few datasets are publicly accessible. A recent contribution is the MemoSen dataset, but the accuracy reported on it is notably low and its class distribution is imbalanced. In this study, we employ a multimodal approach using ResNet50 and BanglishBERT that achieves a weighted F1-score of 0.71, compare it with unimodal approaches, and interpret the behavior of the models using explainable artificial intelligence (XAI) techniques.
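
Because the headline metric is a weighted F1-score on an imbalanced dataset, it may help to see how that number is formed. The following is a minimal sketch in plain Python, not the authors' evaluation code: the class names and the ten toy labels are illustrative assumptions and are not drawn from MemoSen. It computes per-class F1, then averages it once weighted by class support (weighted F1) and once uniformly (macro F1).

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No correct predictions for this class: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average of per-class F1, weighted by each class's share of true labels."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        per_class_f1(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )


def macro_f1(y_true, y_pred):
    """Unweighted average of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(per_class_f1(y_true, y_pred, label) for label in labels) / len(labels)


# Toy, made-up labels (assumption): a rare "neutral" class that is always missed.
y_true = ["positive"] * 5 + ["negative"] * 4 + ["neutral"] * 1
y_pred = (
    ["positive"] * 4 + ["negative"]      # one positive mistaken for negative
    + ["negative"] * 3 + ["positive"]    # one negative mistaken for positive
    + ["positive"]                       # the neutral meme mistaken for positive
)

if __name__ == "__main__":
    print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
    print(f"macro F1:    {macro_f1(y_true, y_pred):.3f}")
```

In this toy setting the rare class contributes little to the weighted average, so the weighted score stays well above the macro score even though every neutral example is misclassified. This is one reason a weighted F1 is usually read alongside per-class results when a dataset is imbalanced.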

Paper Structure (11 sections, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Example meme from each class
  • Figure 2: Length-Frequency distribution of Captions
  • Figure 3: Word Cloud
  • Figure 4: Methodology
  • Figure 5: Train and Validation Loss vs Accuracy Curve of Multimodal BanglishBERT + ResNet50
  • ...and 2 more figures