A Comprehensive Review of Visual-Textual Sentiment Analysis from Social Media Networks

Israa Khalaf Salman Al-Tameemi, Mohammad-Reza Feizi-Derakhshi, Saeed Pashazadeh, Mohammad Asadpour

TL;DR

This survey addresses the shift from text-only sentiment analysis to multimodal sentiment analysis on social media by focusing on the fusion of visual and textual data. It systematically reviews preprocessing, feature extraction, fusion strategies (rule-based, classification-based, attention-based, and bilinear pooling), and classifier approaches across textual, visual, and joint modalities, with attention to benchmark datasets and evaluation measures. The paper also discusses the main challenges of multimodal SA, including cross-modal heterogeneity, incomplete modalities, and data scarcity, and highlights a broad range of applications from finance to healthcare. Overall, it underscores that multimodal SA can surpass unimodal approaches by leveraging complementary visual and textual cues, while outlining practical directions for future research and cross-disciplinary collaboration.
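As a rough illustration of what fusing visual and textual information can look like, the sketch below shows three simple fusion operations: feature concatenation, a weighted decision-level rule, and bilinear pooling as an outer product. It is a minimal sketch under stated assumptions and is not taken from the survey: the function names, the toy vectors, and the `text_weight` parameter are hypothetical, and real systems would operate on learned feature vectors rather than hand-written lists.

```python
from typing import Dict, List


def concat_fusion(text_vec: List[float], image_vec: List[float]) -> List[float]:
    """Feature-level fusion: join the two feature vectors end to end."""
    return text_vec + image_vec


def weighted_rule_fusion(
    text_probs: Dict[str, float],
    image_probs: Dict[str, float],
    text_weight: float = 0.5,  # hypothetical parameter, chosen for illustration
) -> Dict[str, float]:
    """Decision-level fusion: weighted average of per-class probabilities."""
    return {
        label: text_weight * text_probs[label]
        + (1.0 - text_weight) * image_probs[label]
        for label in text_probs
    }


def bilinear_pool(text_vec: List[float], image_vec: List[float]) -> List[float]:
    """Bilinear pooling: flattened outer product, one term per feature pair."""
    return [t * v for t in text_vec for v in image_vec]


if __name__ == "__main__":
    # Toy inputs standing in for extracted features and classifier outputs.
    text_features = [1.0, 2.0]
    image_features = [3.0, 4.0, 5.0]
    print(concat_fusion(text_features, image_features))
    print(bilinear_pool(text_features, image_features))

    fused = weighted_rule_fusion(
        {"positive": 0.8, "negative": 0.2},
        {"positive": 0.4, "negative": 0.6},
    )
    # Pick the label with the highest fused score.
    print(max(fused, key=fused.get))
```

Concatenation keeps the modalities side by side, whereas the outer product in bilinear pooling produces one term for every text-image feature pair, which is why its output grows multiplicatively with the input sizes.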

Abstract

Social media networks have become a significant aspect of people's lives, serving as a platform for their ideas, opinions and emotions. Consequently, automated sentiment analysis (SA) is critical for recognising people's feelings in ways that other information sources cannot. Analysing these feelings has a range of applications, including brand evaluations, YouTube film reviews and healthcare. As social media continues to develop, people post a massive amount of information in different forms, including text, photos, audio and video. Traditional SA algorithms are therefore limited, as they do not consider the expressiveness of other modalities. By drawing on characteristics from these varied sources, multimodal data streams provide new opportunities for optimising results beyond text-based SA. Our study focuses on the forefront field of multimodal SA, which examines visual and textual data posted on social media networks. Many people are more likely to use these two types of content to express themselves on these platforms. To serve as a resource for academics in this rapidly growing field, we present a comprehensive overview of textual and visual SA, including data pre-processing, feature extraction techniques, sentiment benchmark datasets, and the efficacy of multiple classification methodologies suited to each field. We also provide a brief introduction to the most frequently utilised data fusion strategies and a summary of existing research on visual-textual SA. Finally, we highlight the most significant challenges and investigate several important sentiment applications.

Paper Structure (23 sections, 4 equations, 2 figures, 10 tables)

Figures (2)

  • Figure 1: Basic architecture for visual–textual sentiment analysis.
  • Figure 2: Text sentiment classification approaches.