A Heterogeneous Multimodal Graph Learning Framework for Recognizing User Emotions in Social Networks

Sree Bhattacharyya, Shuhua Yang, James Z. Wang

TL;DR

This work addresses personalized emotion prediction in social networks by formulating the problem as edge classification on a heterogeneous user–image graph and introducing HMG-Emo, a Graph Attention Network–based framework with a dynamic context fusion module for adaptive multimodal integration. The model leverages visual features from ResNet-50, textual cues via multilingual BERT, and social context through user groups and connections, employing an adaptive combination mechanism to fuse modalities. Experiments on the IESN dataset show substantial gains over baselines, with HMG-Emo delivering high precision and robust performance through multimodal fusion, even when some modalities are partially missing. The study demonstrates the potential of deep graph learning for affective computing in social networks and points to future work involving geo-context, temporal data, and Graph-LLMs for enhanced emotion prediction and interpretability.
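To make the edge-classification framing concrete, here is a minimal sketch of how a user–image graph could be represented so that each emotion label sits on a (user, image) edge rather than on a node. It is an illustration only, not the HMG-Emo implementation: the `HeteroGraph` class, the `classify_edge` helper, the linear scorer, the weights, and the label names are all hypothetical, and the real model uses graph attention layers rather than plain concatenation.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Toy heterogeneous graph with two node types (hypothetical structure)."""
    user_feats: dict = field(default_factory=dict)   # user id -> feature vector
    image_feats: dict = field(default_factory=dict)  # image id -> feature vector
    edges: list = field(default_factory=list)        # (user id, image id, label or None)

    def add_edge(self, user, image, label=None):
        # One edge means "this user reacted to this image"; the label is the emotion.
        self.edges.append((user, image, label))

    def edge_features(self, user, image):
        # Assumption: an edge is described by concatenating its endpoint features.
        return self.user_feats[user] + self.image_feats[image]


def classify_edge(graph, user, image, weights, labels):
    """Score an edge with a linear layer (one weight row per label) and return the top label."""
    x = graph.edge_features(user, image)
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Illustrative values only; none of these come from the paper.
graph = HeteroGraph()
graph.user_feats["u1"] = [1.0, 0.0]
graph.image_feats["img1"] = [0.0, 1.0]
graph.add_edge("u1", "img1")
toy_weights = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
toy_labels = ["joy", "sadness"]
predicted = classify_edge(graph, "u1", "img1", toy_weights, toy_labels)
```

The point of the design is that the same image can carry different labels on different edges, which is what makes the prediction personalized.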

Abstract

The rapid expansion of social media platforms has provided unprecedented access to massive amounts of multimodal user-generated content. Comprehending user emotions can provide valuable insights for improving communication and understanding of human behaviors. Despite significant advancements in Affective Computing, the diverse factors influencing user emotions in social networks remain relatively understudied. Moreover, there is a notable lack of deep learning-based methods for predicting user emotions in social networks, which could be addressed by leveraging the extensive multimodal data available. This work presents a novel formulation of personalized emotion prediction in social networks based on heterogeneous graph learning. Building upon this formulation, we design HMG-Emo, a Heterogeneous Multimodal Graph Learning Framework that utilizes deep learning-based features for user emotion recognition. Additionally, we include a dynamic context fusion module in HMG-Emo that is capable of adaptively integrating the different modalities in social media data. Through extensive experiments, we demonstrate the effectiveness of HMG-Emo and verify the superiority of adopting a graph neural network-based approach, which outperforms existing baselines that use rich hand-crafted features. To the best of our knowledge, HMG-Emo is the first multimodal and deep-learning-based approach to predict personalized emotions within online social networks. Our work highlights the significance of exploiting advanced deep learning techniques for less-explored problems in Affective Computing.
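The idea of adaptively integrating modalities can be sketched as a softmax-weighted sum that skips any modality that is missing. This is a simplified illustration under stated assumptions, not the paper's dynamic context fusion module: the function name `fuse_modalities` is hypothetical, and the gate scores are fixed inputs here, whereas a trained model would learn them.

```python
import math


def fuse_modalities(features, gate_logits):
    """Fuse per-modality vectors with softmax gate weights.

    features:    dict of modality name -> equal-length vector, or None if missing
    gate_logits: dict of modality name -> unnormalized gate score (assumed given)
    Returns (fused_vector, weights) where weights cover present modalities only.
    """
    present = {name: vec for name, vec in features.items() if vec is not None}
    if not present:
        raise ValueError("at least one modality must be present")

    # Softmax over the present modalities only, shifted by the max for stability.
    top = max(gate_logits[name] for name in present)
    exps = {name: math.exp(gate_logits[name] - top) for name in present}
    total = sum(exps.values())
    weights = {name: value / total for name, value in exps.items()}

    # Weighted sum of the present modality vectors, one dimension at a time.
    dim = len(next(iter(present.values())))
    fused = [sum(weights[name] * present[name][i] for name in present)
             for i in range(dim)]
    return fused, weights


# Illustrative values only: the social-context vector is missing for this sample.
sample = {"visual": [1.0, 0.0], "text": [0.0, 1.0], "social": None}
logits = {"visual": 0.0, "text": 0.0, "social": 0.0}
fused, weights = fuse_modalities(sample, logits)
```

Renormalizing over the modalities that are actually present is one simple way to keep the fused vector on a comparable scale when an input is absent.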

Paper Structure (20 sections, 5 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Example from the IESN dataset (Zhao et al., 2016). The emotion labels depend on both the image contents and the user comments, showing the complex nature of emotion interpretation.
  • Figure 2: An overview of the complete methodology pipeline.
  • Figure 3: The structure of the created heterogeneous graph. The feature generation process for a particular user is depicted in one of the user nodes.
  • Figure 4: The prompting method used for LLaVA.