SceneGraMMi: Scene Graph-boosted Hybrid-fusion for Multi-Modal Misinformation Veracity Prediction

Swarang Joshi, Siddharth Mavani, Joel Alex, Arnav Negi, Rahul Mishra, Ponnurangam Kumaraguru

TL;DR

The paper proposes SceneGraMMi, a Scene Graph-boosted Hybrid-fusion approach for Multi-modal Misinformation veracity prediction that integrates scene graphs across different modalities to improve detection performance.

Abstract

Misinformation undermines individual knowledge and affects broader societal narratives. Despite growing interest in the research community in multi-modal misinformation detection, existing methods exhibit limitations in capturing semantic cues, key regions, and cross-modal similarities within multi-modal datasets. We propose SceneGraMMi, a Scene Graph-boosted Hybrid-fusion approach for Multi-modal Misinformation veracity prediction, which integrates scene graphs across different modalities to improve detection performance. Experimental results across four benchmark datasets show that SceneGraMMi consistently outperforms state-of-the-art methods. In a comprehensive ablation study, we highlight the contribution of each component, while Shapley values are employed to examine the explainability of the model's decision-making process.

Paper Structure

This paper contains 20 sections, 3 equations, 3 figures, and 6 tables.

Figures (3)

  • Figure 1: Architecture of the multi-modal fake news detection model. Image-text pairs are passed as input to each module, and a Feed Forward Network (FFN) performs the final classification, predicting the label Fake or Real. The components of the two modules, (1) the Transformer-based Encoder Module (TEM) and (2) the GNN-based Scene Graph Module (GSGM), are enclosed in dotted boxes.
  • Figure 2: Scene graphs for two samples from the Twitter dataset. Each sample consists of an input text and an input image, along with their associated scene graphs. Object nodes are blue, attribute nodes are green, and relationship nodes are red.
  • Figure 3: Explainability of the model on examples (a) and (b), selected from the Politifact dataset. Each column shows the image-text pair passed as input to the model. The highlighted portion of each modality is what the model uses to classify the sample as Fake or Real. Shapley values are used to highlight the model's focus.
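
The Figure 2 caption distinguishes three node types in a scene graph: object, attribute, and relationship. As a rough illustration of that typing only, the sketch below stores a scene graph in which relationships are nodes in their own right rather than edge labels. The class name `SceneGraph`, its methods, and the example labels ("man", "flag", "red", "holding") are hypothetical and not taken from the paper or its datasets; the paper's actual graph construction and GNN input format may differ.

```python
from dataclasses import dataclass, field

# The three node types named in the Figure 2 caption.
NODE_TYPES = ("object", "attribute", "relationship")


@dataclass
class SceneGraph:
    """Hypothetical container; not the authors' data structure."""
    # node id -> (label, node type)
    nodes: dict = field(default_factory=dict)
    # directed edges stored as (source id, target id) pairs
    edges: list = field(default_factory=list)

    def add_node(self, label, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        node_id = len(self.nodes)
        self.nodes[node_id] = (label, node_type)
        return node_id

    def add_attribute(self, obj_id, label):
        # An attribute node hangs off the object it describes.
        attr_id = self.add_node(label, "attribute")
        self.edges.append((obj_id, attr_id))
        return attr_id

    def add_relationship(self, subj_id, label, obj_id):
        # A relationship node sits between subject and object.
        rel_id = self.add_node(label, "relationship")
        self.edges.append((subj_id, rel_id))
        self.edges.append((rel_id, obj_id))
        return rel_id

    def count_by_type(self):
        counts = {t: 0 for t in NODE_TYPES}
        for _, node_type in self.nodes.values():
            counts[node_type] += 1
        return counts


def build_example():
    # Invented example: "a man holding a red flag".
    g = SceneGraph()
    man = g.add_node("man", "object")
    flag = g.add_node("flag", "object")
    g.add_attribute(flag, "red")
    g.add_relationship(man, "holding", flag)
    return g
```

Treating relationships as nodes is one common way to let a graph neural network compute an embedding for each relation as well as for each object; whether the GSGM does exactly this is an assumption here.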