VGA: Vision and Graph Fused Attention Network for Rumor Detection

Lin Bai, Caiyan Jia, Ziying Song, Chaoqun Cui

TL;DR

This work introduces VGA, a Vision and Graph Fused Attention Network for multimodal rumor detection that integrates propagation structures, visual semantics, image tampering cues, and OCR-derived text within a jointly trained, attention-based framework. It uses SRM-filtered noise features to capture tampering, mutual enhanced co-attention to fuse the graph and visual modalities, and a similarity loss to align cross-modal representations, and it reports state-of-the-art performance on three real-world datasets, including the newly released DRWeiboMM. An ablation study supports the contribution of the similarity measurement, data augmentation, OCR augmentation, and tampering features, and case studies illustrate the practical benefit. By exploiting the crowd wisdom encoded in propagation graphs together with visual cues, the approach targets more reliable detection of multimodal and manipulated content.

Abstract

With the development of social media, rumors spread widely on social media platforms, causing great harm to society. Besides textual information, many rumors also use manipulated images or conceal textual information within images to deceive people and evade detection, making multimodal rumor detection a critical problem. Most multimodal rumor detection methods concentrate on extracting features of source claims and their corresponding images, while ignoring the comments on rumors and their propagation structures. These comments and structures carry the wisdom of crowds and have proven crucial for debunking rumors. Moreover, these methods usually extract visual features only in a basic manner and seldom consider tampering or textual information in images. Therefore, in this study, we propose a novel Vision and Graph Fused Attention Network (VGA) for rumor detection, which utilizes propagation structures among posts to obtain crowd opinions and further explores visual tampering features as well as the textual information hidden in images. We conduct extensive experiments on three datasets, demonstrating that VGA can effectively detect multimodal rumors and significantly outperforms state-of-the-art methods.

Paper Structure (21 sections, 15 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Some indistinguishable multimodal rumors.
  • Figure 2: The architecture of VGA. We first convert the images into noise images and recognize the text they contain. OCR texts are used to supplement the claim root texts. Then, graph data augmentation and root enhancement are carried out. Rumors are classified by combining image semantic features, image tampering features, and graph features. Finally, the similarity loss is used to guide the learning of model parameters.
  • Figure 3: SRM noise image conversion.
  • Figure 4: Mutual Enhanced Co-Attention.
  • Figure 5: Some rumors detected by VGA but missed by VGA w/o Noise. Panels (a), (b), and (c) are the original RGB images from the datasets, and (d), (e), and (f) are the corresponding transformed noise images.
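
The SRM noise image conversion named in Figure 3 can be illustrated with a minimal sketch. This page does not give the paper's exact filters, so the three 5x5 high-pass kernels below are an assumption: they are the ones commonly used as SRM (steganalysis rich model) filters in image-forensics work. The function name `srm_noise_image` and the grayscale input are likewise hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

# ASSUMPTION: three 5x5 high-pass kernels commonly used as SRM filters in
# image-forensics work; the paper's exact kernels are not stated on this page.
SRM_KERNELS = np.stack([
    np.array([[0,  0,  0,  0, 0],
              [0, -1,  2, -1, 0],
              [0,  2, -4,  2, 0],
              [0, -1,  2, -1, 0],
              [0,  0,  0,  0, 0]], dtype=np.float64) / 4.0,
    np.array([[-1,  2,  -2,  2, -1],
              [ 2, -6,   8, -6,  2],
              [-2,  8, -12,  8, -2],
              [ 2, -6,   8, -6,  2],
              [-1,  2,  -2,  2, -1]], dtype=np.float64) / 12.0,
    np.array([[0, 0,  0, 0, 0],
              [0, 0,  0, 0, 0],
              [0, 1, -2, 1, 0],
              [0, 0,  0, 0, 0],
              [0, 0,  0, 0, 0]], dtype=np.float64) / 2.0,
])


def srm_noise_image(gray):
    """Hypothetical helper: map an (H, W) grayscale image to a (3, H, W) noise map.

    Each output channel is the response of one high-pass kernel, so smooth
    image content is suppressed and local pixel inconsistencies remain.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    # Replicate border pixels so the output keeps the input's spatial size.
    padded = np.pad(gray, 2, mode="edge")
    out = np.zeros((len(SRM_KERNELS), h, w))
    for k, kernel in enumerate(SRM_KERNELS):
        # Accumulate the 5x5 sliding-window sum one kernel offset at a time.
        for dy in range(5):
            for dx in range(5):
                out[k] += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
    print(srm_noise_image(image).shape)
```

Because every kernel sums to zero, a perfectly flat region produces a zero response, while abrupt local changes (such as splicing boundaries) produce large values. This is the intuition behind feeding noise images to a tampering-feature branch.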