Multimodal Inverse Attention Network with Intrinsic Discriminant Feature Exploitation for Fake News Detection

Tianlin Zhang, En Yu, Yi Shao, Jiande Sun

TL;DR

This work tackles multimodal fake news detection by explicitly extracting intrinsic discriminative information within each modality and highlighting cross-modal inconsistencies, without relying on external knowledge. It introduces MIAN, which combines a Hierarchical Learning Module to refine unimodal representations, a Cross-modal Interaction Module that fuses modalities through co-attention, and an Inverse Attention mechanism that exposes explicit inconsistencies. Empirical results on four real-world datasets (Weibo17, Weibo21, GossipCop, PolitiFact) show MIAN achieving superior accuracy and F1 scores, with ablations confirming the value of each component. The method shows practical potential for safeguarding information integrity; the authors note future work on extending it to multiple images per article to mitigate modality bias.

Abstract

Multimodal fake news detection has garnered significant attention due to its profound implications for social security. While existing approaches have contributed to understanding cross-modal consistency, they often fail to leverage modal-specific representations and explicit discrepant features. To address these limitations, we propose a Multimodal Inverse Attention Network (MIAN), a novel framework that explores intrinsic discriminative features based on news content to advance fake news detection. Specifically, MIAN introduces a hierarchical learning module that captures diverse intra-modal relationships through local-to-global and local-to-local interactions, thereby generating enhanced unimodal representations to improve the identification of fake news at the intra-modal level. Additionally, a cross-modal interaction module employs a co-attention mechanism to establish and model dependencies between the refined unimodal representations, facilitating seamless semantic integration across modalities. To explicitly extract inconsistency features, we propose an inverse attention mechanism that effectively highlights the conflicting patterns and semantic deviations introduced by fake news in both intra- and inter-modality. Extensive experiments on benchmark datasets demonstrate that MIAN significantly outperforms state-of-the-art methods, underscoring its pivotal contribution to advancing social security through enhanced multimodal fake news detection.
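To make the co-attention and inverse attention ideas in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no equations, so "inverse attention" is assumed here to mean negating the scaled dot-product scores before the softmax, and the function names and toy feature values are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, inverse=False):
    """Scaled dot-product attention over lists of vectors.

    With inverse=True the scores are negated before the softmax, so the
    keys that agree LEAST with a query receive the most weight. This
    negation is an assumption made for illustration, not the paper's
    equation.
    """
    scale = math.sqrt(len(keys[0]))
    sign = -1.0 if inverse else 1.0
    weights, outputs = [], []
    for q in queries:
        scores = [sign * sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output vector per query.
        outputs.append([sum(w_j * v[d] for w_j, v in zip(w, values))
                        for d in range(len(values[0]))])
    return outputs, weights


# Hypothetical toy features: 2 text tokens and 3 image regions, 2-D each.
text = [[1.0, 0.0], [0.0, 1.0]]
image = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# Co-attention: each modality queries the other.
text_to_image, w_ti = attention(text, image, image)
image_to_text, w_it = attention(image, text, text)

# Inverse attention: same query/key pairs, emphasis on disagreement.
text_vs_image, w_inv = attention(text, image, image, inverse=True)
```

In this toy setup the first text token aligns best with the first image region under standard attention, while the inverse variant shifts its weight toward the region pointing in the opposite direction, which is the kind of conflicting pattern the abstract says the mechanism is meant to highlight.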

Paper Structure

This paper contains 18 sections, 20 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Typical types of fake news. (a)/(b) Fabricated fake news, where either the text or the image is authentic while the other is manipulated to create fake content. Specifically, in (a), the subject in the image has a fish’s body alongside a pig’s ears and snout, highlighting categorical inconsistencies between regions; in (b), the overall semantics of the news text conflict with specific phrases when placed in a realistic context. (c) Mismatched text and images from unrelated real news sources, creating inconsistent narratives.
  • Figure 2: The proposed framework, MIAN, aims to detect fake news by fully leveraging both textual and visual content in news articles. Given a news piece, the model first utilizes modality-specific encoders to extract unimodal embeddings. Next, we apply a hierarchical learning module with different attention mechanisms in the Local-to-Global and Local-to-Local Blocks to capture and enhance hierarchical feature interactions. The enhanced unimodal features are then input into a co-attention mechanism to generate multimodal fused features. Throughout the various attention mechanisms, we incorporate an inverse attention module to explicitly extract inconsistencies between different targets. Finally, all enhanced unimodal and multimodal features are fused for fake news detection.
  • Figure 3: t-SNE visualization of the mined features on the test sets of Weibo17 (first row) and GossipCop (second row).
  • Figure 4: The performance of different variants of the proposed model in terms of accuracy on the True, False Connection, and Manipulation Content classes in the Fakeddit dataset.
  • Figure 5: The performance of MIAN and its variants on the Fakeddit multiclass classification task.
  • ...and 1 more figure