Multimodal Inverse Attention Network with Intrinsic Discriminant Feature Exploitation for Fake News Detection
Tianlin Zhang, En Yu, Yi Shao, Jiande Sun
TL;DR
This work tackles multimodal fake news detection by explicitly extracting the intrinsic discriminative information within each modality and highlighting cross-modal inconsistencies, without relying on external knowledge. It introduces MIAN, which combines a Hierarchical Learning Module that refines unimodal representations, a Cross-modal Interaction Module that fuses modalities through co-attention, and an Inverse Attention mechanism that exposes explicit inconsistencies. On four real-world datasets (Weibo17, Weibo21, GossipCop, PolitiFact), MIAN achieves higher accuracy and F1 scores than state-of-the-art baselines, and ablation studies confirm the contribution of each component. The method shows practical potential for safeguarding information integrity. As future work, the authors plan to extend it to articles with multiple images in order to mitigate modality bias.
Abstract
Multimodal fake news detection has garnered significant attention due to its profound implications for societal security. While existing approaches have contributed to understanding cross-modal consistency, they often fail to leverage modality-specific representations and explicit discrepancy features. To address these limitations, we propose the Multimodal Inverse Attention Network (MIAN), a novel framework that explores intrinsic discriminative features of news content to advance fake news detection. Specifically, MIAN introduces a hierarchical learning module that captures diverse intra-modal relationships through local-to-global and local-to-local interactions, generating enhanced unimodal representations that improve the identification of fake news at the intra-modal level. In addition, a cross-modal interaction module employs a co-attention mechanism to model dependencies between the refined unimodal representations, facilitating semantic integration across modalities. To explicitly extract inconsistency features, we propose an inverse attention mechanism that highlights the conflicting patterns and semantic deviations that fake news introduces both within and across modalities. Extensive experiments on benchmark datasets demonstrate that MIAN significantly outperforms state-of-the-art methods, underscoring its contribution to societal security through enhanced multimodal fake news detection.
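The abstract does not give the exact formulation of the inverse attention mechanism, so the following is only an illustrative sketch of the general idea. It assumes that ordinary co-attention is scaled dot-product attention from one modality onto the other, and that inverse attention can be approximated by negating the similarity scores before the softmax, so that the least similar (most conflicting) elements receive the largest weights. The function names `attention_weights` and `attend`, and the toy feature vectors, are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_weights(query, keys, inverse=False):
    """Scaled dot-product attention weights of one query over all keys.

    With inverse=True the scores are negated before the softmax, so the
    keys that are LEAST similar to the query get the most weight.
    This negation is an assumption, not the paper's stated formula.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    if inverse:
        scores = [-s for s in scores]
    return softmax(scores)


def attend(queries, keys, values, inverse=False):
    """Return one attended vector per query (weighted sum of values)."""
    dim = len(values[0])
    outputs = []
    for query in queries:
        weights = attention_weights(query, keys, inverse=inverse)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


# Toy features: one text token and two image regions (hypothetical values).
text_feats = [[1.0, 0.0]]
image_feats = [[1.0, 0.0], [0.0, 1.0]]

# Co-attention: the text token attends mostly to the matching image region.
consistent = attend(text_feats, image_feats, image_feats)

# Inverse attention: the same token now attends mostly to the mismatched region.
inconsistent = attend(text_feats, image_feats, image_feats, inverse=True)
```

In this sketch, `consistent` plays the role of the fused cross-modal representation, while `inconsistent` emphasizes the image content that disagrees with the text. A detector could use both vectors as input to its classifier, although how MIAN actually combines them is not specified in the abstract.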
