Adaptive Learning of Consistency and Inconsistency Information for Fake News Detection
Aohan Li, Jiaxin Chen, Xin Liao, Dengyong Zhang
TL;DR
This work addresses fake news detection by modeling both inter-modal consistency and intra-modal inconsistency. It introduces MFF-Net, which uses SWIN-T and BERT to extract modality-specific semantic features and CLIP to extract global cross-modal features. The network learns inter-modal consistency $R_M$ through co-attention-based fusion and per-modality inconsistency $R_{I\_incon}$ and $R_{T\_incon}$ through self-attention filtering. An adaptive weighting scheme computes the cosine similarity $f_s$ of the global features and rescales it to $sim=(1+f_s)/2$, which balances the inconsistent and consistent cues for final classification. Empirical results on Weibo, Weibo-21, and GossipCop show consistent improvements over state-of-the-art methods, supporting the value of incorporating both consistency and inconsistency information in multi-modal fake news detection, with practical relevance for robust social-media analysis.
Abstract
The rapid advancement of social media platforms has significantly reduced the cost of information dissemination, yet it has also led to a proliferation of fake news, posing a threat to societal trust and credibility. Most fake news detection research has focused on integrating text and image information to represent the consistency of multiple modalities in news content, while paying less attention to inconsistent information. Moreover, existing methods that leverage inconsistent information often let one modality overshadow another, so the inconsistency clues are used ineffectively. To address these issues, we propose an adaptive multi-modal feature fusion network (MFF-Net). Inspired by the way humans judge whether news is true or false, MFF-Net focuses on the inconsistent parts when news content is generally consistent and on the consistent parts when it is generally inconsistent. Specifically, MFF-Net extracts semantic and global features from images and texts respectively, and learns consistency information between modalities through a multiple feature fusion module. To deal with the problem of modal information being easily masked, we design a single-modal feature filtering strategy that captures inconsistent information from each modality separately. Finally, similarity scores are calculated from the global features and used to adaptively weight the fusion of consistent and inconsistent features. Extensive experimental results demonstrate that MFF-Net outperforms state-of-the-art methods on three public news datasets derived from real social media.
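The adaptive weighting step can be illustrated with a minimal Python sketch. Only the cosine similarity $f_s$ of the global features and the rescaling $sim=(1+f_s)/2$ come from the summary above. Everything else is an assumption made for illustration: the function names `cosine_similarity` and `adaptive_fuse`, the choice to weight the inconsistency features by $sim$ and the consistency feature by $1-sim$ (chosen to match the stated intuition that generally consistent news should draw attention to its inconsistent parts), and the fusion by concatenation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for a zero vector (assumption)
    return dot / (norm_u * norm_v)


def adaptive_fuse(img_global, txt_global, r_m, r_i_incon, r_t_incon):
    """Hypothetical weighted fusion of consistency and inconsistency features.

    img_global, txt_global: global image/text features (e.g. from CLIP).
    r_m: inter-modal consistency feature.
    r_i_incon, r_t_incon: image and text inconsistency features.
    Returns (sim, fused_feature).
    """
    f_s = cosine_similarity(img_global, txt_global)
    sim = (1.0 + f_s) / 2.0  # rescale from [-1, 1] to [0, 1]

    # Assumed weighting: the more consistent the news (high sim), the more
    # weight goes to the inconsistency features, and vice versa.
    w_incon = sim
    w_con = 1.0 - sim

    # Assumed fusion: concatenate the weighted feature vectors.
    fused = (
        [w_con * x for x in r_m]
        + [w_incon * x for x in r_i_incon]
        + [w_incon * x for x in r_t_incon]
    )
    return sim, fused
```

With identical global features the sketch gives $sim=1$ and passes only the inconsistency features through; with opposite global features it gives $sim=0$ and passes only the consistency feature. The actual weighting and fusion used by MFF-Net may differ from this sketch.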
