Adaptive Learning of Consistency and Inconsistency Information for Fake News Detection

Aohan Li, Jiaxin Chen, Xin Liao, Dengyong Zhang

TL;DR

This work addresses fake news detection by modeling both inter-modal consistency and intra-modal inconsistency. It introduces MFF-Net, which uses SWIN-T and BERT to extract modality-specific semantic features and CLIP to extract global cross-modal features. The network learns inter-modal consistency $R_M$ through Co-attention-based fusion and per-modality inconsistency $R_I\_incon$, $R_T\_incon$ through self-attention filtering. An adaptive weighting scheme computes the cosine similarity $f_s$ of the global features and rescales it as $sim=(1+f_s)/2$ to balance inconsistent and consistent cues for final classification. Experiments on Weibo, Weibo-21, and GossipCop show consistent improvements over state-of-the-art methods, supporting the value of combining consistency and inconsistency information in multi-modal fake news detection, with practical relevance for social-media analysis.
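The adaptive weighting summarized in the TL;DR can be illustrated with a minimal NumPy sketch. It only shows the rescaling $sim=(1+f_s)/2$ of a cosine similarity into $[0,1]$. How the resulting score is applied to the features is an assumption made here for illustration: inconsistency features are scaled by $sim$ and the consistency feature by $1-sim$, following the stated intuition that inconsistent cues matter more when the content is generally consistent. The function names `cosine_similarity`, `adaptive_weights`, and `weighted_fusion` are hypothetical and not taken from the paper.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Cosine similarity f_s between two 1-D global feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def adaptive_weights(global_img: np.ndarray, global_txt: np.ndarray):
    """Rescale f_s from [-1, 1] to [0, 1] via sim = (1 + f_s) / 2.

    Assumption (not confirmed by the source): sim weights the inconsistency
    features and 1 - sim weights the consistency feature.
    """
    f_s = cosine_similarity(global_img, global_txt)
    sim = (1.0 + f_s) / 2.0
    return sim, 1.0 - sim


def weighted_fusion(r_m, r_i_incon, r_t_incon, global_img, global_txt):
    """Concatenate the three branch features after adaptive scaling."""
    w_incon, w_con = adaptive_weights(global_img, global_txt)
    return np.concatenate([w_con * r_m, w_incon * r_i_incon, w_incon * r_t_incon])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g_img, g_txt = rng.normal(size=8), rng.normal(size=8)
    fused = weighted_fusion(rng.normal(size=4), rng.normal(size=4),
                            rng.normal(size=4), g_img, g_txt)
    print(adaptive_weights(g_img, g_txt), fused.shape)
```

Under this assumed rule, a highly consistent image-text pair ($sim$ close to 1) shifts the classifier input toward the inconsistency branches, while a mismatched pair ($sim$ close to 0) shifts it toward the consistency branch.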

Abstract

The rapid advancement of social media platforms has significantly reduced the cost of information dissemination, yet it has also led to a proliferation of fake news, posing a threat to societal trust and credibility. Most fake news detection research has focused on integrating text and image information to represent the consistency of multiple modalities in news content, while paying less attention to inconsistent information. Moreover, existing methods that leverage inconsistent information often let one modality overshadow another, so the inconsistency clues are used ineffectively. To address these issues, we propose an adaptive multi-modal feature fusion network (MFF-Net). Inspired by how humans judge whether news is true or false, MFF-Net focuses on the inconsistent parts when the news content is generally consistent and on the consistent parts when it is generally inconsistent. Specifically, MFF-Net extracts semantic and global features from images and texts, and learns consistency information between modalities through a multiple feature fusion module. To keep the information of one modality from being masked by the other, we design a single-modal feature filtering strategy that captures inconsistent information from each modality separately. Finally, similarity scores are computed from the global features and adaptively adjusted to achieve a weighted fusion of consistent and inconsistent features. Extensive experimental results demonstrate that MFF-Net outperforms state-of-the-art methods on three public news datasets derived from real social media platforms.

Paper Structure

This paper contains 11 sections, 12 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Differences between existing methods and ours in fake news detection. Existing methods extract features from each modality and fuse these features to distinguish the authenticity of news. In contrast, our method simultaneously considers capturing inter-modal consistency features and inconsistency features within each modality, and adaptively adjusts the weights of each feature to enhance modal information interaction for detecting fake news.
  • Figure 2: Overview of our MFF-Net. The SWIN-T, BERT, and CLIP models are employed to extract single-modal features. Then, three parallel branches extract the inconsistent and consistent information. Finally, the cosine similarity is computed to regulate the contribution of each feature, and a classifier predicts whether the news is real or fake.
  • Figure 3: Illustration of our multiple feature fusion module. From left to right and from top to bottom, the first pair of Co-attention blocks fuses the semantic features of text and image, the second and third pairs of Co-attention blocks enhance the fused feature $r_{it}$, and the final inter-modal consistency information $R_M$ is obtained through the TFN fusion strategy.
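
The Figure 3 caption names TFN as the final fusion step. As a rough illustration, the sketch below shows the generic two-input Tensor Fusion Network operation: each vector is extended with a constant 1 and the two are combined by an outer product, so the result contains both the original features and all pairwise products. Which vectors MFF-Net feeds into this step, and with what dimensions, is not stated in the captions above, so the inputs `a` and `b` and the function name `tfn_fuse` are hypothetical placeholders. The 1 is prepended here, and appending it instead would only reorder the entries.

```python
import numpy as np


def tfn_fuse(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Generic two-input tensor fusion: flatten([1; a] outer [1; b]).

    The constant 1 keeps the unimodal terms (a and b themselves) in the
    output alongside every bimodal product a_i * b_j.
    """
    a1 = np.concatenate([[1.0], a])  # shape (len(a) + 1,)
    b1 = np.concatenate([[1.0], b])  # shape (len(b) + 1,)
    return np.outer(a1, b1).ravel()  # shape ((len(a) + 1) * (len(b) + 1),)


if __name__ == "__main__":
    a = np.array([2.0, 3.0])  # placeholder for one enhanced feature
    b = np.array([5.0])       # placeholder for the other enhanced feature
    print(tfn_fuse(a, b))
```

The output size grows as the product of the input sizes, which is why this kind of fusion is usually applied to low-dimensional features.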