External Reliable Information-enhanced Multimodal Contrastive Learning for Fake News Detection
Biwei Cao, Qihang Wu, Jiuxin Cao, Bo Liu, Jie Gui
TL;DR
The paper addresses two core challenges in multimodal fake news detection: making full use of multimodal cues, and keeping the external information both reliable and dynamically updatable. It introduces ERIC-FND, a framework with three modules: External Information Enhancement, Multimodal Information Interactive Enhancement (comprising multimodal contrastive learning and cross-modal semantic interaction), and Adaptive Fusion-based Classification. The approach enriches the text with entity-derived knowledge from Wikipedia, aligns image and text representations in a shared space, and then applies adaptive fusion to produce the final prediction. On the Weibo and X datasets, ERIC-FND outperforms existing state-of-the-art methods under the same settings, which supports the value of integrating external knowledge and cross-modal learning for fake news detection.
Abstract
With the rapid development of the Internet, the paradigm of information dissemination has changed and its efficiency has improved greatly. However, this has also accelerated the spread of fake news, with negative impacts on cyberspace. Meanwhile, information presentation formats have gradually evolved, with news shifting from text to multimodal content. As a result, multimodal fake news detection has become a research hotspot. However, the field still faces two main challenges: the inability to fully and effectively utilize multimodal information for detection, and the low credibility or static nature of the introduced external information, which limits dynamic updates. To bridge these gaps, we propose ERIC-FND, an external reliable information-enhanced multimodal contrastive learning framework for fake news detection. ERIC-FND strengthens the representation of news content through an entity-enriched external information enhancement method. It also enriches multimodal news information through a multimodal semantic interaction method, in which multimodal contrastive learning is employed so that the representations of different modalities learn from each other. Moreover, an adaptive fusion method integrates the news representations from different dimensions for the final classification. Experiments are conducted on two commonly used datasets in different languages, X (Twitter) and Weibo. The results demonstrate that ERIC-FND outperforms existing state-of-the-art fake news detection methods under the same settings.
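
To make the contrastive learning idea above concrete, the following is a minimal sketch of a symmetric image-text contrastive loss. The abstract does not state the exact loss used in ERIC-FND, so the InfoNCE (CLIP-style) form, the function name `symmetric_contrastive_loss`, and the temperature of 0.07 are all assumptions chosen for illustration, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def log_softmax_row(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Hypothetical InfoNCE-style loss over a batch of (text, image) pairs.

    Row i of text_emb and row i of image_emb are assumed to come from the
    same news item (a positive pair); all other pairings in the batch act
    as negatives. The temperature value is an assumption.
    """
    if not text_emb or len(text_emb) != len(image_emb):
        raise ValueError("need equally sized, non-empty batches")
    t = [l2_normalize(v) for v in text_emb]
    v = [l2_normalize(u) for u in image_emb]
    n = len(t)
    # Cosine similarity between every text and every image, scaled.
    logits = [
        [sum(a * b for a, b in zip(t[i], v[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Text-to-image direction: each text should pick out its own image.
    t2i = -sum(log_softmax_row(logits[i])[i] for i in range(n)) / n
    # Image-to-text direction: same computation on the transposed matrix.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    i2t = -sum(log_softmax_row(cols[j])[j] for j in range(n)) / n
    return 0.5 * (t2i + i2t)


# Toy batch: matched pairs should give a lower loss than mismatched pairs.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_contrastive_loss(texts, [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_contrastive_loss(texts, [[0.0, 1.0], [1.0, 0.0]])
```

In this sketch, minimizing the loss pulls the text and image embeddings of the same news item together and pushes apart those of different items, which is one common way for two modalities to "learn from each other" in a shared space.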
