A Self-Learning Multimodal Approach for Fake News Detection

Hao Chen, Hui Guo, Baochen Hu, Shu Hu, Jinrong Hu, Siwei Lyu, Xi Wu, Xin Wang

TL;DR

This paper tackles multimodal fake news detection under limited labeled data by integrating a self-supervised contrastive module for image features with a learnable multimodal fusion mechanism based on Q-Former and a frozen LLM (Vicuna), guided by four prompts and balanced by Automatic Weighted Loss. The architecture achieves 88.88% accuracy and consistently outperforms strong baselines in accuracy, precision, recall, and F1 on a large public dataset. Key contributions include (1) a contrastive learning booster for image representations, (2) a learnable alignment between image and text features via Q-Former, and (3) dynamic multi-task loss balancing that improves generalization. The approach demonstrates that dynamic multimodal infusion with LLMs can substantially improve fake news detection, with potential practical impact for social platforms and fact-checking workflows; future work could integrate social-network and event-context information for further gains.
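The Automatic Weighted Loss is named but not spelled out in this summary. A minimal sketch, assuming the common uncertainty-weighting form in which each task loss L_i gets a learnable scale σ_i and contributes 0.5·L_i/σ_i² + log(1 + σ_i²), might look like the following. The function names `awl_total` and `awl_sigma_grad`, the two fixed hypothetical task losses, and the plain gradient-descent loop are illustrative assumptions, not the authors' code; in practice σ would be trained jointly with the network by an autodiff framework.

```python
import numpy as np


def awl_total(task_losses, sigmas):
    """Uncertainty-weighted sum: sum_i 0.5 * L_i / sigma_i**2 + log(1 + sigma_i**2)."""
    losses = np.asarray(task_losses, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    return float(np.sum(0.5 * losses / sigmas**2 + np.log1p(sigmas**2)))


def awl_sigma_grad(task_losses, sigmas):
    """Analytic gradient of awl_total with respect to each sigma_i."""
    losses = np.asarray(task_losses, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    return -losses / sigmas**3 + 2.0 * sigmas / (1.0 + sigmas**2)


# Two hypothetical task losses, held fixed for illustration
# (e.g. one per prompt-level objective; an assumption).
losses = [2.0, 0.5]
sigmas = np.ones(2)  # a common initialization
for _ in range(200):
    sigmas -= 0.05 * awl_sigma_grad(losses, sigmas)  # plain gradient descent on sigma only

weights = 0.5 / sigmas**2  # effective per-task weights
print("sigmas:", sigmas, "weights:", weights, "total:", awl_total(losses, sigmas))
```

Running the loop drives σ up for the task with the larger loss, so its effective weight 0.5/σ² shrinks. The log(1 + σ²) term keeps σ from growing without bound, which would otherwise push every task weight toward zero.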

Abstract

The rapid growth of social media has resulted in an explosion of online news content, leading to a significant increase in the spread of misleading or false information. While machine learning techniques have been widely applied to detect fake news, the scarcity of labeled datasets remains a critical challenge. Misinformation frequently appears as paired text and images, where a news article or headline is accompanied by related visuals. In this paper, we introduce a self-learning multimodal model for fake news classification. The model leverages contrastive learning, a robust feature-extraction method that does not require labeled data, and integrates the strengths of Large Language Models (LLMs) to jointly analyze text and image features. LLMs excel at this task because they are trained on extensive corpora covering diverse linguistic data. Our experimental results on a public dataset demonstrate that the proposed model outperforms several state-of-the-art classification approaches, achieving over 85% accuracy, precision, recall, and F1-score. These findings highlight the model's effectiveness in tackling the challenges of multimodal fake news detection.
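The abstract relies on contrastive learning to extract image features without labels. As a minimal sketch, assuming a one-directional InfoNCE-style objective over two augmented views of each image (the paper's exact loss, augmentations, and temperature are not given here), the core computation could look like this. The helper names, the random stand-in features, and the noise-based "augmentations" are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss for N paired views.

    Row i of z_a and row i of z_b are two views of the same image (the
    positive pair); every other row of z_b is treated as a negative.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature                    # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # positives lie on the diagonal


# Toy data: random vectors stand in for encoder outputs, and small Gaussian
# noise stands in for image augmentations (both hypothetical).
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
view_a = feats + 0.01 * rng.normal(size=feats.shape)
view_b = feats + 0.01 * rng.normal(size=feats.shape)

aligned = info_nce(view_a, view_b)
shuffled = info_nce(view_a, np.roll(view_b, 1, axis=0))  # every pair mismatched
print(f"aligned={aligned:.4f} shuffled={shuffled:.4f}")
```

Because the positives sit on the diagonal of the similarity matrix, no class labels are needed: the other images in the batch act as negatives. Pairing each view with the wrong image (here via `np.roll`) sharply increases the loss.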

Paper Structure

This paper contains 20 sections, 6 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: An example of fake news (mismatching image-text) from the dataset in [8].
  • Figure 2: The overall structure of the multimodal fake news detection model. It is composed of three components: the contrastive learning module learns image features from a small sample of training data; the infusing module aligns text and image features and then applies the large language model to combine the two modalities; and the classification module predicts whether the news is fake.
  • Figure 3: Momentum configuration for contrastive learning
  • Figure 4: Q-Former structure, adapted from [24]
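The figure list describes a Q-Former that aligns image features with the frozen LLM. The sketch below is a heavily simplified, assumed stand-in: a single cross-attention layer in which a fixed set of query vectors attends over frozen image patch features, followed by a linear projection into the LLM's embedding width. The class name `TinyQueryCrossAttention`, the dimensions (257 patches of width 1408, 32 queries, hidden size 768, LLM width 4096), and the random initialization are illustrative assumptions; a real Q-Former also has self-attention, a text branch, and multiple layers.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TinyQueryCrossAttention:
    """Hypothetical, simplified Q-Former-style block: K query vectors
    cross-attend to N frozen image features, then a linear layer maps the
    result into the LLM's token-embedding width."""

    def __init__(self, num_queries, img_dim, hid_dim, llm_dim, seed=0):
        rng = np.random.default_rng(seed)
        init = 0.02
        self.queries = rng.normal(scale=init, size=(num_queries, hid_dim))  # query tokens (trained in practice; random here)
        self.w_k = rng.normal(scale=init, size=(img_dim, hid_dim))          # image -> keys
        self.w_v = rng.normal(scale=init, size=(img_dim, hid_dim))          # image -> values
        self.w_out = rng.normal(scale=init, size=(hid_dim, llm_dim))        # -> LLM embedding space

    def __call__(self, image_feats):
        keys = image_feats @ self.w_k                             # (N, hid)
        values = image_feats @ self.w_v                           # (N, hid)
        scores = self.queries @ keys.T / np.sqrt(keys.shape[1])   # (K, N)
        attn = softmax(scores, axis=-1)                           # each query attends over all patches
        return (attn @ values) @ self.w_out                       # (K, llm_dim) soft visual tokens


# Illustrative sizes (assumed): 257 frozen ViT patch features of width 1408,
# 32 queries, hidden size 768, and a 4096-wide LLM embedding.
patches = np.random.default_rng(1).normal(size=(257, 1408))
qformer = TinyQueryCrossAttention(num_queries=32, img_dim=1408, hid_dim=768, llm_dim=4096)
visual_tokens = qformer(patches)
print(visual_tokens.shape)  # (32, 4096)
```

Whatever the number of image patches, the output is always one vector per query, which is what lets these soft visual tokens be placed alongside the prompt embeddings in the LLM's input. How the paper combines them with its four prompts is not specified in this summary.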