TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer

Eunjee Choi; Jong-Kook Kim

TL;DR

This paper tackles multimodal fake news detection by jointly leveraging text and image information through a three-pathway architecture that combines BERT/BLIPTxt for text, ResNet/BLIPImg for images, and BLIP-based image-text features. It introduces the Multimodal Tri-Transformer, which fuses text, image, and image-text representations using cross-modal and self-attention, prioritizing textual cues while maintaining cross-modal context. Evaluations on Weibo and Gossipcop demonstrate state-of-the-art performance: TT-BLIP achieves accuracies of 96.1% and 88.5%, respectively, outperforming traditional fusion and unimodal baselines. The study shows the value of specialized feature extraction and integrated fusion for reliable detection, offering a practical approach to combating misinformation across social media platforms.
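A minimal sketch of how one pathway might merge its two encoders (BERT and BLIPTxt for text, ResNet and BLIPImg for images) follows. It is not the authors' implementation: the merge rule (project each encoder's tokens to a shared width, then stack the two token sequences), the helper names `make_projection` and `build_pathway`, and all tensor sizes are assumptions, and random arrays stand in for real encoder outputs.

```python
import numpy as np


def make_projection(rng, d_in, d_model):
    """Random (weight, bias) pair standing in for a learned linear layer."""
    w = rng.normal(scale=d_in ** -0.5, size=(d_in, d_model))
    b = np.zeros(d_model)
    return w, b


def build_pathway(feats_a, feats_b, proj_a, proj_b):
    """Combine two encoders' token features into one pathway (hypothetical rule).

    feats_a: (n_a, d_a) tokens from the first encoder (e.g. BERT).
    feats_b: (n_b, d_b) tokens from the second encoder (e.g. BLIPTxt).
    Each set is projected to the shared width d_model, then the two token
    sequences are stacked, so differing sequence lengths need no alignment.
    """
    w_a, b_a = proj_a
    w_b, b_b = proj_b
    tokens_a = feats_a @ w_a + b_a  # (n_a, d_model)
    tokens_b = feats_b @ w_b + b_b  # (n_b, d_model)
    return np.concatenate([tokens_a, tokens_b], axis=0)  # (n_a + n_b, d_model)


rng = np.random.default_rng(0)
d_model = 256
# Random stand-ins for encoder outputs; the sizes are illustrative only.
bert_tokens = rng.normal(size=(16, 768))
blip_txt_tokens = rng.normal(size=(20, 768))
resnet_tokens = rng.normal(size=(49, 2048))
blip_img_tokens = rng.normal(size=(197, 768))

text_path = build_pathway(
    bert_tokens, blip_txt_tokens,
    make_projection(rng, 768, d_model), make_projection(rng, 768, d_model),
)
image_path = build_pathway(
    resnet_tokens, blip_img_tokens,
    make_projection(rng, 2048, d_model), make_projection(rng, 768, d_model),
)
print(text_path.shape, image_path.shape)  # (36, 256) (246, 256)
```

Projecting to one shared width is what lets later attention layers compare tokens that came from different encoders.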

Abstract

Fake news detection has received considerable attention. Many previous methods concatenate independently encoded unimodal data, ignoring the benefits of integrated multimodal information. These methods are further limited by the absence of specialized feature extraction for text and images. This paper introduces TT-BLIP, an end-to-end model that applies BLIP (bootstrapping language-image pretraining for unified vision-language understanding and generation) to three types of information: BERT and BLIPTxt for text, ResNet and BLIPImg for images, and bidirectional BLIP encoders for multimodal information. The Multimodal Tri-Transformer fuses the tri-modal features using three types of multi-head attention mechanisms, integrating the modalities to produce enhanced representations and improve multimodal data analysis. Experiments on two fake news datasets, Weibo and Gossipcop, indicate that TT-BLIP outperforms state-of-the-art models.
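The three attention mechanisms can be pictured with the NumPy sketch below. As an illustration only, it assumes that text tokens act as queries in all three: one self-attention over text and two cross-attentions over the image and image-text features, which is one reading of the text-first emphasis described in the TL;DR. The residual sum, the mean pooling, the helper names (`multi_head_attention`, `init_attention`, `tri_fusion`), and the random weights are hypothetical and not taken from the paper; all three inputs are assumed to share one width `d`.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(q_in, kv_in, p, num_heads):
    """Scaled dot-product multi-head attention.

    q_in: (n_q, d) queries; kv_in: (n_kv, d) keys and values.
    p: dict of (d, d) matrices 'wq', 'wk', 'wv', 'wo'; d must divide by num_heads.
    """
    n_q, d = q_in.shape
    n_kv = kv_in.shape[0]
    dh = d // num_heads
    # Project, then split the feature axis into heads: (num_heads, n, dh).
    q = (q_in @ p["wq"]).reshape(n_q, num_heads, dh).transpose(1, 0, 2)
    k = (kv_in @ p["wk"]).reshape(n_kv, num_heads, dh).transpose(1, 0, 2)
    v = (kv_in @ p["wv"]).reshape(n_kv, num_heads, dh).transpose(1, 0, 2)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, n_q, n_kv)
    heads = weights @ v  # (heads, n_q, dh)
    # Merge heads back into (n_q, d) and apply the output projection.
    return heads.transpose(1, 0, 2).reshape(n_q, d) @ p["wo"]


def init_attention(rng, d):
    """Random stand-in weights for one attention block."""
    return {name: rng.normal(scale=d ** -0.5, size=(d, d)) for name in ("wq", "wk", "wv", "wo")}


def tri_fusion(text, image, image_text, params, num_heads=4):
    """Hypothetical tri-modal fusion with text tokens as queries throughout."""
    self_out = multi_head_attention(text, text, params["self"], num_heads)
    img_out = multi_head_attention(text, image, params["image"], num_heads)
    mm_out = multi_head_attention(text, image_text, params["image_text"], num_heads)
    fused = text + self_out + img_out + mm_out  # residual sum (assumed)
    return fused.mean(axis=0)  # mean-pool to one (d,) vector for a classifier


rng = np.random.default_rng(0)
d = 64
params = {name: init_attention(rng, d) for name in ("self", "image", "image_text")}
text = rng.normal(size=(12, d))
image = rng.normal(size=(49, d))
image_text = rng.normal(size=(20, d))
fused = tri_fusion(text, image, image_text, params)
print(fused.shape)  # (64,)
```

Because text is the only query stream, the fused output stays aligned with the text tokens. This is an interpretation of "prioritizing textual cues," not a detail the summary confirms.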

Paper Structure

This paper contains 19 sections, 7 equations, 4 figures, and 3 tables.

Introduction
Related Work
Method
Overview
Feature Extraction Layer
Textual feature extractor
Image feature extractor
Image-text feature extractor
Feature Fusion Layer
Fake News Detector
Experiments and Results
Dataset
Weibo
Gossipcop
Experimental Settings
...and 4 more sections

Figures (4)

Figure 1: Real news (a) and fake news (b) examples from the Weibo dataset
Figure 2: The architecture of the proposed TT-BLIP.
Figure 3: Architecture of different fusion strategies for multimodal fake news detection
Figure 4: t-SNE visualization of extracted features from the Weibo test set using TT-BLIP. Each color represents a distinct label grouping.
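For contrast with attention-based fusion, the concatenation strategy that the abstract attributes to many previous methods can be sketched as below. It represents one of the kinds of fusion strategy that Figure 3 compares. The pooled-vector sizes, the single linear scoring layer, the sigmoid output, the label convention, and the function name `concat_fusion_score` are illustrative assumptions, not the baselines' exact configurations. Random arrays stand in for encoder outputs.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def concat_fusion_score(text_vec, image_vec, w, b):
    """Concatenation baseline: pooled unimodal vectors are joined, then scored.

    The modalities never interact before this single linear layer, which is
    the limitation the abstract attributes to many earlier methods.
    """
    joint = np.concatenate([text_vec, image_vec])  # (d_text + d_image,)
    return sigmoid(joint @ w + b)  # score in (0, 1); which label it denotes is assumed


rng = np.random.default_rng(0)
text_vec = rng.normal(size=768)    # stand-in for a pooled text embedding
image_vec = rng.normal(size=2048)  # stand-in for a pooled image embedding
w = rng.normal(scale=0.01, size=768 + 2048)
score = concat_fusion_score(text_vec, image_vec, w, 0.0)
print(float(score))
```

Here each input is encoded independently, and cross-modal interaction is limited to a weighted sum in one layer. Attention-based fusion instead lets each text token weight individual image or image-text tokens.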
