MDF: A Dynamic Fusion Model for Multi-modal Fake News Detection

Hongzhen Lv, Wenzhong Yang, Fuyuan Wei, Jiaren Peng, Haokun Geng

TL;DR

MDF tackles the challenge of fake news detection in multi-modal posts by explicitly modeling noise and uncertainty in text and image streams. It introduces a two-stage approach: UEM, which maps unimodal features into a Gaussian latent space via multi-head attention to capture intra-modal uncertainty, and DFN, which uses graph attention networks and Dempster-Shafer evidence theory to dynamically fuse modalities based on uncertainty scores. The method achieves state-of-the-art results on Twitter and Weibo, with ablations showing clear gains from both uncertainty modeling and dynamic fusion, and a comprehensive analysis of hyperparameters and loss functions supporting robust performance. This framework offers a principled, data-driven way to balance modality contributions in the presence of noise, with potential implications for broader multimodal fusion tasks in social-media analytics.

Abstract

Fake news detection has received increasing attention from researchers in recent years, especially multi-modal fake news detection involving both text and images. However, many previous works feed the text and image features into a binary classifier after simple concatenation or an attention mechanism. These features carry a large amount of noise inherent in the data, which in turn leads to intra- and inter-modal uncertainty. In addition, although many methods that simply concatenate the two modalities have achieved strong results, they hold the modality weights fixed, so features with a higher impact can be ignored. To alleviate these problems, we propose a new dynamic fusion framework, dubbed MDF, for fake news detection. As far as we know, it is the first attempt at a dynamic fusion framework in the field of fake news detection. Specifically, our model consists of two main components: (1) UEM, an uncertainty modeling module that employs a multi-head attention mechanism to model intra-modal uncertainty; and (2) DFN, a dynamic fusion module based on D-S evidence theory that dynamically fuses the weights of the two modalities, text and image. To obtain better results from the dynamic fusion framework, we use GAT for inter-modal uncertainty and weight modeling before DFN. Extensive experiments on two benchmark datasets demonstrate the effectiveness and superior performance of the MDF framework. We also conducted a systematic ablation study to gain insight into our motivation and architectural design. We make our model publicly available at: https://github.com/CoisiniStar/MDF

Paper Structure (28 sections, 21 equations, 11 figures, 4 tables)

This paper contains 28 sections, 21 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Sample low-quality multi-modal posts. The first post contains irregular tweet text; the second has a low-resolution attached image.
  • Figure 2: Overview of MDF. It mainly consists of a UEM module that employs an attention mechanism, a graph attention network (GAT), a DFN module that incorporates the Dempster-Shafer theory of evidence, and a fake news detector. Given a noisy post and its accompanying image captured from a social platform, the UEM maps the tweet and the image into their corresponding latent subspaces to model intra-modal uncertainty for each modality. The result is then fed into the GAT for inter-modal uncertainty modeling and weight modeling of the two modalities, and the DFN module, based on Dempster-Shafer evidence theory, completes the dynamic weight perception of the two modalities. The final confidence of each modality is fed to the fake news detector to complete the dynamic fusion strategy.
  • Figure 3: Diagram of the UEM architecture. Based on the multi-head attention mechanism, the UEM represents each modality as a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, reflecting the noise inherent in its features. The learned mean values are combined with the original unimodal features to form a robust representation of each modality.
  • Figure 4: Diagram of the effect of binary classification based on D-S evidence theory. The light blue region is the confidence score derived from the single text modality, and the light yellow region is the confidence score derived from the single image modality. The white region represents the conflict region of the two decisions. The light red region represents the part where the decision values of the two modalities are compatible.
  • Figure 5: Visualization of the comparative performance of MDF and its variants.
  • ...and 6 more figures
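
The regions described in the Figure 4 caption (per-modality confidence, a conflict region, and a compatible region) correspond to the terms of Dempster's rule of combination. The sketch below illustrates that rule for a two-class frame {real, fake}. It is not the authors' DFN implementation: the `Mass` class, the `combine` function, and the example mass values are hypothetical, and the paper's way of deriving masses from GAT outputs is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mass:
    """Basic belief assignment over the frame {real, fake} (hypothetical helper).

    `unknown` is the mass placed on the whole frame, i.e. the modality's uncertainty.
    """
    real: float
    fake: float
    unknown: float

    def __post_init__(self):
        if min(self.real, self.fake, self.unknown) < 0:
            raise ValueError("masses must be non-negative")
        if abs(self.real + self.fake + self.unknown - 1.0) > 1e-9:
            raise ValueError("masses must sum to 1")


def conflict(m1: Mass, m2: Mass) -> float:
    """Mass assigned to contradictory decisions (the conflict region)."""
    return m1.real * m2.fake + m1.fake * m2.real


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: keep compatible evidence, renormalize by 1 - conflict."""
    k = conflict(m1, m2)
    if k >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    norm = 1.0 - k
    # A class is supported when both sources agree on it, or when one source
    # supports it and the other is undecided.
    real = (m1.real * m2.real + m1.real * m2.unknown + m1.unknown * m2.real) / norm
    fake = (m1.fake * m2.fake + m1.fake * m2.unknown + m1.unknown * m2.fake) / norm
    unknown = (m1.unknown * m2.unknown) / norm
    return Mass(real, fake, unknown)


# Illustrative values only: a fairly confident text modality, an uncertain image modality.
text_mass = Mass(real=0.7, fake=0.1, unknown=0.2)
image_mass = Mass(real=0.3, fake=0.2, unknown=0.5)
fused = combine(text_mass, image_mass)
print(fused)
```

With these example values the conflict is 0.17 and the fused mass on "real" is about 0.747. The more uncertain image modality shifts the result less than the text modality does, which is the behavior a dynamic, uncertainty-aware fusion aims for.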