MDF: A Dynamic Fusion Model for Multi-modal Fake News Detection
Hongzhen Lv, Wenzhong Yang, Fuyuan Wei, Jiaren Peng, Haokun Geng
TL;DR
MDF tackles the challenge of fake news detection in multi-modal posts by explicitly modeling noise and uncertainty in text and image streams. It introduces a two-stage approach: UEM, which maps unimodal features into a Gaussian latent space via multi-head attention to capture intra-modal uncertainty, and DFN, which uses graph attention networks and Dempster-Shafer evidence theory to dynamically fuse modalities based on uncertainty scores. The method achieves state-of-the-art results on Twitter and Weibo, with ablations showing clear gains from both uncertainty modeling and dynamic fusion, and a comprehensive analysis of hyperparameters and loss functions supporting robust performance. This framework offers a principled, data-driven way to balance modality contributions in the presence of noise, with potential implications for broader multimodal fusion tasks in social-media analytics.
Abstract
Fake news detection has received increasing attention in recent years, especially multi-modal fake news detection on posts that contain both text and images. However, many previous works feed the text and image features into a binary classifier after simple concatenation or an attention mechanism. These features carry a large amount of noise inherent in the data, which in turn leads to intra- and inter-modal uncertainty. In addition, although many methods that simply concatenate the two modalities have achieved strong results, they keep the weights of the modalities fixed, so features with a higher impact can be ignored. To alleviate these problems, we propose a new dynamic fusion framework, dubbed MDF, for fake news detection. To the best of our knowledge, it is the first dynamic fusion framework in the field of fake news detection. Specifically, our model consists of two main components: (1) UEM, an uncertainty modeling module that employs a multi-head attention mechanism to model intra-modal uncertainty; and (2) DFN, a dynamic fusion module based on Dempster-Shafer (D-S) evidence theory that dynamically fuses the weights of the two modalities, text and image. To improve the results of the dynamic fusion framework, we apply a graph attention network (GAT) before DFN to model inter-modal uncertainty and weights. Extensive experiments on two benchmark datasets demonstrate the effectiveness and superior performance of the MDF framework. We also conduct a systematic ablation study to gain insight into our motivation and architectural design. Our model is publicly available at https://github.com/CoisiniStar/MDF
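
To make the idea of D-S based dynamic fusion more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the exact formulation used in DFN, so the sketch assumes the reduced Dempster combination rule that is common in evidential multi-view classification: each modality outputs non-negative evidence per class, which is turned into belief masses plus one uncertainty mass, and the two opinions are then combined. The function names (`opinion_from_evidence`, `combine_opinions`) and the example evidence values are hypothetical.

```python
from typing import List, Tuple

# An opinion is (belief mass per class, uncertainty mass); the masses sum to 1.
Opinion = Tuple[List[float], float]


def opinion_from_evidence(evidence: List[float]) -> Opinion:
    """Turn non-negative per-class evidence into a subjective-logic opinion.

    Assumption: Dirichlet parameters are alpha_k = e_k + 1, so the Dirichlet
    strength is S = sum(e_k) + K, belief is e_k / S and uncertainty is K / S.
    """
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


def combine_opinions(first: Opinion, second: Opinion) -> Opinion:
    """Combine two opinions with the reduced Dempster rule.

    Mass that the two sources assign to different classes counts as conflict
    and is normalized away; a source with high uncertainty contributes less
    to the fused beliefs.
    """
    b1, u1 = first
    b2, u2 = second
    conflict = sum(
        b1[i] * b2[j]
        for i in range(len(b1))
        for j in range(len(b2))
        if i != j
    )
    scale = 1.0 - conflict
    beliefs = [
        (b1[k] * b2[k] + b1[k] * u2 + b2[k] * u1) / scale
        for k in range(len(b1))
    ]
    uncertainty = (u1 * u2) / scale
    return beliefs, uncertainty


# Hypothetical evidence for two classes from each modality:
# the text branch is confident, the image branch is nearly uninformative.
text_opinion = opinion_from_evidence([9.0, 1.0])
image_opinion = opinion_from_evidence([1.0, 1.0])
fused_beliefs, fused_uncertainty = combine_opinions(text_opinion, image_opinion)
```

In this sketch the weighting is dynamic in the sense that it is computed per sample: the image opinion carries a large uncertainty mass, so the fused beliefs stay close to those of the text branch, and the fused uncertainty is lower than that of either modality alone.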
