Exploring Modality Disruption in Multimodal Fake News Detection
Moyang Liu, Kaiying Yan, Yukun Liu, Ruibo Fu, Zhengqi Wen, Xuefei Liu, Chenxing Li
TL;DR
The paper tackles modality disruption in multimodal fake news detection on social media by introducing FND-MoE, a Mixture-of-Experts framework with modality-specific encoders, cross-attention fusion, and a two-pass feature selection pipeline. The two-pass mechanism combines an attention-based top-$k$ gate with a differentiable Gumbel-Sigmoid selector to filter out disruptive information, followed by a transformer classifier. Empirical results on FakeSV and FVC-2018 show that FND-MoE outperforms state-of-the-art methods, with accuracy gains of $3.45\%$ and $3.71\%$, respectively, and ablations justify the chosen gating strategy. The approach highlights the importance of mitigating modality disruption for robust multimodal fake news detection with practical implications for real-world social-media data analysis.
Abstract
The rapid growth of social media has led to the widespread dissemination of fake news across multiple content forms, including text, images, audio, and video. Compared to unimodal fake news detection, multimodal fake news detection benefits from the increased availability of information across multiple modalities. However, in the context of social media, certain modalities in multimodal fake news detection tasks may contain disruptive or over-expressive information. These elements often include exaggerated or embellished content. We define this phenomenon as modality disruption and explore its impact on detection models through experiments. To address the issue of modality disruption in a targeted manner, we propose a multimodal fake news detection framework, FND-MoE. Additionally, we design a two-pass feature selection mechanism to further mitigate the impact of modality disruption. Extensive experiments on the FakeSV and FVC-2018 datasets demonstrate that FND-MoE significantly outperforms state-of-the-art methods, with accuracy improvements of 3.45% and 3.71% on the respective datasets compared to baseline models.
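
To make the two-pass selection idea concrete, the following is a minimal sketch of a hard top-k gate followed by a Gumbel-Sigmoid soft gate. It is not the authors' implementation: the function names (`top_k_gate`, `gumbel_sigmoid`, `two_pass_select`), the use of scalar features, the temperature `tau`, and the choice to combine the two passes by multiplying their masks are all assumptions made for illustration. In a trained model the scores and logits would come from learned attention and gating layers rather than being passed in by hand.

```python
import math
import random


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def top_k_gate(scores, k):
    """Pass 1 (assumed form): hard mask keeping the k highest-scoring features."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(scores))]


def gumbel_sigmoid(logits, tau=1.0, rng=random):
    """Pass 2 (assumed form): relaxed Bernoulli gate per feature.

    The difference of two Gumbel(0, 1) samples follows a Logistic(0, 1)
    distribution, so a single logistic noise term is added to each logit
    before the temperature-scaled sigmoid.
    """
    gates = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep the logs finite
        noise = math.log(u) - math.log(1.0 - u)
        gates.append(_sigmoid((logit + noise) / tau))
    return gates


def two_pass_select(features, scores, logits, k, tau=1.0, rng=random):
    """Hypothetical composition: multiply each feature by both gates."""
    hard = top_k_gate(scores, k)
    soft = gumbel_sigmoid(logits, tau=tau, rng=rng)
    return [f * h * s for f, h, s in zip(features, hard, soft)]


if __name__ == "__main__":
    rng = random.Random(0)
    selected = two_pass_select(
        features=[1.0, 2.0, 3.0, 4.0],
        scores=[0.1, 0.9, 0.5, 0.3],
        logits=[0.0, 1.5, -1.5, 0.0],
        k=2,
        tau=0.5,
        rng=rng,
    )
    print(selected)
```

Lower values of `tau` push the soft gates toward 0 or 1, which approximates a discrete keep-or-drop decision while leaving the sampling step differentiable in an autograd framework; the pure-Python version here only illustrates the sampling arithmetic.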
