Towards Robust and Reliable Multimodal Misinformation Recognition with Incomplete Modality

Hengyang Zhou, Yiwei Wei, Jian Yang, Zhenyu Zhang

TL;DR

This work tackles multimodal misinformation recognition when a modality is missing, a realistic condition as multimedia content propagates. It introduces MMLNet, a three-component framework built on CLIP-based encoders: Multi-Expert Collaborative Reasoning to leverage unimodal and cross-modal distributions, Incomplete Modality Adapters to compensate for missing features, and Modality Missing Learning with a label-aware contrastive objective to improve robustness. By dynamically fusing text, image, and cross-modal representations through a transformer-based joint expert and a routing mechanism, the approach reports state-of-the-art performance on three real-world datasets across two languages, particularly under missing-modality scenarios. The results suggest the method is practical for timely and accurate misinformation detection, and the code is publicly available to support reproducibility and further research.
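The routing idea in the summary above can be illustrated with a small sketch. This is not the authors' implementation: the function name `route_and_fuse`, the fixed router logits, and the choice to simply mask out an unavailable expert (rather than compensate for it with an adapter, as the paper describes) are all assumptions made for illustration.

```python
import math
from typing import List, Optional

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_and_fuse(expert_outputs: List[Optional[Vector]],
                   router_logits: List[float]) -> Vector:
    """Fuse expert outputs with softmax routing weights.

    Hypothetical helper: experts whose output is None (for example, the
    image expert when the image is missing) are excluded, and the routing
    weights are renormalised over the experts that remain.
    """
    available = [i for i, out in enumerate(expert_outputs) if out is not None]
    if not available:
        raise ValueError("at least one expert output is required")
    weights = softmax([router_logits[i] for i in available])
    dim = len(expert_outputs[available[0]])
    fused = [0.0] * dim
    for weight, index in zip(weights, available):
        for d in range(dim):
            fused[d] += weight * expert_outputs[index][d]
    return fused


# Text and image experts are available; the joint expert is unavailable (None).
fused = route_and_fuse([[1.0, 0.0], [0.0, 1.0], None], [0.0, 0.0, 5.0])
```

In a trained model the router logits would be predicted from the input rather than fixed by hand, so the weighting could shift toward whichever expert is most informative for a given post.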

Abstract

Multimodal Misinformation Recognition has become an urgent task with the emergence of large volumes of multimodal fake content on social media platforms. Previous studies mainly focus on complex feature extraction and fusion to learn discriminative information from multimodal content. However, in real-world applications, multimedia news may naturally lose some information during dissemination, resulting in modality incompleteness, which is detrimental to the generalization and robustness of existing models. To this end, we propose a novel generic and robust multimodal fusion strategy, termed Multi-expert Modality-incomplete Learning Network (MMLNet), which is simple yet effective. It consists of three key steps: (1) Multi-Expert Collaborative Reasoning, which compensates for missing modalities by dynamically leveraging complementary information through multiple experts; (2) Incomplete Modality Adapters, which compensate for the missing information by leveraging the new feature distribution; and (3) Modality Missing Learning, which leverages a label-aware adaptive weighting strategy to learn a robust representation with contrastive learning. We evaluate MMLNet on three real-world benchmarks across two languages, demonstrating superior performance compared to state-of-the-art methods while maintaining relative simplicity. By maintaining recognition accuracy in the incomplete-modality scenarios caused by information propagation, MMLNet can help curb the spread of malicious misinformation. Code is publicly available at https://github.com/zhyhome/MMLNet.
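To make the contrastive-learning step more concrete, here is a minimal sketch of a standard supervised (label-aware) contrastive loss, in which samples that share a label are pulled together and all others are pushed apart. It is a generic stand-in, not the paper's adaptive weighting strategy, which the abstract does not specify; the function names, the cosine similarity, and the temperature value are assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(features: List[Sequence[float]],
                                labels: List[int],
                                temperature: float = 0.1) -> float:
    """Average supervised contrastive loss over anchors that have a positive.

    Hypothetical helper: positives for an anchor are the other samples
    carrying the same label (for example, real vs. fake).
    """
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {j: cosine(features[i], features[j]) / temperature
                  for j in range(n) if j != i}
        peak = max(logits.values())
        # log of the softmax denominator, computed stably
        log_denom = peak + math.log(sum(math.exp(v - peak)
                                        for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supervised_contrastive_loss(feats, [0, 0, 1, 1])
loss_misaligned = supervised_contrastive_loss(feats, [0, 1, 0, 1])
```

When labels agree with the geometry of the features the loss is small, and when they disagree it grows, which is the signal a label-aware objective uses to shape the representation.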

Paper Structure

This paper contains 30 sections, 20 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Illustration of misinformation scenarios with incomplete modalities. Existing works have only investigated the scenario of complete modalities in the first step of the distortion propagation theory.
  • Figure 2: Overview of MMLNet framework.
  • Figure 3: Distribution of features in space for complete and incomplete modality.
  • Figure 4: Experimental results on the Weibo dataset. 'T' and 'I' denote the text and image modalities, respectively.
  • Figure 5: Experimental results on the Weibo21 dataset. 'T' and 'I' denote the text and image modalities, respectively.
  • ...and 3 more figures