Harmful Visual Content Manipulation Matters in Misinformation Detection Under Multimedia Scenarios

Bing Wang, Ximing Li, Changchun Li, Jinjin Chi, Tianze Li, Renchu Guan, Shengsheng Wang

Abstract

The widespread dissemination of misinformation across social media platforms has had severe negative effects on society. To address this challenge, the automatic detection of misinformation, particularly in multimedia scenarios, has gained significant attention from both academic and industrial communities, giving rise to the research task of Multimodal Misinformation Detection (MMD). Current MMD approaches typically focus on capturing the semantic relationships and inconsistencies between modalities but often overlook certain critical indicators within multimodal content. Recent research has shown that traces of manipulation in the visual content of social media articles serve as valuable clues for MMD. We further argue that the intention behind the manipulation, e.g., harmful or harmless, also matters in MMD. Therefore, in this study, we aim to identify multimodal misinformation by capturing two types of features: manipulation features, which indicate whether visual content has been manipulated, and intention features, which characterize the nature of the manipulation by distinguishing harmful from harmless intentions. Unfortunately, the manipulation and intention labels needed to supervise these features so that they become discriminative are unknown. To address this, we introduce two weakly supervised indicators as substitutes by incorporating supplementary datasets for image manipulation detection and by formulating the two classification tasks as positive and unlabeled learning problems. With this framework, we propose a new MMD approach, named Harmful Visual Content Manipulation Matters in MMD (HAVC-M$^4$D). Comprehensive experiments on four prevalent MMD datasets indicate that HAVC-M$^4$D significantly and consistently improves the performance of existing MMD methods.
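
For readers unfamiliar with positive and unlabeled (PU) learning, the following is a sketch of one widely used risk estimator. The abstract does not state which PU formulation the paper adopts, so the class prior $\pi$, the loss $\ell$, the classifier $g$, and the choice of this particular estimator are all assumptions made for illustration. Given positive samples drawn from $p_{\mathrm{p}}$ and unlabeled samples drawn from the marginal $p_{\mathrm{u}}$, the negative-class risk can be rewritten using only positive and unlabeled data:

$$
\widehat{R}_{\mathrm{pu}}(g) \;=\; \pi \, \mathbb{E}_{x \sim p_{\mathrm{p}}}\big[\ell(g(x), +1)\big] \;+\; \mathbb{E}_{x \sim p_{\mathrm{u}}}\big[\ell(g(x), -1)\big] \;-\; \pi \, \mathbb{E}_{x \sim p_{\mathrm{p}}}\big[\ell(g(x), -1)\big].
$$

The last two terms together estimate $(1-\pi)\,\mathbb{E}_{x \sim p_{\mathrm{n}}}[\ell(g(x), -1)]$, since $p_{\mathrm{u}} = \pi\, p_{\mathrm{p}} + (1-\pi)\, p_{\mathrm{n}}$. In a setting like the one described above, the positives would plausibly be images known to be manipulated (or known to be harmful), and the unlabeled set would be the remaining images; that mapping is likewise an assumption here.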

Paper Structure

This paper contains 18 sections, 13 equations, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: The statistics from the MMD dataset Twitter reveal a quantitative relationship between manipulated visual content and veracity labels. We utilize a pre-trained model for image manipulation detection to determine if an image has been manipulated. Additionally, we provide various examples of images that have been manipulated with both harmful and harmless intentions.
  • Figure 2: The HAVC-M$^4$D framework comprises four primary encoders: a text encoder, an image encoder, a manipulation encoder, and an intention encoder, which extract features from the provided text content $\mathbf{x}_i^t$ and visual content $\mathcal{X}_i^v$. These features are subsequently fed into a feature fusion network to create a combined representation. To accomplish the multiple tasks, we employ three distinct predictors for veracity classification, manipulation classification, and intention classification.
  • Figure 3: Sensitivity analysis of the parameters $\alpha$ and $\beta$.
  • Figure 4: Visualization analysis of features $\mathbf{z}$, $\mathbf{e}^m$ and $\mathbf{e}^e$ with the t-SNE method.
  • Figure 5: We illustrate three representative examples for the case study.
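
The data flow described in the Figure 2 caption (four encoders, a fusion network, three predictors) can be sketched as a toy forward pass. This is a minimal illustration only, not the authors' implementation: the class name `ToyFourEncoderModel`, the feature width `DIM`, the single-layer tanh encoders, fusion by concatenation, and the choice of which features feed each predictor are all hypothetical, since the captions do not specify them. The variable names `z`, `e_m`, and `e_e` loosely follow the symbols in the Figure 4 caption.

```python
import math
import random

DIM = 8  # hypothetical feature width, chosen only for illustration


def make_linear(rng, n_in, n_out):
    """Random weight matrix with n_out rows of n_in weights (stand-in for a trained layer)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    """Matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def encode(weights, x):
    """One-layer tanh encoder; real encoders would be pretrained networks."""
    return [math.tanh(h) for h in linear(weights, x)]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class ToyFourEncoderModel:
    """Four encoders -> fusion -> three binary predictors (all layers are assumptions)."""

    def __init__(self, text_dim, image_dim, seed=0):
        rng = random.Random(seed)
        self.w_text = make_linear(rng, text_dim, DIM)     # text encoder
        self.w_image = make_linear(rng, image_dim, DIM)   # image encoder
        self.w_manip = make_linear(rng, image_dim, DIM)   # manipulation encoder
        self.w_intent = make_linear(rng, image_dim, DIM)  # intention encoder
        self.w_fuse = make_linear(rng, 4 * DIM, DIM)      # fusion network
        self.head_veracity = make_linear(rng, DIM, 1)
        self.head_manip = make_linear(rng, DIM, 1)
        self.head_intent = make_linear(rng, DIM, 1)

    def forward(self, text_raw, image_raw):
        e_t = encode(self.w_text, text_raw)
        e_v = encode(self.w_image, image_raw)
        e_m = encode(self.w_manip, image_raw)   # manipulation features
        e_e = encode(self.w_intent, image_raw)  # intention features
        # Assumed fusion: concatenate the four feature vectors, then project.
        z = encode(self.w_fuse, e_t + e_v + e_m + e_e)
        return {
            "veracity": sigmoid(linear(self.head_veracity, z)[0]),
            "manipulation": sigmoid(linear(self.head_manip, e_m)[0]),
            "intention": sigmoid(linear(self.head_intent, e_e)[0]),
        }


model = ToyFourEncoderModel(text_dim=5, image_dim=6)
outputs = model.forward([0.1] * 5, [0.2] * 6)
print(sorted(outputs))
```

Each predictor returns a probability in (0, 1). In a multi-task setup of this shape, the three outputs would typically be trained jointly, with the manipulation and intention heads supervised by the weak indicators mentioned in the abstract.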