Harmfully Manipulated Images Matter in Multimodal Misinformation Detection
Bing Wang, Shengsheng Wang, Changchun Li, Renchu Guan, Ximing Li
TL;DR
This work addresses multimodal misinformation detection (MMD) by leveraging manipulation traces in images and the intentions behind the manipulation (harmful versus harmless). It introduces HAMI-M3D, a three-task model that learns manipulation features $\mathbf{e}^M$ and intention features $\mathbf{e}^E$ through a four-encoder architecture and multi-head attention fusion, supervised by a veracity predictor plus auxiliary manipulation and intention classifiers. Because ground-truth labels for manipulation and intention are unavailable, the method relies on two weakly supervised signals: a manipulation teacher trained on external image manipulation data and adapted with positive and unlabeled (PU) learning, and a PU-based objective for intention, combined with a reliability-based pruning mechanism. Experiments on GossipCop, Weibo, and Twitter show consistent improvements over strong baselines, and ablations confirm the value of both the manipulation/intention features and the PU-based supervision. The approach offers a scalable, weakly supervised way to incorporate manipulation cues into practical MMD systems, potentially improving their resilience to multimodal misinformation.
Abstract
Misinformation spreads widely across social media platforms and has severe negative impacts on society. To combat this issue, automatically identifying misinformation, especially misinformation containing multimodal content, has attracted growing attention from the academic and industrial communities and has given rise to an active research topic named Multimodal Misinformation Detection (MMD). Existing MMD methods typically capture the semantic correlation and inconsistency between modalities, but neglect other potential clues in multimodal content. Recent studies suggest that manipulation traces in the images of articles are non-trivial clues for detecting misinformation. Meanwhile, we find that the intention behind the manipulation, i.e., whether it is harmful or harmless, also matters in MMD. Accordingly, in this work, we propose to detect misinformation by learning manipulation features that indicate whether the image has been manipulated, as well as intention features that indicate whether the manipulation is harmful or harmless. Unfortunately, the manipulation and intention labels needed to make these features discriminative are unknown. To overcome this problem, we propose two weakly supervised signals as alternatives, obtained by introducing additional datasets on image manipulation detection and by formulating the two classification tasks as positive and unlabeled learning problems. Based on these ideas, we propose a novel MMD method, namely Harmfully Manipulated Images Matter in MMD (HAMI-M3D). Extensive experiments on three benchmark datasets demonstrate that HAMI-M3D consistently improves the performance of MMD baselines.
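To make the positive and unlabeled formulation concrete, the sketch below computes a generic non-negative PU risk from classifier scores. It is an illustration only and not necessarily the objective used by HAMI-M3D: the function names, the sigmoid surrogate loss, and the assumption that the positive class prior `prior` is known are all choices made here for the example.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss for a real-valued score and a label in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk estimate (hypothetical helper, not from the paper).

    pos_scores: classifier scores for samples known to be positive
    unl_scores: classifier scores for unlabeled samples (mixed positive/negative)
    prior:      assumed fraction of positives in the unlabeled data
    """
    n_pos, n_unl = len(pos_scores), len(unl_scores)
    # Risk of labeled positives when they are treated as positive.
    risk_pos_as_pos = sum(sigmoid_loss(s, +1) for s in pos_scores) / n_pos
    # Risk of labeled positives when they are treated as negative.
    risk_pos_as_neg = sum(sigmoid_loss(s, -1) for s in pos_scores) / n_pos
    # Risk of unlabeled samples when they are treated as negative.
    risk_unl_as_neg = sum(sigmoid_loss(s, -1) for s in unl_scores) / n_unl
    # Estimated negative-class risk: subtract the positives hidden in the unlabeled set.
    neg_risk = risk_unl_as_neg - prior * risk_pos_as_neg
    # Clamp at zero so the estimated negative-class risk never goes below zero.
    return prior * risk_pos_as_pos + max(0.0, neg_risk)


if __name__ == "__main__":
    # Toy scores: positives score high, unlabeled samples are mixed.
    print(nn_pu_risk([2.0, 1.5, 3.0], [2.5, -1.0, -2.0, 0.3], prior=0.4))
```

In a setting like the one described above, the labeled positives would be samples whose manipulation (or harmful intention) is reasonably certain, and everything else would be treated as unlabeled rather than as negative.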
